- What outcome are we optimizing for? → Booking conversion rate: searches that result in a completed booking. Secondary: revenue per search (RPS), which factors in both conversion rate and booking value. A 0.1% improvement in conversion on 600M searches/month = massive revenue. This shapes architecture: search must be FAST (slow search → user abandons → lost booking), prices must be ACCURATE at checkout (price change → user abandons → lost booking), and ranking must surface the MOST BOOKABLE results, not just the cheapest.
- Products? → Flights, hotels (700K+ properties), car rentals, vacation packages (flight + hotel bundle), activities/experiences, cruises. We'll focus on flights + hotels as the core.
- Where does inventory come from? → Hotels: direct connections (Marriott, Hilton APIs), hotel switches (Derbysoft, SiteMinder), channel managers. Flights: GDS (Amadeus, Sabre, Travelport), direct airline connections (NDC). Cars: Hertz, Avis APIs directly.
- Revenue model? → Hotels: merchant model (Expedia buys wholesale, sells at markup) or agency model (takes commission). Flights: primarily booking fees + ancillary revenue. Packages: bundle discount funded by margin.
- Scale? → ~70M monthly visitors, ~600M searches/month, ~30M bookings/year across all brands (Expedia, Hotels.com, Vrbo, Orbitz).
- Latency? → Search results must appear in <3 seconds, but suppliers can take 5-15 seconds to respond. This is the fundamental tension.
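The scale figures above imply a baseline conversion rate. They also show that "a 0.1% improvement" needs a unit. The back-of-envelope sketch below assumes that the search and booking counts cover the same brands; the source does not say whether they do.

```python
# Back-of-envelope conversion math from the scale figures above.
# Assumption: both counts cover the same brands, so their ratio is a
# meaningful per-search conversion rate (CVR).
SEARCHES_PER_MONTH = 600_000_000
BOOKINGS_PER_YEAR = 30_000_000

searches_per_year = SEARCHES_PER_MONTH * 12            # 7.2B searches/year
baseline_cvr = BOOKINGS_PER_YEAR / searches_per_year   # share of searches that book

# "A 0.1% improvement" has two readings that differ by ~240x:
relative_gain = BOOKINGS_PER_YEAR * 0.001   # CVR rises by 0.1% of itself
absolute_gain = searches_per_year * 0.001   # CVR rises by 0.1 percentage point

print(f"baseline CVR:        {baseline_cvr:.3%}")
print(f"+0.1% relative lift: {relative_gain:,.0f} extra bookings/year")
print(f"+0.1 pp absolute:    {absolute_gain:,.0f} extra bookings/year")
```

At a baseline of roughly 0.42%, a 0.1 percentage-point gain would mean about 7.2M extra bookings a year. A 0.1% relative lift would mean about 30K. Experiment results should therefore state which one they mean.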
| In Scope | Out of Scope |
|---|---|
| Hotel search & booking | Vrbo / vacation rental specifics |
| Flight search & booking | Cruise booking |
| Multi-supplier aggregation | Loyalty/rewards program |
| Pricing & availability caching | Content management (photos, reviews) |
| Package bundling (flight + hotel) | Advertising platform |
| Booking state machine & confirmation | Customer service / call center |
| Supplier integration layer | SEO / marketing infrastructure |
- UC1 (Hotel Search): User searches "Hotels in Paris, Mar 15-20, 2 adults" → system queries multiple suppliers in parallel → merges, deduplicates, ranks → shows 200+ hotels with prices in <3 seconds.
- UC2 (Flight Search): "NYC → London, Mar 15, round trip, 1 adult" → fan out to GDS + direct airline APIs → aggregate hundreds of itineraries → sort by price/duration/stops → show results in <3 seconds.
- UC3 (Hotel Booking): User selects a hotel, enters details, pays → system re-checks availability with supplier (price may have changed!) → if available, confirms booking → sends confirmation email + itinerary.
- UC4 (Package Booking): "Flight + Hotel to Paris" → book flight with airline + hotel with hotel supplier → BOTH must succeed → if hotel fails after flight succeeds, must handle partial failure gracefully.
- UC5 (Price Change at Checkout): User searched 20 minutes ago, price was $150/night. At checkout, re-query shows $165/night. Must display price change transparently and let user decide.
- Search speed <3 seconds: Despite querying 10+ external suppliers that may each take 5-15 seconds. This REQUIRES caching and progressive loading ("show what we have, keep loading more").
- High availability: Downtime during peak booking season (holidays, summer) is directly lost revenue. Target: 99.99%.
- Price accuracy at booking time: The price shown to the user MUST match what's charged. If it changes between search and booking, the user must be notified. This is a legal (consumer protection) requirement.
- Eventual consistency for search: Search results can show slightly stale pricing/availability. It's OK if a hotel shows "available" but is actually sold out; the error is caught at booking time. This tradeoff enables caching.
- Multi-currency, multi-language: 70+ countries, 35+ languages, 40+ currencies. Prices must be converted and displayed correctly.
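UC5 and the price-accuracy requirement both come down to one comparison at checkout. The sketch below uses hypothetical names (`Quote`, `recheck_price`, `PriceCheck`). It assumes money is held as `Decimal` and that both quotes have already been converted to the same currency.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional


class PriceCheck(Enum):
    UNCHANGED = "unchanged"
    INCREASED = "increased"      # user must explicitly accept the new total before charging
    DECREASED = "decreased"      # still disclosed; charging the lower total is an assumed policy
    UNAVAILABLE = "unavailable"  # supplier no longer offers the rate


@dataclass(frozen=True)
class Quote:
    total: Decimal   # money as Decimal, never float
    currency: str    # ISO 4217 code, e.g. "USD"


def recheck_price(searched: Quote, live: Optional[Quote]) -> PriceCheck:
    """Compare the total shown at search time with a fresh supplier quote at checkout."""
    if live is None:
        return PriceCheck.UNAVAILABLE
    if live.currency != searched.currency:
        # Assumption: FX conversion happens before this check, in one place.
        raise ValueError("convert both quotes to the same currency first")
    if live.total > searched.total:
        return PriceCheck.INCREASED
    if live.total < searched.total:
        return PriceCheck.DECREASED
    return PriceCheck.UNCHANGED
```

Take a hypothetical 5-night stay that moves from $150/night to $165/night. A $750.00 search total against an $825.00 live quote returns `PriceCheck.INCREASED`, so checkout would show the new total and ask the user to accept it before charging.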
| Requirement | Decision | Why (and what was rejected) | Consistency |
|---|---|---|---|
| Suppliers take 5-15s, user expects <3s | Aggressive caching + progressive loading | Cache popular searches in Redis. Show cached results immediately, live rates as they arrive. Waiting for all suppliers = 15s page load. | AP for search |
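| Booking must never charge without reserving (or vice versa) | PostgreSQL (ACID) for booking records + saga pattern across payment and supplier | Payment charge and supplier reservation live in separate systems, so they cannot share one ACID transaction; the saga coordinates them. Failure at any step triggers compensating transactions (refund, release). | CP |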
| Rates change every few minutes | Cassandra for rate cache (TTL-based expiration) | High write volume from supplier feeds. TTL auto-expires stale rates. Cassandra handles write-heavy workloads better than PostgreSQL. | AP (stale rates acceptable for search) |
| Multiple suppliers return different formats | Adapter-per-supplier (not generic parser) | Marriott XML vs. Booking.com JSON vs. Amadeus EDIFACT. Each adapter normalizes to the internal schema. A generic parser would be brittle. | N/A |
| Supplier confirmation can fail after payment | Saga with compensating transactions | If supplier rejects after payment: auto-refund. If payment fails after reservation: release hold. Event-driven with Kafka for audit trail. | N/A |
| PCI compliance for payments | Tokenized payment vault (booking service never sees card numbers) | Card tokenized at edge. Only the vault + PSP see raw numbers. Reduces PCI scope from entire system to one component. | N/A |
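The adapter-per-supplier decision can be sketched as one small parser class per feed, each emitting the same internal record. The two payload formats below are invented for illustration; real chain, hotel-switch, or GDS schemas are far richer. `NormalizedRate`, `XmlChainAdapter`, `JsonSwitchAdapter`, and `ADAPTERS` are hypothetical names.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class NormalizedRate:
    """Internal schema every adapter must produce."""
    supplier: str
    supplier_hotel_id: str   # later mapped to a canonical property_id
    room_type: str
    nightly_rate: Decimal
    currency: str


class SupplierAdapter(Protocol):
    def parse(self, raw: str) -> List[NormalizedRate]: ...


class XmlChainAdapter:
    """Hypothetical XML feed: <Rates><Rate hotel=".." room=".." amount=".." cur=".."/></Rates>"""

    def parse(self, raw: str) -> List[NormalizedRate]:
        root = ET.fromstring(raw)
        return [
            NormalizedRate("xml_chain", r.attrib["hotel"], r.attrib["room"],
                           Decimal(r.attrib["amount"]), r.attrib["cur"])
            for r in root.iter("Rate")
        ]


class JsonSwitchAdapter:
    """Hypothetical JSON feed: {"offers": [{"propertyCode", "roomName", "price": {"value", "currency"}}]}"""

    def parse(self, raw: str) -> List[NormalizedRate]:
        doc = json.loads(raw)
        return [
            NormalizedRate("json_switch", o["propertyCode"], o["roomName"],
                           Decimal(str(o["price"]["value"])),  # str() avoids float artifacts
                           o["price"]["currency"])
            for o in doc["offers"]
        ]


# One registry entry per supplier connection.
ADAPTERS: Dict[str, SupplierAdapter] = {
    "xml_chain": XmlChainAdapter(),
    "json_switch": JsonSwitchAdapter(),
}
```

Adding a supplier means adding one adapter and one registry entry. A malformed feed then fails inside its own adapter instead of breaking a shared parser.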
Search Orchestrator (SEARCH)
- Receives user search request
- Decides: cache hit? Or fan-out to suppliers?
- Merges, deduplicates, ranks results
- Returns progressive results (<3 sec)
Pricing & Availability Cache (CACHE)
- Pre-fetched hotel rates for popular searches
- TTL-based staleness (15 min - 2 hours)
- Billions of rate entries (hotel × room × date)
- The bridge between fast search and slow suppliers
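The cache's role as a bridge can be sketched as a read-through lookup: serve an entry while its TTL holds, otherwise fall through to a slower tier and backfill. This in-process sketch rests on several assumptions. The `(property_id, room_type, stay_date)` key shape, the `loader` callback, and the `TtlRateCache` name are all hypothetical, and a real deployment would sit on Redis/Cassandra rather than a Python dict.

```python
import time
from decimal import Decimal
from typing import Callable, Dict, Optional, Tuple

RateKey = Tuple[str, str, str]  # hypothetical key: (property_id, room_type, stay_date)


class TtlRateCache:
    """Read-through cache: serve fresh entries, else load from a slower tier and backfill."""

    def __init__(self, loader: Callable[[RateKey], Optional[Decimal]],
                 ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._loader = loader    # slower tier: cold store or a live supplier call
        self._ttl = ttl_seconds  # somewhere in the 15 min - 2 h range (900-7200 s)
        self._clock = clock
        self._entries: Dict[RateKey, Tuple[Decimal, float]] = {}  # key -> (rate, expires_at)

    def get(self, key: RateKey) -> Optional[Decimal]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                           # fresh hit: no supplier call
        rate = self._loader(key)                      # miss or expired entry
        if rate is None:                              # e.g. sold out or unknown
            self._entries.pop(key, None)
            return None
        self._entries[key] = (rate, now + self._ttl)  # backfill with a new TTL
        return rate

    def invalidate(self, key: RateKey) -> None:
        """Event-based invalidation, e.g. when a supplier pushes a rate change."""
        self._entries.pop(key, None)
```

Injecting `clock` keeps expiry testable without sleeping. Expired entries are never purged explicitly; they are simply reloaded on the next read.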
Hotel Service (VERTICAL)
- Hotel content (descriptions, photos, amenities)
- Rate plans, room types, cancellation policies
- Geo-spatial search (hotels near X)
- Review aggregation
Flight Service (VERTICAL)
- Itinerary construction from segments
- Fare rules, baggage policies
- GDS integration (Amadeus, Sabre)
- Direct airline (NDC) connections
Booking Engine (WRITE PATH)
- Re-verifies price & availability with supplier
- Handles price changes at checkout
- Coordinates multi-supplier bookings
- State machine: pending → confirmed → ticketed
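The booking engine's state machine can be sketched as an explicit transition table, so an illegal jump (for example pending → ticketed) is rejected rather than silently recorded. The source names only pending, confirmed, and ticketed. The `FAILED` and `CANCELLED` states, and all names here, are assumptions added for illustration.

```python
from enum import Enum
from typing import Dict, FrozenSet, List


class BookingState(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    TICKETED = "ticketed"     # flights only: ticket issued
    FAILED = "failed"         # assumed: terminal state after compensation
    CANCELLED = "cancelled"   # assumed: user or supplier cancellation


# Allowed transitions; anything not listed is rejected.
TRANSITIONS: Dict[BookingState, FrozenSet[BookingState]] = {
    BookingState.PENDING: frozenset({BookingState.CONFIRMED, BookingState.FAILED}),
    BookingState.CONFIRMED: frozenset({BookingState.TICKETED, BookingState.CANCELLED}),
    BookingState.TICKETED: frozenset({BookingState.CANCELLED}),
    BookingState.FAILED: frozenset(),
    BookingState.CANCELLED: frozenset(),
}


class Booking:
    def __init__(self, booking_id: str) -> None:
        self.booking_id = booking_id
        self.state = BookingState.PENDING
        self.history: List[BookingState] = [BookingState.PENDING]

    def transition(self, new_state: BookingState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.booking_id}: {self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
        self.history.append(new_state)  # each change could also be published as an event
```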
Supplier Integration Layer (ADAPTERS)
- Adapters for 100+ supplier APIs
- Normalizes heterogeneous formats (XML, JSON, EDIFACT)
- Connection pooling, rate limiting, circuit breakers
- Retry logic, timeout handling per supplier
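The circuit breakers in the integration layer stop one degraded supplier from consuming the whole search budget. Below is a minimal single-threaded sketch of the common closed/open/half-open breaker, one instance per supplier. The threshold, cooldown, and class names are illustrative assumptions, not values from the source.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling a supplier whose circuit is open."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            return "half_open"   # let a trial request through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("supplier skipped while circuit is open")  # fail fast
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # trip, or re-trip after a failed trial
            raise
        self._failures = 0
        self._opened_at = None   # a success closes the circuit
        return result
```

While the circuit is open, search skips that supplier immediately instead of waiting out its 5-15 second timeout.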
Payment Service (FINANCE)
- PCI-compliant card processing
- Merchant model: Expedia charges customer
- Agency model: supplier charges customer
- Multi-currency, FX conversion
Ranking & Personalization (ML)
- Sort results by relevance, not just price
- Personalized based on user history
- Sponsored listings (hotel bidding for placement)
- Conversion-optimized ranking
The Scatter-Gather Pattern
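This is the pattern behind UC1, UC2, and the <3-second requirement: send every supplier request at once, wait until a deadline, and keep whatever has arrived. Below is a minimal `asyncio` sketch. The `scatter_gather` function and the shape of supplier results are hypothetical; merging, deduplication, and ranking would run on the returned `results`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

SupplierCall = Callable[[], Awaitable[list]]


async def scatter_gather(
    calls: Dict[str, SupplierCall], deadline_s: float = 3.0
) -> Tuple[Dict[str, list], List[str], List[str]]:
    """Query all suppliers concurrently; keep only what arrives before the deadline."""
    tasks = {asyncio.create_task(call()): name for name, call in calls.items()}
    done, pending = await asyncio.wait(tasks.keys(), timeout=deadline_s)

    for task in pending:
        task.cancel()                                       # drop slow suppliers for this response
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations finish

    results: Dict[str, list] = {}
    failed: List[str] = []
    for task in done:
        if task.exception() is not None:
            failed.append(tasks[task])                      # one bad supplier doesn't sink the page
        else:
            results[tasks[task]] = task.result()
    late = sorted(tasks[t] for t in pending)                # e.g. shown as "check availability"
    return results, sorted(failed), late
```

Suppliers in `late` are the ones the UI could mark as "check availability." Suppliers in `failed` are natural inputs to the per-supplier circuit breakers in the integration layer.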
Hotel Search vs. Flight Search: Why They're Different
| Dimension | Hotel Search | Flight Search |
|---|---|---|
| Inventory | Static properties with dynamic rates per night | Dynamic itineraries constructed from flight segments |
| Cacheability | HIGH: rates change slowly (hours). Same hotel, same dates = same rate. | LOW: fares change constantly, and seat-level availability is volatile. |
| Search space | Bounded: hotels in [city] for [dates]. Typically 50-5,000 properties. | Combinatorial: segments × connections × fare classes. Thousands of itineraries. |
| Suppliers | Many small suppliers + hotel switches | 3 major GDS (Amadeus, Sabre, Travelport) + airline NDC |
| Dedup | Same hotel from multiple suppliers → property mapping | Same itinerary from GDS + airline direct → itinerary fingerprint |
| Pricing | Rate per night × number of nights. Relatively simple. | Complex fare rules: advance purchase, minimum stay, Saturday-night stay, fare class availability. |
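Flight deduplication needs a key that is identical whether an itinerary came from a GDS or from an airline's NDC API. Below is one possible itinerary fingerprint, sketched under assumptions. The fields that define identity (carrier, flight number, airports, UTC departure, booking class) and the normalization rules are illustrative choices, not a documented standard. `Segment`, `itinerary_fingerprint`, and `dedupe_cheapest` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    carrier: str         # marketing carrier code, e.g. "BA"
    flight_number: str
    origin: str          # IATA airport code
    destination: str
    departure_utc: str   # ISO-8601, already normalized to UTC
    booking_class: str   # assumed part of identity: different classes are different products


def itinerary_fingerprint(segments: Sequence[Segment]) -> str:
    """Same key whether the itinerary came from a GDS or from the airline's NDC API."""
    parts = [
        "|".join((s.carrier.upper(), s.flight_number.lstrip("0"), s.origin.upper(),
                  s.destination.upper(), s.departure_utc, s.booking_class.upper()))
        for s in segments  # order preserved: outbound before return
    ]
    return hashlib.sha256("/".join(parts).encode()).hexdigest()[:16]


Offer = Tuple[List[Segment], Decimal]


def dedupe_cheapest(offers: List[Offer]) -> Dict[str, Offer]:
    """Keep the cheapest offer per fingerprint."""
    best: Dict[str, Offer] = {}
    for segments, price in offers:
        key = itinerary_fingerprint(segments)
        if key not in best or price < best[key][1]:
            best[key] = (segments, price)
    return best
```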
Cache Technology Choice
- Redis Cluster for hot data (next 30 days, popular destinations). In-memory, <1ms reads. ~5-10 TB.
- Distributed cache (Memcached / custom) for warm data (30-180 days out). Larger capacity, slightly slower.
- Database (Cassandra) for cold data (180+ days out, long-tail). Query on cache miss, backfill to Redis on access.
- Cache invalidation: TTL-based (automatic expiry) + event-based (supplier pushes rate change → invalidate specific key). No manual purging needed: stale data simply expires.
- Property mapping: The most under-appreciated problem. Supplier A calls it "Marriott Paris Opera." Supplier B calls it "Paris Marriott Hotel." Same hotel, different names, different IDs. A PROPERTY MAPPING TABLE (maintained by the content team + ML matching) maps every supplier's hotel ID to Expedia's canonical property_id. Errors here → duplicate listings or missed inventory.
- Rate parity: Hotels often require that OTAs show the same rate as the hotel's own website ("rate parity clause"). The integration layer must track which rates are "parity" and which are "opaque" (discounted but hidden behind a bundle).
- Connectivity monitoring: Dashboard showing real-time health of each supplier connection: response time, error rate, availability. Alerts when a supplier degrades. Some suppliers have scheduled maintenance windows, which the system must handle gracefully.
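Once the property mapping table exists, merging supplier results is a lookup plus a per-property selection. This sketch uses a hypothetical in-memory `PROPERTY_MAP` and made-up IDs. Two policies in it are assumptions: keeping the cheapest rate per property, and holding back unmapped rates. A real system might also weigh margin or rate-parity status.

```python
from decimal import Decimal
from typing import Dict, List, NamedTuple, Optional, Tuple


class SupplierRate(NamedTuple):
    supplier: str
    supplier_hotel_id: str
    nightly_rate: Decimal


# Hypothetical mapping table: (supplier, supplier's hotel ID) -> canonical property_id.
PROPERTY_MAP: Dict[Tuple[str, str], str] = {
    ("supplier_a", "MAR-PAR-OPERA"): "P-1001",
    ("supplier_b", "88213"): "P-1001",      # same hotel, different name and ID
    ("supplier_b", "90417"): "P-2002",
}


def merge_by_property(
    rates: List[SupplierRate],
) -> Tuple[Dict[str, SupplierRate], List[SupplierRate]]:
    """Collapse supplier rates onto canonical properties, keeping the cheapest per property.

    Unmapped rates are returned separately (e.g. for a matching queue) rather than
    listed, trading some missed inventory for fewer duplicate listings.
    """
    best: Dict[str, SupplierRate] = {}
    unmapped: List[SupplierRate] = []
    for rate in rates:
        property_id: Optional[str] = PROPERTY_MAP.get((rate.supplier, rate.supplier_hotel_id))
        if property_id is None:
            unmapped.append(rate)
            continue
        current = best.get(property_id)
        if current is None or rate.nightly_rate < current.nightly_rate:
            best[property_id] = rate
    return best, unmapped
```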
| Data | Store | Why This Store |
|---|---|---|
| Search results cache | Redis + Cassandra | Redis for hot queries (same search repeated). Cassandra for warm cache (rates < 15 min old). Avoids re-querying slow suppliers. |
| Booking records | PostgreSQL | ACID for payment and reservation. Order ID, guest details, payment tokens, confirmation numbers. Sharded by booking_id. |
| Hotel/flight metadata | PostgreSQL + Memcached | Property descriptions, photos, amenities. Heavily cached β metadata changes infrequently. Memcached reduces DB load 10x. |
| Rate availability | Cassandra | Supplier rates by (hotel_id, date_range, room_type). High write volume from supplier feeds. TTL-based expiration for stale rates. |
| Booking events | Kafka | booking.initiated, booking.confirmed, booking.cancelled. Consumed by notifications, analytics, supplier reconciliation. |
| Supplier API responses | S3 | Raw XML/JSON responses archived for dispute resolution. Retained for 2 years per contract. |
- Default sort: ML-ranked by predicted conversion probability. Features: price competitiveness, hotel quality score, review rating, distance to city center, user's booking history, device type.
- Sponsored placements: Hotels bid for premium placement (top of results, "Recommended" badge). This is a significant revenue stream. Sponsored results are blended with organic results and marked as "Ad" per FTC guidelines.
- Personalization signals: Past bookings (prefers 4-star hotels), search history (always picks free cancellation), loyalty status, price sensitivity (clicks on sorted-by-price vs. recommended).
- A/B testing: Ranking models are constantly A/B tested. Metric: booking conversion rate (CVR) and revenue per search (RPS). A 0.1% CVR improvement on 600M searches/month = significant revenue.
- Merchant model: Expedia charges the customer $200/night and pays the hotel $160/night. Margin: $40/night. Expedia is the merchant of record and handles chargebacks and refunds.
- Agency model: Customer pays hotel directly. Hotel pays Expedia 15-25% commission. Less financial risk for Expedia but less control over pricing.
- Reconciliation: For every booking, reconcile: (1) what we charged the customer, (2) what the supplier confirms, (3) what the supplier invoices us. Discrepancies β dispute with supplier. Run nightly batch matching booking records to supplier invoices.
- Refunds/cancellations: Each booking component has its own cancellation policy. "Free cancellation until 48h before check-in" → full refund. Non-refundable → no refund. Partial cancellation of a package → recalculate bundle pricing.
- Property content: 700K properties × (description, 50+ photos, amenities, location, star rating, policies). Stored in a content service separate from pricing. Updated infrequently (weeks/months). CDN-cached aggressively.
- Reviews: Millions of verified guest reviews. Aggregated across Expedia Group brands. Review scores factor into search ranking. Fraud detection on fake reviews (ML model).
- Photo quality: Professional photography program for top properties. AI-based photo scoring to surface the best images. Photos are the #1 factor in hotel conversion, more impactful than price for many travelers.
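The nightly reconciliation in the revenue bullets is a three-way match keyed by booking. Below is a minimal sketch with hypothetical names and an assumed one-cent tolerance. Under the merchant model, the customer charge and the supplier's net rate differ by design ($200 vs. $160 in the example). The amount check therefore compares the invoice against the confirmed net rate, and separately requires all three sources to contain the booking.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ReconLine:
    booking_id: str
    charged: Optional[Decimal]    # what we charged the customer (our booking DB)
    confirmed: Optional[Decimal]  # net cost the supplier confirmed
    invoiced: Optional[Decimal]   # what the supplier invoiced us


def reconcile(charged: Dict[str, Decimal], confirmed: Dict[str, Decimal],
              invoiced: Dict[str, Decimal],
              tolerance: Decimal = Decimal("0.01")) -> List[ReconLine]:
    """Three-way match; returns only the bookings that need a dispute or investigation."""
    exceptions: List[ReconLine] = []
    for booking_id in sorted(charged.keys() | confirmed.keys() | invoiced.keys()):
        line = ReconLine(booking_id, charged.get(booking_id),
                         confirmed.get(booking_id), invoiced.get(booking_id))
        if line.charged is None or line.confirmed is None or line.invoiced is None:
            exceptions.append(line)   # missing on one side, e.g. a phantom booking
        elif abs(line.invoiced - line.confirmed) > tolerance:
            exceptions.append(line)   # supplier billed a different net rate than confirmed
    return exceptions
```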
| Extension | Architecture Impact |
|---|---|
| AI Trip Planner | LLM-powered conversational search: "Plan a romantic week in Italy for under $3K." Requires multi-step reasoning across flights, hotels, activities. Generates itineraries by combining search results with knowledge of destinations. Very different UX from the traditional form-based search. |
| Dynamic Packaging Engine | Algorithmically construct the optimal flight + hotel + car bundle price. The bundle price must be cheaper than booking individually (otherwise why bundle?). Requires real-time optimization across supplier margins and inventory. The "opaque" rate model: show discounted rate only in bundle, hiding the per-component price. |
| Fintech (Buy Now Pay Later) | Let users split a $2,000 vacation into 4 monthly payments. Requires credit risk assessment, payment scheduling, collections for missed payments. Partnership with BNPL providers (Affirm, Klarna) or build in-house. Booking must be confirmed immediately but payment collected over time. |
| Real-Time Price Alerts | User saves a search → monitor prices continuously → notify when price drops. Requires a massive background price-monitoring pipeline: millions of saved searches × periodic re-evaluation. Streaming architecture (Kafka + Flink) to compare cached rates against user thresholds. |
| Supplier Yield Management | Help hotels dynamically price rooms based on demand, competitor pricing, and events. Build a B2B analytics platform that surfaces insights to hotel partners. Aligns Expedia's interests (more bookings) with hotels' interests (higher revenue per room). |
How do you return search results in 3 seconds when suppliers take 5-15 seconds?
Three techniques combined: (1) Aggressive caching: for popular queries (Paris hotels, NYC flights), we have pre-fetched cached results that can be returned instantly while fresh supplier queries run in the background. (2) Progressive loading: we show cached results immediately with a "prices from $X" label, then replace them with live rates as suppliers respond. The UI shows a loading indicator per supplier. (3) Timeout with partial results: we set a 3-second deadline and show whatever suppliers have responded by then. Results from slow suppliers are dropped or shown as "check availability." The ranking algorithm actually helps here: the most popular hotels tend to come from the fastest suppliers (Booking.com, Expedia's own inventory), so the first results shown are usually the most relevant. The key UX insight: showing 80% of results in 2 seconds is better than showing 100% in 12 seconds.
What happens when a booking fails after the payment is charged?
This is a distributed saga with compensating transactions. The booking flow is: (1) reserve inventory with the supplier (hold), (2) charge payment, (3) confirm with the supplier. If step 3 fails (the supplier rejects the confirmation, perhaps because someone else booked the room in the race-condition window), we run compensations for the completed steps in reverse order: (2b) refund payment → (1b) release hold. The refund is issued automatically and immediately for card payments. For step 1 failures (supplier hold fails), we never reach payment; the user sees "this option is no longer available." The trickiest failure is the "phantom booking": supplier confirmation succeeds but our system crashes before recording it. For this, we run a reconciliation job every hour that fetches booking status from each supplier and compares it against our records. Mismatches create cases for operations. Suppliers also send daily reconciliation files (similar to bank settlement), which are the final catch-all.
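That compensation order can be sketched as a generic saga runner: run steps in order and, on failure, undo the completed ones in reverse. This is a minimal synchronous sketch with hypothetical names. A production saga would persist its progress (for example via the Kafka booking events) and retry idempotent compensations; this sketch does neither.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensation)


class SagaFailed(Exception):
    """Raised with the name of the step that failed, after compensation has run."""


def run_saga(steps: List[Step], log: List[str]) -> None:
    """Run steps in order; on failure, compensate the completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()   # e.g. refund for "charge", release for "hold"
                log.append(f"compensated:{done_name}")
            raise SagaFailed(name) from exc
        log.append(f"done:{name}")
        completed.append(step)
```

With steps hold → charge → confirm and a failing confirm, the log ends with `compensated:charge` and then `compensated:hold`: the refund happens before the hold is released.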