- What outcome are we optimizing for? β Purchase conversion rate (session β order) and customer lifetime value. Secondary: delivery speed (same-day/next-day), selection breadth. This shapes architecture: the browse path must be FAST (every 100ms of latency costs ~1% sales), the buy path must be CORRECT (no overselling), and the delivery promise must be HONEST (showing "delivery by Tuesday" when it's actually Thursday destroys trust).
- Marketplace or first-party only? Do we support third-party sellers, or just our own inventory? β Both. Marketplace model with our own warehousing (FBA-like) is most interesting.
- Core flow? Browse β search β product detail β cart β checkout β order tracking? β Yes, this is the primary flow.
- Inventory model? Single warehouse or distributed? β Multi-warehouse. Inventory is distributed across regions.
- Payment? β Acknowledge integration with payment provider, don't deep-dive the payment gateway itself.
- Recommendations? β Mention as a component, not a deep dive.
- Scale? β ~300M active customers, ~500M products in catalog, ~5M orders/day normal, ~50M on peak days (Prime Day, Black Friday).
- Flash sales / peak events? β Yes, must handle 10Γ traffic spikes. This is a critical constraint.
| In Scope | Out of Scope |
|---|---|
| Product catalog & detail pages | Seller portal / seller onboarding |
| Product search | Recommendation engine internals |
| Shopping cart | Returns & refunds |
| Checkout & inventory reservation | Delivery logistics / last-mile |
| Order processing pipeline | Reviews & ratings |
| Inventory management (multi-warehouse) | Prime membership system |
| Order tracking | Advertising platform |
- UC1: User browses/searches products β views product detail page with price, availability, and delivery estimate
- UC2: User adds item to cart β cart persists across sessions
- UC3: User checks out β inventory reserved β payment processed β order created
- UC4: Order is fulfilled β user tracks status (placed β packed β shipped β delivered)
- Inventory accuracy is paramount β we must NEVER sell more than we have (overselling). This means strong consistency on the inventory decrement path.
- Product catalog reads are eventually consistent β a product page showing a price that's 30 seconds stale is fine. Availability > freshness for browsing.
- Checkout must be ACID β inventory reservation + order creation is a transaction that must not partially fail.
- 10Γ spike tolerance β must handle Black Friday / Prime Day without degradation. This means the system can't be designed for average load only.
- Cart is durable β users expect their cart to survive app restarts and multi-device access.
- Catalog is read-heavy β 100:1 read:write ratio on product pages. Most users browse, few buy.
- Global β multi-region serving for low latency. Inventory is regional (warehouse-specific).
| Requirement | Decision | Why (and what was rejected) | Consistency |
|---|---|---|---|
| Never oversell inventory (financial integrity) | PostgreSQL with compare-and-swap | UPDATE ... WHERE count > 0 is atomic. Eventual consistency would allow overselling during flash sales. | CP |
| Cart: spiky traffic, simple key-value | DynamoDB (not PostgreSQL) | Single-key lookups, TTL expiration, auto-scaling. No relational joins needed. PostgreSQL would need manual scaling for Black Friday spikes. | AP |
| Product search with facets + filters | Elasticsearch (not PostgreSQL full-text) | Faceted aggregations (brand, price range, ratings) are native in ES. PostgreSQL full-text search can't do facets efficiently. | AP |
| Order state requires ACID | PostgreSQL for orders (sharded by customer_id) | 95% of queries are "my orders" β single-shard. Orderβpaymentβshipment requires foreign keys + transactions. | CP |
| Fulfillment is async (doesn't block checkout) | Kafka / SQS for order events | Checkout completes in <3s. Warehouse processing takes hours. Queue decouples critical path from async pipeline. | β |
| Product images at CDN scale | S3 + CDN with content-addressable URLs | URL includes content hash β zero cache invalidation needed. New image = new URL. Old URLs expire naturally. | β |
π± Client Apps CLIENT
- Web, iOS, Android
- REST API for CRUD, CDN for static assets
π API Gateway + CDN EDGE
- AuthN, rate limiting, routing
- CDN caches product pages, images
π¦ Product Catalog Service READ PATH
- Product details, pricing, images
- Heavily cached (Redis + CDN)
- Seller updates via async pipeline
π Search Service READ PATH
- Full-text search + faceted filtering
- Elasticsearch cluster
- Autocomplete, spell correction
π Cart Service STATEFUL
- Add/remove/update cart items
- Persists across sessions & devices
- DynamoDB or Redis + DB backing
π Inventory Service CRITICAL
- Real-time stock counts per warehouse
- Reserve / release / decrement
- Strong consistency β the hardest piece
π³ Checkout / Order Service CRITICAL
- Orchestrates: inventory β payment β order creation
- ACID transaction across steps
- Idempotent (retry-safe)
π¬ Order Fulfillment Service ASYNC
- Picks warehouse, generates shipping
- State machine: placed β packed β shipped β delivered
- Integrates with shipping carriers
Inventory Model: Reserve β Confirm β Release
- Available stock = total_stock β reserved_stock β sold_stock
- Reserve: At checkout, atomically decrement available_count. Item is "held" for this order.
- Confirm: On successful payment, convert reservation to sale (decrement reserved, increment sold).
- Release: On failed payment, timeout, or cancellation, release the reservation (increment available back).
- TTL on reservations: If payment isn't confirmed within 10 minutes, reservation auto-expires. Background job sweeps expired reservations.
The Critical SQL: Atomic Reservation
Pessimistic Locking (SELECT FOR UPDATE)
- Lock the row β read β check β update β release lock
- β Simple to reason about
- β Locks held during payment processing β deadlocks under load
- β Hot items create lock contention (1000 threads waiting for same row)
Conditional UPDATE (our approach)
- Single atomic UPDATE with WHERE guard β no explicit lock held
- β Lock duration is microseconds (just the UPDATE)
- β No deadlocks β single statement
- β Hot items handled fine β Postgres serializes row-level UPDATEs
- β οΈ Under extreme contention (>1000 concurrent), consider Redis-based counter as a pre-filter
Hot Item Strategy (Lightning Deals / Last Unit)
- Problem: 10,000 users try to buy 100 units simultaneously β 10,000 DB transactions hitting the same row.
- Pre-filter with Redis: Maintain an approximate counter in Redis. Requests check Redis first β if counter says 0, reject immediately without hitting DB. Only let through ~2Γ the remaining stock.
- Queue-based checkout for flash sales: For specific sale events, route checkout requests through an SQS queue. Process sequentially at DB level. Users get "your order is being processed" response and poll for status.
- Warehouse spreading: Split hot item inventory across multiple warehouse rows. Reserve from any warehouse with stock. Reduces per-row contention.
Checkout Orchestration (Saga Pattern)
- Step 1: Reserve inventory β on failure: return error, stop.
- Step 2: Charge payment β on failure: RELEASE inventory reservation, return error.
- Step 3: Create order record β on failure: REFUND payment, RELEASE inventory, return error.
- Step 4: Publish
order.placedevent β on failure: order exists but fulfillment retries from event.
idempotency_key. Order Service checks if this key already has an order β returns existing order. This prevents double-charges on retries (network timeout, user double-clicks). Critical for payment safety.Order State Machine
Event-Driven Fulfillment
- Event flow:
order.placedβ Fulfillment Service consumes β assigns warehouse β creates shipments β publishesshipment.createdβ Warehouse System picks items βshipment.packedβ Carrier integration βshipment.shippedβ Carrier webhook βshipment.delivered. - Each state transition: Update order/shipment status in DB, publish event to Kafka, Notification Service sends email/push to customer.
- Warehouse selection: For each item, pick the warehouse that (a) has stock AND (b) is closest to the shipping address. Goal: minimize shipping cost and delivery time. If one warehouse has all items, prefer a single shipment. If not, split into multiple shipments.
Catalog Architecture
| Layer | Tech | What It Serves | TTL / Freshness |
|---|---|---|---|
| CDN | CloudFront | Product page HTML, images, static data | 5β15 min TTL |
| Application Cache | Redis Cluster | Product objects (structured data for API) | 30sβ5 min TTL |
| Primary Store | PostgreSQL (sharded) | Source of truth for product data | Strong |
| Search Index | Elasticsearch | Full-text search, faceted filtering, autocomplete | Near-real-time (seconds lag) |
| Media | S3 + CDN | Product images (multiple sizes) | Immutable (versioned URLs) |
Catalog Update Pipeline
- Seller updates product β write to Products DB β publish
product.updatedto Kafka - Consumers: (1) ES Indexer updates search index, (2) Cache Invalidator deletes Redis + CDN cache entries
- Price changes are applied immediately in DB but may take 30sβ5 min to propagate through all cache layers. The checkout flow always re-validates price from DB (not cache) to prevent stale-price purchases.
| Data | Store | Access Pattern | Consistency |
|---|---|---|---|
| Product Catalog | PostgreSQL (sharded) + Redis + CDN | 500K reads/sec peak (from cache). 50M seller updates/day. | Strong writes, eventual reads |
| Search Index | Elasticsearch (cluster) | 100K search queries/sec peak. Faceted filtering + ranking. | Near-real-time (seconds lag from source) |
| Inventory | PostgreSQL (sharded by product_id) | ~1.7K reservations/sec peak. Conditional UPDATEs. | Strong (ACID) β non-negotiable |
| Shopping Cart | DynamoDB (or Redis + DB) | ~20K updates/sec peak. Key-value by user_id. | Strong (durable across sessions) |
| Orders | PostgreSQL (sharded by customer_id) | ~580 writes/sec peak. "My orders" query. | Strong (ACID) |
| Shipments | PostgreSQL (same cluster as orders) | Updated by fulfillment pipeline. Query by order_id. | Strong |
| User Profiles | PostgreSQL + Redis cache | Read-heavy, addresses, payment methods. | Strong writes, cached reads |
| Product Images | S3 + CDN | ~50TB total. Immutable (versioned). Served from CDN edge. | Eventual (CDN propagation) |
| Events | Kafka | order.placed, product.updated, shipment.status_changed, etc. | Ordered per partition |
{id, title, description, price, images[], attributes, avg_rating, review_count, availability}Heavily cached (CDN + Redis). Availability is approximate.
{products[], facets: {categories[], brands[], price_ranges[]}, total_count, next_cursor}{items: [{product_id, quantity, unit_price, product_snapshot}], subtotal}{product_id, quantity}{quantity} β set to 0 to remove{shipping_address_id, payment_method_id, idempotency_key}Response:
{order_id, status: "placed", estimated_delivery, total_charged}Headers:
Idempotency-Key: {uuid} β prevents double charges on retry.{order, items[], shipments: [{id, status, tracking_number, carrier, estimated_delivery}]}| At Scale | What Breaks | Mitigation |
|---|---|---|
| 10Γ (50M orders/day) | Inventory DB row contention on hot products. Single Elasticsearch cluster at query limits. Order DB write throughput. | Redis pre-filter + queue-based checkout for hot items. ES cluster scale-out (add data nodes). Shard Order DB by customer region. |
| 100Γ (500M orders/day) | Saga orchestration becomes bottleneck β Order Service coordinating too many steps. Kafka partition throughput. Global inventory coordination. | Move to choreography saga at this scale. Regional Kafka clusters. Per-region inventory with cross-region rebalancing. CQRS: separate order write model from read model. |
| Data | Model | Rationale |
|---|---|---|
| Inventory reservation | Strong (ACID) | Overselling is unacceptable. Row-level locks on conditional UPDATE. |
| Order creation | Strong (ACID) | Can't lose or duplicate orders. Financial record. |
| Product page display | Eventual (5 min) | Slightly stale price/description is invisible to users. |
| Availability on product page | Eventual (30s) | Approximate. Real check at checkout. "In Stock" might be wrong briefly. |
| Search results | Eventual (seconds) | ES index lags behind source of truth. Acceptable for discovery. |
| Shopping cart | Strong (durable) | Users expect cart to persist. DynamoDB gives strong consistency on read-after-write. |
| Order status updates | Eventual (seconds) | Event-driven. Slight lag between warehouse scan and user seeing "shipped." |
- Checkout pipeline: End-to-end latency from "click Buy" to order confirmation. Success rate. Failure breakdown (inventory, payment, system error). Inventory reservation hit rate (% that succeed on first attempt).
- Business metrics: Conversion rate (browse-to-buy), cart abandonment rate, average order value, orders/second real-time.
- Infrastructure: Inventory DB transaction latency (p99), Redis cache hit ratio (target: >95%), ES query latency, Kafka consumer lag per topic.
- Alerting: Checkout error rate >1%, inventory DB p99 >50ms, reservation timeout rate spikes, payment failure rate >5%, Kafka lag >1000 messages.
- Price integrity: Prices are NEVER trusted from the client. Checkout always re-fetches price from DB. Prevents price manipulation attacks.
- Inventory attacks: Rate limit cart additions (prevent bots reserving all stock). Reservation TTL ensures held stock is released.
- Payment security: PCI compliance β payment details never touch our servers. Tokenized via payment provider (Stripe, Adyen). Only token stored.
- Bot protection: CAPTCHA on checkout during flash sales. Device fingerprinting. Purchase quantity limits per customer per product.
- Data encryption: All PII encrypted at rest. TLS in transit. Shipping addresses and payment tokens in separate, access-controlled stores.
| Extension | Why It Matters | Architecture Impact |
|---|---|---|
| Recommendation Engine | Drives ~35% of Amazon's revenue | Collaborative filtering + content-based. Precomputed per-user recommendations stored in a feature store. Served at product page and cart. Adds read load but no write-path changes. |
| Reviews & Ratings | Trust signal, conversion driver | Separate service. Write-light, read-heavy (denormalized avg_rating on product). Moderation pipeline (similar to content moderation). |
| Returns & Refunds | Post-purchase experience | Reverse of order pipeline: return requested β item received β refund processed β inventory restocked. Separate state machine. |
| Seller Analytics | Marketplace health | CQRS: read model built from order + inventory events via CDC to a data warehouse. Separate from operational path. |
| Multi-Region Active-Active | Global latency, disaster recovery | Regional inventory DBs (each region owns its warehouse inventory). Catalog replicated read-only. Orders owned by the region closest to customer. Cross-region conflict resolution for shared resources. |
| Dynamic Pricing | Competitive positioning, margin optimization | ML model trained on demand signals, competitor prices, inventory levels. Updates prices asynchronously. Same cache invalidation pipeline. |
| Subscribe & Save | Recurring revenue | Scheduler service that creates orders on cadence. Requires handling out-of-stock, price changes, and payment failures gracefully. |
How do you prevent overselling during a flash sale when 100K users click "Buy" simultaneously?
This is the defining challenge. The inventory decrement MUST be strongly consistent β we use a compare-and-swap operation: `UPDATE inventory SET count = count - 1 WHERE product_id = X AND count > 0`. If count hits 0, subsequent attempts fail atomically. But hitting a single row in PostgreSQL with 100K concurrent writes would be a disaster. So we use a reservation pattern: (1) when the user clicks "Add to Cart," we don't decrement inventory β we create a soft reservation in Redis with a 10-minute TTL, (2) Redis DECR is atomic and handles 100K+/sec, (3) the final hard decrement in PostgreSQL only happens at checkout, where traffic is naturally lower (only ~10-20% of carts convert). If the Redis reservation expires (user abandoned cart), the count is restored. This means during a flash sale, Redis absorbs the thundering herd, and PostgreSQL sees a manageable write rate. The tradeoff: a user might see "In Stock" but fail at checkout if all reservations are taken β which is better than overselling.
Why DynamoDB for the cart instead of PostgreSQL?
Carts are session-scoped, user-specific, and have a simple access pattern: get cart by user_id, add/remove items, expire after 30 days. DynamoDB excels here because: (1) single-key lookups at <5ms regardless of scale, (2) auto-scaling β cart traffic is extremely spiky (Black Friday), and DynamoDB handles this without provisioning, (3) TTL-based expiration built in β abandoned carts auto-delete, (4) no schema β cart items can have variable attributes (gift wrap, custom engraving). PostgreSQL could work, but we'd need to manage connection pools for spiky traffic, handle TTL cleanup ourselves, and the relational model doesn't buy us anything since carts have no joins. The cart is a document, not a relational entity. However, at checkout, we DO copy the final cart into PostgreSQL as an order β because orders have relational integrity (linked to payments, shipments, invoices).
Walk me through what happens when a payment fails after inventory is decremented.
This is a saga pattern. The checkout flow is: (1) decrement inventory, (2) charge payment, (3) create order. If payment fails after step 1, we need a compensating transaction: re-increment inventory. The Order Service orchestrates this β each step publishes an event to Kafka. If step 2 fails, the Order Service publishes an `inventory.release` event, and the Inventory Service adds the count back. The order moves to PAYMENT_FAILED state. Key design decisions: (1) we decrement inventory BEFORE charging payment β not after β because a payment that succeeds but has no inventory is worse (we'd have to refund), (2) the saga has a timeout β if the payment gateway doesn't respond in 30 seconds, we release inventory and show an error, (3) each step is idempotent β retrying an inventory decrement with the same order_id is a no-op if it already succeeded. The Kafka event log gives us an audit trail and the ability to replay failed sagas.
How would you design the search service to handle "running shoes under $50" with filtering?
This is an Elasticsearch problem, not a PostgreSQL problem. Product catalog data is dual-written: PostgreSQL is the source of truth (for orders, inventory), and Elasticsearch is the search index (for discovery). The search query "running shoes under $50" becomes an ES query: `bool: must: [match: "running shoes" on title+description], filter: [range: price < 50, term: category = "shoes"]`. Filters use ES filter context (cached, no scoring), while text matching uses query context (scored by relevance). We add faceted aggregations so the UI can show "Brand: Nike (234), Adidas (189)..." in the sidebar. The index is updated asynchronously β a product price change in PostgreSQL publishes a Kafka event, which the search indexer consumes and updates ES. This means search results can be ~5 seconds stale, which is acceptable. For price-sensitive queries, the product detail page always reads from PostgreSQL (source of truth), so the user sees the accurate price before buying.
What's your CDN invalidation strategy when a product image changes?
We don't invalidate β we use content-addressable URLs. Every product image URL includes a hash of the content: `cdn.example.com/images/{hash}.jpg`. When a seller uploads a new image, it gets a new hash, and the product record is updated to point to the new URL. The old URL remains valid in CDN cache until it naturally expires (TTL 30 days), but no one references it anymore. This means: (1) zero cache invalidation needed, (2) browsers cache images aggressively (the URL literally changes when content changes), (3) rollback is trivial (revert the product record to the old hash). The one exception is the product listing page HTML, which references the image URL β this has a short CDN TTL (5 minutes) so it picks up image URL changes quickly. The images themselves are immutable and cached forever.