- What's the core payment flow? โ PaymentIntent API: merchant creates a PaymentIntent with amount + currency, client-side collects card details and confirms, Stripe authorizes with card network, then captures. Two-step (auth + capture) or one-step (immediate charge).
- Which payment methods? โ Cards (Visa, Mastercard, Amex), bank debits (ACH, SEPA), wallets (Apple Pay, Google Pay), BNPL (Klarna, Affirm). Cards are the primary flow.
- PCI compliance? โ Critical. Raw card numbers (PAN) never touch merchant servers. Stripe.js collects card details client-side, tokenizes directly with Stripe's PCI-certified vault. Merchants only see tokens.
- Settlement? โ Stripe holds funds after capture, batches payouts to merchant bank accounts (T+2 standard). Multi-currency: accept in EUR, settle in USD with FX conversion.
- Fraud detection? โ Stripe Radar: ML model scoring every transaction in real-time. Rules engine for merchant-specific policies. 3D Secure for high-risk transactions (SCA/PSD2 in Europe).
- Exactly-once semantics? โ Idempotency keys on every mutating API call. Client retries with same key โ server returns cached result. No double charges, ever.
| In Scope | Out of Scope |
|---|---|
| PaymentIntent lifecycle: create โ confirm โ authorize โ capture | Stripe Connect (marketplace payouts) |
| Tokenization vault (PCI-DSS Level 1) | Stripe Billing / Subscriptions engine |
| Double-entry ledger & settlement | Stripe Terminal (in-person POS) |
| Fraud detection (Radar ML + rules) | Issuing (Stripe-issued cards) |
| Idempotency & exactly-once processing | Treasury / Banking-as-a-service |
| Webhook delivery for async events | Tax calculation engine |
| Multi-currency with FX conversion | Identity verification (Stripe Identity) |
- UC1: One-time card payment โ Customer buys a $49.99 item. Merchant creates PaymentIntent, customer enters card in Stripe Elements, Stripe authorizes + captures. Merchant receives webhook
payment_intent.succeeded. Funds settle T+2. - UC2: Auth + capture (hotel/rental) โ Hotel authorizes $500 at check-in. At checkout, captures actual amount $420. Difference released. Auth expires after 7 days.
- UC3: Refund โ Merchant issues full or partial refund. Stripe reverses ledger entry, submits refund to card network, customer's bank credits 5-10 business days.
- UC4: 3D Secure โ European payment triggers SCA. Customer redirected to bank's 3DS page, authenticates, payment completes. On failure, PaymentIntent enters
requires_action. - UC5: Failed payment with retry โ Card declined. PaymentIntent moves to
requires_payment_method. Customer retries with different card.
- Correctness above all: Money cannot be lost, duplicated, or misattributed. Financial invariant: total debits = total credits at all times. A missing cent triggers an audit failure.
- Idempotency: Every mutating API call safely retriable. Timeout during $10,000 charge must never result in two charges. Keys retained 24 hours.
- Low latency: Authorization in <2 seconds (customer waiting at checkout). Card network round-trip + fraud check + Stripe overhead.
- High availability: 99.999% for the payment path. Downtime = lost revenue for every merchant. Multi-region active-active.
- PCI-DSS Level 1: Raw card data only in isolated tokenization vault. Own network segment, separate access controls, annual audit.
- Multi-tenancy: Millions of merchants on shared infrastructure. One merchant's traffic spike must not degrade others.
| Requirement | Decision | Why | Consistency |
|---|---|---|---|
| Card data security (PCI-DSS L1) | Isolated tokenization vault | Raw PANs in separate hardened service with own network segment. All other systems use opaque tokens. Reduces PCI scope from entire platform to one service. | CP |
| No double charges on retry | Idempotency keys (24h TTL in Redis) | Client sends unique key per request. Server checks before processing, returns cached result on duplicate. Without: timeout โ retry โ double charge. | CP |
| Financial correctness | Double-entry ledger (PostgreSQL) | Every movement: debit + credit. Invariant: sum debits = sum credits. Enables point-in-time reconstruction. Simple balance updates lose audit trail. | CP (ACID) |
| Real-time fraud scoring | ML pipeline (Radar) inline on auth | Must complete in <200ms within auth latency budget. Cross-merchant signals. Async detection lets fraudulent charges through. | โ |
| Complex payment lifecycle | PaymentIntent state machine | Explicit states (created โ processing โ succeeded). Each transition atomic. Recoverable after crashes. Ad-hoc flags create inconsistent states. | CP |
| Async merchant notification | Webhooks (at-least-once, 72h retry) | Merchants need to know when payments succeed/fail/dispute. Retry with exponential backoff. Polling wastes API calls and delays. | Eventual |
| State | Meaning | Transitions |
|---|---|---|
| requires_payment_method | Created, waiting for card/bank details | โ requires_confirmation |
| requires_confirmation | Method attached, awaiting merchant confirm | โ requires_action | processing |
| requires_action | Customer must complete 3DS/redirect | โ processing | requires_payment_method |
| processing | Authorizing with card network | โ requires_capture | succeeded |
| requires_capture | Auth approved, funds held. Must capture. | โ succeeded | canceled |
| succeeded | Payment complete. Funds captured. | (terminal) |
| canceled | Canceled by merchant or system | (terminal) |
| Component | What | Latency |
|---|---|---|
| Features | ~200 signals: card fingerprint, IP, velocity, device, cross-merchant patterns | ~20ms |
| ML Model | Gradient-boosted tree on billions of historical transactions. Outputs risk 0.0โ1.0. Retrained daily. | ~30ms |
| Rules | Merchant-defined: "Block if amount >$5K and country!=US." Override or supplement ML. | ~10ms |
| 3DS Decision | Medium risk (0.3-0.75): trigger 3D Secure. Shifts liability to issuing bank. Required by PSD2/SCA. | ~5ms |
| Verdict | ALLOW (<0.3) ยท BLOCK (>0.75) ยท REVIEW (0.3-0.75: 3DS or queue) | ~5ms |
| Store | Technology | What & Why |
|---|---|---|
| Ledger | PostgreSQL (ACID) | Double-entry records. ACID non-negotiable. Sharded by merchant_id. |
| Payment DB | PostgreSQL (sharded) | PaymentIntents, Charges, Refunds, Customers. Sharded by merchant_id. |
| Token Vault | Isolated PG + HSM | Encrypted PANs. Separate network, access controls, annual PCI audit. |
| Idempotency | Redis Cluster | Keys โ cached responses. 24h TTL. ~50GB. Sub-ms lookup. |
| Event Bus | Kafka | State transitions as events. Consumers: webhooks, settlement, analytics. |
| Fraud Features | Redis + feature store | Pre-computed fraud signals. Real-time updates from transaction stream. |
- Intelligent payment routing: ML-predicted approval rate per route (network + processor + region). 1% approval improvement = billions in recovered revenue across the platform.
- Instant Payouts: Push funds to merchant debit card in minutes via RTP/Visa Direct instead of T+2. Premium feature.
- Adaptive 3DS: Dynamically trigger 3DS based on issuer behavior and risk. Maximize conversion while meeting SCA requirements.
- Network tokenization: Card-network-issued tokens (Visa Token Service). Higher approval rates, survives card reissuance.
- Multi-processor failover: Auto-route to secondary processor on primary outage. Increases auth path availability to 99.999%+.
- AI dispute management: Auto-generate chargeback evidence packages. ML predicts win probability and recommends strategy.
What happens if the card network approves but Stripe's DB write fails?
The most dangerous failure in payments. Three defenses: (1) Write-ahead log: before sending auth to the card network, log a "pending auth" to Kafka/WAL. If DB fails, recovery reads the log. (2) Nightly reconciliation: card networks send settlement files listing all approved transactions. A reconciliation job compares against the ledger. Any mismatch triggers an alert and correction entry. (3) Void on failure: if DB write fails immediately, attempt to void the auth (release the hold). If void also fails, fall back to reconciliation. This is why payments engineers obsess about operation ordering: log first, authorize second, persist third. Every step has a recovery path.
How does tokenization work? What's in the vault?
Stripe.js runs in the customer's browser and sends card details directly to the vault โ never touching the merchant's server. The vault encrypts the PAN using HSM-managed keys, stores the encrypted PAN, and returns an opaque token (pm_card_visa). The token has no mathematical relationship to the PAN โ you cannot reverse it. When Stripe needs the actual PAN for authorization, the Network Gateway calls the vault's internal API, authenticates, receives the decrypted PAN for that specific transaction, uses it for the card network call, and immediately discards it from memory โ never written to logs. The vault is physically isolated: separate network segment, separate databases, separate access control lists, independently audited annually for PCI-DSS Level 1. This reduces PCI scope from the entire platform to one service.
How do you handle Black Friday โ 10x normal volume?
5K TPS peak isn't a throughput challenge โ it's correctness-under-load. (1) Horizontal scaling of stateless services: API Gateway, Payment Service, Fraud Engine scale behind load balancers. (2) Pre-provisioned capacity based on historical patterns and merchant forecasts. (3) Per-merchant rate limiting: one merchant's flash sale doesn't degrade others. (4) Async offloading: only auth is latency-critical. Webhooks, settlement, analytics go through Kafka at their own pace โ Kafka absorbs the write spike. (5) The real bottleneck is the card networks. Visa handles 65K TPS globally. If every processor spikes simultaneously, networks constrain. Stripe can queue and retry, but card network latency may increase under global peak load.
Why double-entry bookkeeping instead of just updating balances?
Three non-negotiable reasons: (1) Auditability: regulators require reconstructing any balance at any point in time. Mutable balances only show current state. Double-entry replays from zero and proves every cent. Legally required. (2) Error detection: the invariant sum(debits) = sum(credits) is verified with a single query. Off by one cent? Something is broken. Mutable balances drift silently until a merchant complains. (3) Concurrency: balance-update requires locking the balance row โ contention. Double-entry appends independent entries โ no shared row, no lock contention on the hot path. The tradeoff: computing current balance requires summing entries (or maintaining a materialized view updated asynchronously). Worth it for correctness.
Why is cross-merchant fraud detection Stripe's competitive moat?
Radar sees every transaction across millions of merchants. A stolen card flagged at Merchant A can be blocked at Merchant B in real-time โ before B ever sees the charge. Standalone fraud tools only see one merchant's data. Stripe sees the global transaction graph: which cards appear at which merchants, velocity across the network, device fingerprints at multiple merchants, emails linked to past fraud. This compounds: more merchants โ more data โ better model โ lower fraud โ attracts more merchants. A new merchant on Day 1 gets protection trained on billions of historical transactions. They'd need years of solo data to match that. This is why Radar's false positive rate is dramatically lower than standalone tools โ more data means more precision in distinguishing fraud from legitimate edge cases.