API Design & Idempotency

API design as an operational contract: resource modeling, REST vs RPC vs GraphQL, idempotency keys for safe retries, cursor pagination, versioning, error semantics, ETags, and retry storms.

26 min read · updated 2026-06-28


Most API design advice stops at the surface: use nouns, return JSON, version your routes, write an OpenAPI spec. None of that is wrong. All of it is the easy 20% of the job. The hard 80% — the part that actually pages people — is the contract you make about what happens when the network lies to you. And the network lies constantly.

Here is the uncomfortable truth that took me years and one genuinely awful weekend to internalize: an API is not a request-response function. It is a promise about behavior under failure. Clients will retry. Load balancers will drop connections mid-stream. A 200 OK you sent successfully will vanish into a dead TCP socket, and the caller — who has no way to tell “my request never arrived” from “the response never came back” — will assume the worst and try again. If your design only works on the happy path, it does not work. It just hasn’t been tested by reality yet.

This article is about designing for that world. The aesthetic stuff (URL shapes, REST vs RPC vs GraphQL) gets a fair treatment because it has real operational consequences, but the spine of the piece is idempotency: the discipline of making “this request was delivered more than once” a non-event instead of a double charge. Everything orbits that. Pagination, versioning, error semantics, and optimistic concurrency are all, at bottom, the same question asked from different angles — what is the safe thing to do when a client repeats itself or races another client?

I lean on a few sibling topics throughout. To stop abusive or runaway clients before they reach your handlers, pair this with rate limiting. The edge concerns — auth, TLS termination, request shaping — belong to the API gateway. The consistency vocabulary behind “exactly-once effect” comes from consistency & consensus. And when you wire all of this up, observability is what turns a 3am mystery into a five-minute fix.

A motivating failure

A marketplace I worked with ran a clean, conventional POST /orders endpoint. It validated the cart, reserved inventory, charged the card through a payment provider, and returned 201 Created with the order ID. It had been live for two years. It was, by every dashboard, healthy.

Then a Friday-evening sale doubled traffic. The downstream payment provider — not our code, theirs — started taking eight seconds to respond instead of two hundred milliseconds. Our mobile clients had a hard-coded five-second HTTP timeout. So here is the exact sequence that played out, thousands of times: the client fires POST /orders, the server reserves inventory and successfully charges the card, the provider’s slowness pushes the total round trip past five seconds, the client’s timeout fires and it shows “something went wrong, retry?”, the customer taps retry, and the whole thing runs again — new inventory reservation, second charge, second order.

The server did nothing wrong by its own logic. Each request was valid. Each charge succeeded. The handler had no idea the two requests were “the same” order, because nothing in the design said they could be. By the time support escalated, we had double-charged several thousand customers, oversold inventory we didn’t have, and triggered a wave of chargebacks that cost more than the sale earned. The fix was not a bug fix. There was no bug. The fix was admitting that a non-idempotent write behind a retrying client is a duplicate-effect generator, and it had been one all along — it just needed a slow dependency to reveal it.

We shipped idempotency keys the next week. The retry storm came back during the next sale. This time it was a non-event: retries replayed the stored 201 and nobody got charged twice. That is the difference between an API that survives reality and one that merely passes its tests.

The one-sentence mental model

An API is a typed, versioned contract over an unreliable transport, where every mutating call must define what happens when it arrives more than once.

Unpack each clause, because each one is an operational constraint you will eventually meet:

Typed → the schema is the source of truth, not the prose docs. If the schema and the docs disagree, clients believe the bytes on the wire, so the schema is what you actually shipped.
Versioned → you can never break a field a client depends on. Old clients live in the wild for years; the contract you published is load-bearing forever, or until you can prove the traffic is gone.
Unreliable transport → “I got no response” and “the operation didn’t happen” are different events that look identical to the caller. The caller cannot tell them apart, so it must assume the operation might have happened and retry safely.
Arrives more than once → idempotency is not a feature you add for nice-to-have robustness. For writes, it is the definition of correctness.

```mermaid
flowchart LR
  C[Client] -->|"req + Idem-Key"| GW[API Gateway]
  GW -->|"auth\nrate limit"| H[Handler]
  H -->|check key| K[(Idempotency\nstore)]
  K -->|"miss\nexecute once"| DB[(System\nof record)]
  K -->|"hit\nreplay saved"| C
  H -->|"resp + ETag"| C
```

The gateway terminates the cross-cutting concerns — auth, rate limiting, TLS. The handler owns the one thing nothing else can do for it: producing an exactly-once effect even though the transport only offers at-least-once delivery. That gap — at-least-once in, exactly-once out — is the entire job. Internalize it and most of the rest of API design is downstream bookkeeping.

How it actually works

Resource modeling

Model resources — nouns with stable identity and a lifecycle — and the state transitions on them, not procedures. A resource has a durable URL, a representation, and a defined set of legal transitions. POST /orders creates one, GET /orders/{id} reads it, PATCH /orders/{id} mutates it, DELETE /orders/{id} retires it.

This isn’t aesthetic preference. The HTTP method is a contract, and the contract is what tooling, caches, proxies, and clients reason about automatically. GET, PUT, and DELETE are defined by the spec as idempotent: calling them N times leaves the server in the same state as calling them once. POST is explicitly not. PATCH usually isn’t. That single distinction decides where you need extra machinery and where you get safety for free.

| Method | Idempotent? | Safe (no effect)? | Typical use |
| --- | --- | --- | --- |
| `GET` | yes | yes | read a representation |
| `PUT` | yes | no | full replace at a known URL |
| `DELETE` | yes | no | remove (2nd call → `404`/`204`) |
| `POST` | no | no | create, “do this action” |
| `PATCH` | usually no | no | partial update |

The practical rule falls right out of the table: if a method is not idempotent by spec but the client will retry it, you have to make it idempotent yourself. There is no third option. You either pick a method whose semantics give you idempotency, restructure the operation so it’s naturally idempotent (e.g. PUT to a client-chosen ID), or bolt on idempotency keys. Pretending the retry won’t happen is not on the menu.
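The "PUT to a client-chosen ID" option can be sketched as follows. It assumes an in-memory dict in place of a real table, and `put_order` and the ID format are hypothetical. The point is that the handler replaces state rather than appending or incrementing, so N identical calls converge on a single order.

```python
# Hypothetical in-memory table standing in for a real database.
orders: dict[str, dict] = {}


def put_order(order_id: str, body: dict) -> tuple[int, dict]:
    """PUT /orders/{order_id}: full replace at a URL the client chose.

    Returns (status, representation). Because the call overwrites rather than
    appends, repeating it leaves the server in the same state as calling it once.
    """
    created = order_id not in orders
    orders[order_id] = dict(body)  # replace wholesale; never merge or increment
    return (201 if created else 200), orders[order_id]


# A retried PUT with the same ID and body is a non-event.
status_first, _ = put_order("ord_7f3a", {"sku": "A1", "qty": 2})
status_retry, _ = put_order("ord_7f3a", {"sku": "A1", "qty": 2})
```

The client mints the ID (for example a UUID) before the first attempt, which is what makes every retry land on the same URL.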

Idempotency keys for safe retries

The mechanism is simple to state and full of traps to implement. The client generates a unique key — a UUID — per logical operation (not per HTTP attempt) and sends it as a header: Idempotency-Key: 9f1c4e.... On first receipt, the server records the key, executes the operation exactly once, and stores the response against the key. On any later request carrying the same key, the server returns the stored response without re-executing.
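On the client side, "per logical operation, not per HTTP attempt" comes down to where the key is generated. Here is a minimal sketch, assuming a hypothetical `send(method, path, headers, json)` transport callable and a hypothetical `TransientError` for timeouts and dropped connections. Backoff between attempts is deliberately omitted.

```python
import uuid


class TransientError(Exception):
    """Timeout or dropped connection: the caller cannot know if the write landed."""


def create_payment(send, amount_cents: int, max_attempts: int = 3) -> dict:
    """Issue POST /payments, retrying transient failures under ONE idempotency key.

    `send` is a hypothetical transport callable: send(method, path, headers, json).
    Backoff between attempts is omitted here for brevity.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    key = str(uuid.uuid4())  # one key per logical operation, minted before any attempt
    last_error = None
    for _ in range(max_attempts):
        try:
            # Every attempt carries the same key, so the server can collapse duplicates.
            return send("POST", "/payments",
                        headers={"Idempotency-Key": key},
                        json={"amount": amount_cents})
        except TransientError as exc:
            last_error = exc
    raise last_error
```

The bug to avoid is minting the key inside the loop. Each attempt then looks like a brand-new operation, and the server has nothing to deduplicate against.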

```mermaid
sequenceDiagram
  participant Client
  participant Server
  participant Store as Idem store
  participant Ledger
  Client->>Server: POST /payments (K1)
  Server->>Store: INSERT K1 in_progress
  Store-->>Server: ok first time
  Server->>Ledger: charge 40
  Ledger-->>Server: charge C9
  Server->>Store: save 201 for K1
  Server-->>Client: 201 charge C9
  Note over Client,Server: response lost in transit
  Client->>Server: POST /payments (K1) retry
  Server->>Store: INSERT K1 conflict
  Server-->>Client: 201 charge C9 replayed
```

The idea is trivial. The correctness lives entirely in the details that bite:

The uniqueness constraint must be enforced by the database, not by app code. The naive implementation is SELECT the key, and if it’s absent, execute and INSERT. Two concurrent retries both run the SELECT, both see nothing, both execute, both charge. You have built a race condition that looks like idempotency. The correct version pushes the race down to the storage engine with INSERT ... ON CONFLICT DO NOTHING or a unique index, so the database — the one component that can actually serialize the two writers — decides who wins.
Persist the key in the same transaction as the effect. If you charge the ledger and then crash before recording the key, the retry charges again, and you’re back to the opening story with extra steps. The key write and the side effect must commit atomically. When the effect lives in an external system you can’t transact with (a payment provider), you make the external call idempotent too — pass your key through to a provider that supports idempotency, so the duplicate collapses on their side as well.
Handle the in-progress race explicitly. A retry can arrive while the first execution is still running. Executing it a second time is wrong; returning an empty placeholder response is also wrong. The clean answer is a three-state record (in_progress, completed, failed), and a retry that hits in_progress gets a 409 Conflict (or blocks briefly and then reads the result). This is where most home-grown implementations are subtly broken.

```mermaid
flowchart TD
  R["request\nwith key"] --> L{"key\nexists?"}
  L -->|no| I[insert in_progress]
  I --> X[execute once]
  X --> S[store completed]
  S --> RESP[return result]
  L -->|completed| RESP
  L -->|in_progress| C409[return 409]
  L -->|failed| RETRY[allow re-exec]
```

Scope and expire keys. Scope per (account, endpoint) so two unrelated callers who happen to generate colliding UUIDs — or a malicious caller replaying someone else’s key — can’t cross streams. And give keys a TTL (commonly 24h to 7d) so the store doesn’t grow without bound. Match the TTL to how long a client could plausibly still be retrying; a mobile client backgrounded for a day is real.
Bind the key to the request body. A nasty edge case: client sends key K1 with a $40 charge, times out, then a buggy retry sends K1 with a $50 charge. If you blindly replay, you return the $40 response for a $50 request — silent corruption. Store a hash of the request and return 422 if a reused key arrives with a different payload.
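Pulling those rules together, here is a minimal single-node sketch using Python's built-in sqlite3. It assumes the side effect is a local `ledger` table that can share a transaction with the key record. The table names, `create_payment`, and the error `code` strings are hypothetical. The sketch shows the database-enforced claim (`INSERT ... ON CONFLICT DO NOTHING` on a scoped primary key), the request-body hash, replay of the stored response, the three states, and a TTL sweep.

```python
import hashlib
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS idempotency_keys (
    account_id   TEXT NOT NULL,
    endpoint     TEXT NOT NULL,
    idem_key     TEXT NOT NULL,
    request_hash TEXT NOT NULL,
    state        TEXT NOT NULL CHECK (state IN ('in_progress', 'completed', 'failed')),
    response     TEXT,
    created_at   REAL NOT NULL,
    -- The database, not app code, enforces one record per scoped key.
    PRIMARY KEY (account_id, endpoint, idem_key)
);
CREATE TABLE IF NOT EXISTS ledger (
    charge_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    account_id TEXT NOT NULL,
    amount     INTEGER NOT NULL
);
"""


def request_hash(body: dict) -> str:
    # Canonical JSON so key order in the payload does not change the hash.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def create_payment(conn: sqlite3.Connection, account_id: str, idem_key: str,
                   body: dict, now: float) -> tuple[int, dict]:
    endpoint = "POST /payments"
    scope = (account_id, endpoint, idem_key)
    h = request_hash(body)
    with conn:  # one transaction: key claim, side effect, and stored response commit together
        claimed = conn.execute(
            "INSERT INTO idempotency_keys VALUES (?, ?, ?, ?, 'in_progress', NULL, ?) "
            "ON CONFLICT DO NOTHING", (*scope, h, now)).rowcount == 1
        if not claimed:
            state, stored_hash, response = conn.execute(
                "SELECT state, request_hash, response FROM idempotency_keys "
                "WHERE account_id = ? AND endpoint = ? AND idem_key = ?", scope).fetchone()
            if stored_hash != h:
                return 422, {"code": "idempotency_key_reused_with_different_body"}
            if state == "completed":
                status, saved = json.loads(response)
                return status, saved  # replay the stored result; do not re-execute
            if state == "in_progress":
                return 409, {"code": "request_in_progress"}
            # state == 'failed': the earlier attempt definitely did not apply; re-execute.
        charge_id = conn.execute(
            "INSERT INTO ledger (account_id, amount) VALUES (?, ?)",
            (account_id, body["amount"])).lastrowid
        result = (201, {"charge_id": charge_id})
        conn.execute(
            "UPDATE idempotency_keys SET state = 'completed', response = ? "
            "WHERE account_id = ? AND endpoint = ? AND idem_key = ?",
            (json.dumps(result), *scope))
    return result


def expire_keys(conn: sqlite3.Connection, now: float, ttl_seconds: float = 24 * 3600) -> int:
    # TTL sweep so the store does not grow without bound; returns rows deleted.
    with conn:
        return conn.execute("DELETE FROM idempotency_keys WHERE created_at < ?",
                            (now - ttl_seconds,)).rowcount
```

Because the claim and the charge commit in one transaction here, a crash rolls both back and the retry simply re-executes. The `in_progress` and `failed` states earn their keep when the effect is an external call: you commit the `in_progress` claim first, call the provider (passing the key through), and then record `completed` or `failed`. That split is what lets a concurrent retry observe `in_progress` and receive a `409`.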

Pagination

Offset pagination (?limit=50&offset=10000) is the default everyone reaches for and the one that quietly rots. The database still has to walk and discard the first 10,000 rows before returning anything, so latency grows linearly with page depth and deep pages keep getting slower as the table grows. Worse, if rows are inserted or deleted while a client pages through, the offsets shift underneath them and they silently skip or duplicate records.

Cursor (keyset) pagination is the durable answer. You return an opaque cursor encoding the last-seen sort key, and the next page is a range scan from there:

```sql
SELECT * FROM orders
WHERE (created_at, id) > (:last_ts, :last_id)
ORDER BY created_at, id
LIMIT 50;
```

Given a composite index on (created_at, id), that is an index range scan: O(limit), independent of how deep you are. It is also stable under concurrent writes, because the cursor names a position in the data rather than a count of skipped rows. This is the same access-pattern discipline that database indexing rewards: design the query so the index does the seeking.
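A small, runnable comparison using Python's sqlite3, which supports the row-value comparison used in the keyset query, with a hypothetical six-row `orders` table. The idea is to fetch page 1, delete a row from it as a concurrent writer would, and then ask both strategies for page 2.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL)")
    # Composite index matching the sort order, so the keyset predicate is a range scan.
    conn.execute("CREATE INDEX orders_created_id ON orders (created_at, id)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 100), (2, 100), (3, 101), (4, 102), (5, 103), (6, 104)])
    conn.commit()
    return conn


def page_after(conn, last_ts, last_id, limit=2):
    # Keyset: seek strictly past the last (created_at, id) the client saw.
    # The id tiebreaker keeps rows with equal timestamps from being skipped.
    return conn.execute(
        "SELECT id, created_at FROM orders WHERE (created_at, id) > (?, ?) "
        "ORDER BY created_at, id LIMIT ?", (last_ts, last_id, limit)).fetchall()


def page_offset(conn, offset, limit=2):
    # Offset: walk and discard `offset` rows, then return the next `limit`.
    return conn.execute(
        "SELECT id, created_at FROM orders ORDER BY created_at, id LIMIT ? OFFSET ?",
        (limit, offset)).fetchall()
```

Page 1 is ids 1 and 2 under either strategy. After deleting id 1, `page_offset(conn, 2)` returns ids 4 and 5, silently skipping id 3, while `page_after(conn, 100, 2)` resumes at id 3 because its cursor is a position, not a count.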

| Strategy | Deep-page cost | Stable under writes? | Random access? |
| --- | --- | --- | --- |
| Offset/limit | `O(offset)` | no (skips/dupes) | yes (jump to page N) |
| Cursor/keyset | `O(limit)` | yes | no (next/prev only) |

Always emit cursors as opaque tokens (next_cursor: "eyJ0cyI6..."), never raw offsets or exposed column values. Opaque means you can change the underlying scheme — switch sort columns, move to a different store — without breaking a single client. The moment a client parses your cursor, it’s part of your contract.
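One way to produce such a token, sketched under assumptions: base64url-encoded JSON carrying a version field and the hypothetical `(created_at, id)` sort key from the keyset query. The function names are illustrative.

```python
import base64
import json


def encode_cursor(created_at: int, order_id: int) -> str:
    # A version tag lets the server change the scheme later; clients never look inside.
    payload = json.dumps({"v": 1, "ts": created_at, "id": order_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode()).decode().rstrip("=")


def decode_cursor(cursor: str) -> tuple[int, int]:
    padded = cursor + "=" * (-len(cursor) % 4)  # restore the stripped base64 padding
    data = json.loads(base64.urlsafe_b64decode(padded))
    if data.get("v") != 1:
        raise ValueError("unsupported cursor version")
    return data["ts"], data["id"]
```

Base64 makes the cursor opaque by contract, not secret. If tampering matters, sign the payload (for example with an HMAC) and reject cursors that fail verification.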

Versioning

Versioning exists for one reason: to let you evolve the contract without a flag day where every client must upgrade at once. The options, ranked by how much I trust them in production:

Additive, non-breaking change — add fields, never remove or repurpose them; clients ignore what they don’t recognize. This avoids versioning entirely and should be your default. The overwhelming majority of “we need v2” conversations are really “we were sloppy about additive design.”
URL version (/v1/orders → /v2/orders) — coarse, visible, trivially routable and cacheable. The right tool for a genuinely incompatible redesign, and the easiest to reason about operationally because the version is right there in the access logs.
Header / media-type version (Accept: application/vnd.api.v2+json) — keeps URLs clean, but it’s harder to debug from a browser or a log line, and some intermediary caches handle it poorly.
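Additive change only works if clients ignore fields they do not recognize. Here is a minimal sketch of such a "tolerant reader" in Python, assuming a hypothetical `Order` shape that this client was built against.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Order:
    id: str
    status: str
    total_cents: int


def parse_order(payload: dict) -> Order:
    # Tolerant reader: keep the fields this client was built against and ignore
    # the rest, so the server can add fields without cutting a new version.
    known = {f.name for f in fields(Order)}
    return Order(**{k: v for k, v in payload.items() if k in known})


order = parse_order({"id": "o1", "status": "paid", "total_cents": 4000, "gift_wrap": True})
```

This only absorbs new fields. It cannot notice a field whose meaning changed, such as `total_cents` quietly becoming dollars, so meaning changes still need a new version.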

Whichever you choose, the iron rule is identical: never silently change the meaning of an existing field. Renaming the allowed values of a status enum, tightening a previously-optional field to required, or changing units (cents to dollars) is a breaking change even when the JSON shape is byte-for-byte identical. Old clients keep parsing successfully and quietly do the wrong thing — the worst kind of break, because nothing errors.

Error semantics

Status codes are the first thing every client branches on, so the classes have to be right even if individual codes are debatable:

4xx (except 429): the client must change something before retrying. A blind, unmodified retry of a 400 or 422 is a client bug.
429 / 503: retryable, but only with backoff. Always send Retry-After so well-behaved clients pace themselves instead of hammering you into a deeper hole.
5xx: the server’s fault; safe to retry only if the operation is idempotent. This is precisely why idempotency and error design are the same conversation.
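Here is a client-side sketch of those three rules, using the hypothetical names `should_retry` and `backoff_delay`. It handles only the delta-seconds form of `Retry-After` (the header may also carry an HTTP date). When the server gives no hint, it falls back to full-jitter exponential backoff, which is one common choice rather than something the text prescribes.

```python
import random
from typing import Optional

BACKOFF_ONLY = {429, 503}


def should_retry(status: int, idempotent: bool) -> bool:
    """`idempotent` is True for GET/PUT/DELETE, or for a POST carrying an Idempotency-Key."""
    if status in BACKOFF_ONLY:
        return True               # retryable, but only after waiting
    if 400 <= status < 500:
        return False              # the client must change the request first
    if 500 <= status < 600:
        return idempotent         # harmless only if a repeat cannot duplicate the effect
    return False


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 0.2, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)  # honor the server's pacing (delta-seconds form only)
    # No hint from the server: exponential backoff with full jitter.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Jitter matters because thousands of clients that timed out at the same moment would otherwise all come back at the same moment too.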

Return a structured body, not a bare string: a stable machine-readable code, a human-readable message, and a request_id that resolves directly in your logs (observability is what makes that request_id worth printing). RFC 9457 application/problem+json is a perfectly good shape to adopt rather than inventing your own.
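A minimal sketch of such a body as Python data. `type`, `title`, `status`, and `detail` are members defined by RFC 9457. `code` and `request_id` are extension members, which the RFC permits. The `example.com` URL, the function name, and the values are hypothetical.

```python
import json


def problem_response(status: int, title: str, code: str, detail: str,
                     request_id: str) -> tuple[int, dict, str]:
    body = {
        # Members defined by RFC 9457
        "type": f"https://example.com/problems/{code}",  # hypothetical docs URL
        "title": title,
        "status": status,
        "detail": detail,
        # Extension members: a stable machine-readable code and the log correlation id
        "code": code,
        "request_id": request_id,
    }
    return status, {"Content-Type": "application/problem+json"}, json.dumps(body)


status, headers, raw = problem_response(
    422, "Validation failed", "amount_not_positive",
    "amount must be greater than zero", request_id="req_8c1d")
```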

The subtle distinction that matters most: separate “definitely failed” (422 validation, 409 conflict) from “unknown outcome” (504 gateway timeout, 503). The first tells the client don’t retry, fix the input. The second is the exact situation the idempotency key was built for — the client doesn’t know if the write landed, so it retries with the same key and lets the server deduplicate.

Optimistic concurrency with ETags

Lost updates happen when two clients read the same resource, both edit their local copy, and the second write silently clobbers the first. Optimistic concurrency prevents this without holding locks. The server returns an ETag (a version number or content hash) on GET; the client echoes it back on write via If-Match. If the resource changed in the meantime, the server rejects the write with 412 Precondition Failed, and the client re-reads and retries on top of the new state.

```text
GET  /orders/42                          -> 200  ETag: "v7"
PUT  /orders/42   If-Match: "v7"         -> 200  ETag: "v8"
PUT  /orders/42   If-Match: "v7"         -> 412 Precondition Failed
```
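A sketch of the server side of that exchange with Python's sqlite3. It assumes a hypothetical `orders` table with an integer `version` column that backs the ETag. The `If-Match` check is folded into the `UPDATE`'s `WHERE` clause, so the database, not application code, decides which of two concurrent writers wins.

```python
import json
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, data TEXT NOT NULL, "
                 "version INTEGER NOT NULL)")
    conn.execute("INSERT INTO orders VALUES (42, ?, 7)", (json.dumps({"status": "pending"}),))
    conn.commit()
    return conn


def etag_for(version: int) -> str:
    return f'"v{version}"'


def parse_etag(tag: str) -> Optional[int]:
    # Only tags this server minted ("v<digits>" in quotes) can ever match.
    inner = tag[2:-1]
    if tag.startswith('"v') and tag.endswith('"') and inner.isdigit():
        return int(inner)
    return None


def get_order(conn, order_id):
    data, version = conn.execute(
        "SELECT data, version FROM orders WHERE id = ?", (order_id,)).fetchone()
    return 200, etag_for(version), json.loads(data)


def put_order_if_match(conn, order_id, if_match, new_data):
    expected = parse_etag(if_match)
    with conn:
        # Version check and write are ONE statement, so two clients holding the
        # same ETag cannot both succeed (no app-level read-then-write race).
        updated = conn.execute(
            "UPDATE orders SET data = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (json.dumps(new_data), order_id, expected)).rowcount
    if updated == 0:
        return 412, None  # stale or unknown ETag: re-read, reapply the edit, retry
    return 200, etag_for(expected + 1)
```

Calling `put_order_if_match(conn, 42, '"v7"', ...)` twice reproduces the exchange: the first call returns `(200, '"v8"')` and the second `(412, None)`.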

This is the concurrency-control counterpart of idempotency keys. Keys stop duplicate writes; ETags stop conflicting writes. The same ETag mechanism also powers caching: If-None-Match on a GET lets the server answer 304 Not Modified and skip the body transfer entirely, which is most of what makes HTTP caching at a CDN cheap. One caveat I’ve been burned by: generate the ETag from a canonical, stable representation. If your serialization includes volatile fields like a server timestamp, or emits map keys in nondeterministic order, every GET returns a fresh ETag, If-Match never matches, and you’ve built a 412 machine.
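A sketch of a canonical ETag plus the `If-None-Match` path. It assumes JSON representations and a hypothetical `VOLATILE_FIELDS` set of top-level fields to strip. For brevity it compares a single tag; a full implementation would also accept a tag list or `*`.

```python
import hashlib
import json

# Hypothetical per-response fields that must not influence the ETag.
VOLATILE_FIELDS = {"server_time", "request_id"}


def strong_etag(representation: dict) -> str:
    # Strip volatile top-level fields (nested ones would need the same treatment).
    stable = {k: v for k, v in representation.items() if k not in VOLATILE_FIELDS}
    # sort_keys fixes key order at every nesting level; fixed separators fix whitespace.
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode()).hexdigest() + '"'


def conditional_get(representation: dict, if_none_match=None):
    tag = strong_etag(representation)
    if if_none_match == tag:
        return 304, tag, None  # the client's copy is current; skip the body
    return 200, tag, representation
```

Two responses that differ only in `server_time` or key order now produce the same tag, so `If-Match` and `If-None-Match` behave as intended.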

The tradeoffs that bite

The decisions that look free at design time and quietly bill you later:

POST for everything. Convenient, “RESTful enough,” and it throws away HTTP idempotency and cacheability in one move. Now every write needs custom dedup machinery that a PUT would have given you for free.
Idempotency keys as optional. They get skipped under deadline pressure on “low-risk” endpoints, and you find out which endpoints were actually risky from a duplicate-charge ticket.
Offset pagination. Flawless in the demo with 200 rows; a multi-second table scan that pins a database connection at ?offset=2000000 in production.
Synchronous everything. Modeling a long-running job (video transcode, bulk import) as one blocking request guarantees client timeouts and the retry storm that follows. The fix is to return 202 Accepted with a status URL and let the work happen behind a message queue.
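A minimal in-process sketch of the `202 Accepted` pattern. It assumes a Python `queue.Queue` standing in for a real message queue and a dict for the job table. The route names, `submit_transcode`, and `worker_step` are hypothetical.

```python
import queue
import uuid
from typing import Optional

# Hypothetical in-process stand-ins for a job table and a message queue.
jobs: dict[str, dict] = {}
work_queue: queue.Queue = queue.Queue()


def submit_transcode(video_id: str) -> tuple[int, dict, dict]:
    """POST /transcodes: accept quickly and defer the work."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"state": "queued", "video_id": video_id}
    work_queue.put(job_id)
    status_url = f"/jobs/{job_id}"
    return 202, {"Location": status_url}, {"job_id": job_id, "status_url": status_url}


def get_job(job_id: str) -> tuple[int, Optional[dict]]:
    """GET /jobs/{job_id}: what the client polls instead of holding a connection open."""
    job = jobs.get(job_id)
    return (404, None) if job is None else (200, job)


def worker_step() -> None:
    """One iteration of a background worker draining the queue."""
    job_id = work_queue.get()
    jobs[job_id]["state"] = "running"
    # The slow transcode would run here, outside any client's HTTP timeout.
    jobs[job_id]["state"] = "succeeded"
    work_queue.task_done()
```

The submit call is still a `POST`, so in production it should take an `Idempotency-Key` as well. Otherwise a client that times out on the submit enqueues the job twice.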

REST vs RPC vs GraphQL

There is no universally correct style — there is a correct style for your coupling and traffic shape.

| Dimension | REST | RPC (gRPC) | GraphQL |
| --- | --- | --- | --- |
| Shape | resources + verbs | typed methods | one endpoint, client query |
| Over/under-fetch | common | tight (proto) | client picks fields |
| Caching | HTTP-native | manual | hard (POST, dynamic) |
| Best fit | public, CRUD, CDN | internal service-to-service | many UIs, many backends |
| Main hazard | chatty, N+1 hops | tight coupling | one query DoSing the DB |

REST is the default for public, cacheable, resource-shaped APIs — it gets HTTP semantics, CDN caching, and ETags for free. gRPC wins inside the fleet where you control both ends, want strict contracts and binary efficiency, and benefit from streaming; it pairs naturally with the service-to-service traffic that sits behind a load balancer and an API gateway. GraphQL earns its considerable complexity when many clients with genuinely different field needs hit many backends — but it imports a sharp new failure mode: a single unbounded query can fan out into thousands of database calls, so you need query depth limits, complexity budgets, and persisted queries before you expose it publicly.

The honest meta-point: the style is a smaller decision than the contract discipline. A well-versioned, idempotent, properly-paginated gRPC API and the equivalent REST API are both fine. A sloppy version of either will page you.

Performance and the cost of a request

API performance is dominated by two things people underestimate: round trips and fan-out. A single logical user action that requires the client to make five sequential calls (each GET depending on the last) pays five times the network latency before anything renders — and on a high-latency mobile network, that’s the whole performance budget gone. This is the “chatty API” anti-pattern, and it’s why REST sometimes loses to a single GraphQL query or a purpose-built aggregation endpoint: not because REST is slow, but because N round trips are slow.

The levers that actually move the needle, in rough order of impact:

Collapse round trips. Return the data a screen needs in one response. Use ?include= expansion, compound documents, or a backend-for-frontend aggregation layer rather than forcing the client to stitch ten resources together. Every eliminated round trip is a full RTT saved.
Cache at the edge. A strong ETag plus Cache-Control: max-age lets a CDN serve 304s and cached bodies, absorbing the bulk of read traffic before it ever reaches your origin. For read-heavy public APIs this is the single largest win, and it leans on the same patterns as caching strategies.
Keep the idempotency-store lookup cheap. Every mutating request now does a keyed read before it executes. If that store is slow, you’ve added latency to every write. Back it with a fast keyed store (Redis with TTLs, or a well-indexed table) and make the lookup a single point-query, not a scan.
Paginate by cursor. Covered above, but it’s a performance lever too: O(limit) instead of O(offset) is the difference between a list endpoint with flat p99 and one whose p99 climbs with page depth until it times out.
Compress and right-size payloads. gzip/brotli on responses, and don’t return 200 fields when the client uses 8. Over-fetching is bandwidth and serialization cost on every single call.

Measure the right thing. Average latency hides the tail; you care about p99 and p99.9, because that tail is exactly where client timeouts fire and the retry storm is born. Watch latency per endpoint and per version, error rate by status class, and the idempotency store’s hit rate (a sudden spike in replays is your early warning that a downstream dependency is timing out clients). Wire all of it through observability with a request_id that follows a call across services.
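A quick illustration of why the mean hides the tail, using a nearest-rank percentile and a hypothetical window of latencies in milliseconds. The sample data is invented for illustration.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical window: 980 fast requests and 20 stuck behind a slow dependency (ms).
latencies_ms = [40] * 980 + [6000] * 20
mean_ms = statistics.mean(latencies_ms)
```

Here the mean is 159.2 ms and the p50 is 40 ms, both of which look healthy, while `percentile(latencies_ms, 99)` is 6000 ms. That is past a five-second client timeout, which is exactly where the retries start.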

Failure modes

The signature API failure is the retry storm into non-idempotent writes. A dependency slows down, requests cross the client timeout, every client retries, and now you have 3x the write load on an already-struggling system — and if those writes aren’t idempotent, 3x the side effects. Duplicate orders, double charges, doubled inventory decrements. The load spike and the correctness bug arrive together, which is what makes it so vicious: you’re firefighting capacity and data corruption at the same moment.

```mermaid
flowchart TD
  D[dependency\nslows down] --> T[requests cross\nclient timeout]
  T --> R[clients retry]
  R --> L[3x write load]
  L --> D
  L --> E{writes\nidempotent?}
  E -->|yes| OK[replay saved\nno dup effect]
  E -->|no| BAD[duplicate\ncharges, orders]
```

If a mutating endpoint has no idempotency contract and clients retry on timeout, you do not have an API — you have a duplicate-effect generator waiting for a slow dependency. The dependency will eventually be slow. The only question is whether you find out from a load test or from a customer’s bank statement.

The other recurring breakages, each symptom → root cause → prevention:

Read-then-write idempotency race. Symptom: duplicate effects despite “having idempotency.” Root cause: the dedup check is a SELECT in application code, and two concurrent retries both pass it. Prevention: enforce uniqueness with a DB constraint (INSERT ON CONFLICT / unique index), never an app-level read.
Offset pagination death. Symptom: p99 on a list endpoint climbs with page depth; one ?offset=2000000 request pins a connection for seconds. Root cause: the database walks and discards every skipped row. Prevention: cursor/keyset pagination everywhere; never expose raw offsets.
Silent breaking change. Symptom: old clients quietly misbehave, no errors anywhere. Root cause: a field’s meaning changed (enum values, units, optionality) while its shape stayed the same. Prevention: treat field-meaning changes as breaking; ship them only behind a new version; track per-version traffic before deprecating.
Cold/expensive-endpoint stampede. Symptom: a popular cache key expires or an expensive report endpoint gets a traffic spike, and the origin falls over. Root cause: no admission control at the edge. Prevention: rate limiting with 429 and Retry-After, plus request coalescing; don’t try to fix it in the handler.
ETag drift. Symptom: If-Match always returns 412; clients can never write. Root cause: the ETag is computed from a representation that includes volatile fields. Prevention: hash a canonical, stable serialization with volatile fields stripped.
Lost-update clobber. Symptom: a user’s edit silently disappears. Root cause: concurrent read-modify-write with no concurrency control. Prevention: optimistic concurrency via ETag + If-Match → 412.

Scaling it

At 10x, the contract holds but the implementation strains, so you push cross-cutting work outward. Move auth, rate limiting, request validation, and TLS termination to the API gateway so your handlers stay thin and focused on the exactly-once effect. The idempotency store becomes a hot path — every write touches it — so back it with a fast keyed store and make sure its write sits on the same transaction boundary as the side effect, not in a separate “best effort” step.

At 100x, the idempotency store itself must partition. Key it by a hash of the idempotency key so that a retry deterministically lands on the same shard as the original request — otherwise the dedup guarantee evaporates the moment the two attempts route to different partitions. This is exactly the partition-key design discipline from sharding & partitioning and the routing math of consistent hashing: related operations must colocate. Cursor pagination becomes mandatory everywhere, because any surviving offset endpoint is now the slow query that pages you. And HTTP caching does real load-shedding — a strong ETag plus Cache-Control lets a CDN serve the long tail of reads as 304s before they touch your origin.
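A sketch of deterministic shard routing for the idempotency store. It assumes the key is scoped by account and endpoint, as recommended earlier, and uses plain hash-modulo; `shard_for` is a hypothetical name.

```python
import hashlib


def shard_for(account_id: str, endpoint: str, idem_key: str, num_shards: int) -> int:
    # Hash the full scoped key so every attempt of one logical operation (same
    # account, endpoint, key) routes to the same shard as the original request.
    scoped = "\x1f".join((account_id, endpoint, idem_key)).encode()
    digest = hashlib.sha256(scoped).digest()
    # A stable cryptographic hash, not Python's built-in hash(), which is salted
    # per process and would route the same key differently on different hosts.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo remaps most keys when `num_shards` changes, and an in-flight retry would then miss its original record. Consistent hashing limits that movement, which is why it is the better fit once the shard count starts to change.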

Versioning at scale means running vN and vN+1 side by side for a long deprecation window, with per-version metrics so you can prove the old version’s traffic has actually drained to zero before you delete the code. “We think nobody uses v1 anymore” is how you break an enterprise customer’s batch job that runs once a quarter. Measure, don’t assume.

Long-running work doesn’t scale as synchronous requests at all. Convert it to an async pattern: accept the request with 202 Accepted, hand the work to a message queue or a task system like Celery, return a status URL the client polls or a webhook you call back. The synchronous request stays fast and within timeout; the heavy lifting happens where it can be retried and load-balanced independently.

When to reach for it (and when not to)

This isn’t “should I build an API” — it’s which conventions to actually commit to versus skip.

Reach for strict idempotency keys on any non-idempotent write a client will retry: payments, order creation, sending a message, provisioning a resource, anything that moves money or inventory. If a duplicate would cost real money or corrupt state, it is non-negotiable, full stop.

Reach for ETags / optimistic concurrency when multiple clients can edit the same resource and lost updates matter — user profiles, documents, configuration objects, anything with a “last edit wins” hazard.

Reach for cursor pagination on every list endpoint that can grow unbounded. The extra cost over offset pagination is near zero at small scale and the savings are enormous at large scale, so just do it from the start.

Don’t add idempotency-key machinery to operations that are naturally idempotent — a PUT to a known URL or a DELETE already guarantees it by spec; bolting on keys is pure ceremony. Don’t reach for GraphQL to solve over-fetching on a small, stable API; additive REST fields are far simpler and you skip the whole query-cost-control problem. Don’t version on day one for changes you could ship additively; most “v2” rewrites I’ve seen were avoidable with a little discipline about adding rather than mutating fields.

When to consider alternatives

When the synchronous request/response shape is the wrong tool, the right tool is usually a sibling:

Long-running or fire-and-forget work → a message queue or Celery, with the API returning 202 and a status URL.
High-fan-out event distribution / streaming backbone → Kafka, rather than fanning out synchronous calls.
Internal, latency-sensitive service-to-service calls → gRPC behind an API gateway and load balancer, not a chatty public-style REST hop.
Bulk export / large object delivery → a presigned URL to object storage, not a streaming response through your API.
Search and relevance ranking over a list endpoint → Elasticsearch as a query layer, not hand-rolled filtering on a paginated SQL endpoint.
The durable home for whatever your API writes → PostgreSQL or DynamoDB; the API is the contract, the database is the truth.

Operational checklist

Require Idempotency-Key on every non-idempotent money/state-moving endpoint; enforce uniqueness with a DB constraint, never an app-level read-then-write.
Commit the idempotency key and the side effect in the same transaction; bind the key to a request-body hash; TTL keys (24h–7d) and scope per (account, endpoint).
Keep a three-state idempotency record (in_progress/completed/failed) and answer in-flight retries with 409, not a fresh execution.
Return Retry-After on every 429/503; document precisely which status codes are safe to retry.
Use cursor/keyset pagination on every list endpoint; emit opaque cursors; never expose raw offsets.
Emit ETag on mutable resources and honor If-Match → 412; hash a canonical representation with volatile fields stripped.
Ship a structured error body (code, message, request_id) that resolves directly in your logs; wire it through observability.
Treat any change to a field’s meaning as breaking; ship only additive changes within a version; track per-version traffic before deprecating anything.
Convert long-running work to 202 Accepted + status URL behind a message queue; never block a request past the client timeout.

Summary

API design is not URL aesthetics; it is the contract you make about behavior when the transport fails — and it always eventually fails. The load-bearing insight is the gap between at-least-once delivery and exactly-once effect: clients retry because they cannot distinguish “no response” from “didn’t happen,” so every non-idempotent write needs an idempotency contract or it’s a duplicate-effect generator waiting for a slow dependency. Build idempotency on a database uniqueness constraint, commit it atomically with the side effect, and bind it to the request body. Around that core, the rest is consistency discipline: cursor pagination so deep pages don’t table-scan, additive versioning so old clients never silently break, error classes that tell a client whether to retry, and ETags so concurrent edits conflict loudly instead of clobbering quietly. Get those right and your API is boring under load — which, when a downstream dependency goes slow at the worst possible moment, is exactly what you want it to be.

Appendix: HTTP idempotency vs safety

Two spec terms get conflated constantly, and the distinction matters:

Safe — the method has no side effects on the server; it’s read-only. GET and HEAD are safe. A safe method can be retried, prefetched, and cached freely.
Idempotent — the method may have side effects, but performing it N times has the same server-state result as performing it once. GET, HEAD, PUT, and DELETE are idempotent. POST and PATCH are not.

Every safe method is idempotent, but not every idempotent method is safe: DELETE changes state (so it’s not safe), yet deleting twice lands in the same place (so it is idempotent). The reason this matters operationally: intermediaries (browsers, proxies, client libraries) will automatically retry idempotent methods on a network error, and they’re allowed to, because the spec guarantees that a repeat has the same effect as a single request. They will not auto-retry POST. So when you choose POST for an operation that a flaky client should be able to retry, you’ve opted out of the free, spec-blessed retry behavior and signed up to provide the idempotency guarantee yourself, which is the entire reason idempotency keys exist.
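The distinction can be written down as the retry policy a client library might apply. The method sets follow RFC 9110. Treating a POST or PATCH that carries an `Idempotency-Key` header as retryable is this sketch's own assumption, reflecting the argument of this article rather than anything the spec grants.

```python
# RFC 9110 method properties.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def may_auto_retry(method: str, headers: dict) -> bool:
    """May a client library transparently resend this request after a network error?"""
    if method.upper() in IDEMPOTENT_METHODS:
        return True  # spec-blessed: a repeat has the same effect as one request
    # POST/PATCH: only when the caller opted in with an idempotency key
    # (a policy assumption for this sketch, not something the spec grants).
    # Header names are case-insensitive in HTTP.
    return any(name.lower() == "idempotency-key" for name in headers)
```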
