Redis

A production tour of Redis: the single-threaded event loop, data structures, RDB/AOF durability, async replication and lost writes, eviction, hot keys, Cluster, and where it breaks.

21 min read · updated 2026-06-28

Redis gets introduced as “a fast cache.” That framing survives exactly until the first time it falls over and takes your checkout flow with it. Redis is a single-threaded, in-memory data structure server with optional persistence and asynchronous replication. Every clause in that sentence is an operational constraint you will eventually meet at 3am, and most Redis incidents are just one of those clauses collecting its debt.

The thing that makes Redis special is also the thing that makes it dangerous: it is gloriously simple. There is no query planner, no buffer manager fighting the OS page cache, no MVCC bloat, no per-connection process. One core, one command at a time, data in RAM. That simplicity buys you sub-millisecond latency and atomic operations for free. It also means the failure modes are blunt — when Redis goes wrong it tends to go wrong all at once, for every client, instantly.

This is the long-form context article: enough of a mental model to reason about Redis under real load, not just under a demo. It assumes you have run something in production and been surprised by it. The patterns Redis powers show up in Caching Strategies and Rate Limiting; the consistency tradeoffs it punts on are covered in Consistency & Consensus and CAP Theorem & Tradeoffs; and when you outgrow it as a log, you land on Kafka. If you want the relational counterpoint to almost everything here, read PostgreSQL.

The single biggest mistake teams make is treating Redis like a smaller, faster database. It is not. It is a different shape of tool with a different durability contract, and the gap between “acknowledged” and “durable” is where careers get interesting.

A motivating failure

A fintech team uses Redis as a distributed lock to make a payment idempotent. The flow is sound on paper: before charging a card, a worker does SET lock:charge:{order} <token> NX PX 30000, runs the charge, then releases the lock. NX means only one worker can hold it. They tested it for months. It never double-charged in staging.

Then the primary Redis node has a hardware blip and Sentinel promotes a replica. Replication is asynchronous, so the replica is a few hundred milliseconds behind. At the instant of failover, several lock:charge:* keys that existed on the old primary had not yet replicated. On the new primary, those locks simply do not exist.

A worker that already held a lock keeps processing its charge against the old primary’s view of the world. A second worker, now talking to the freshly-promoted primary, sees no lock, acquires it cleanly, and charges the same card again. No error. No alert. Two charges, one order, on the same card, separated by 800 milliseconds.

The bug was not in the lock code. SET NX PX did exactly what it promised on a single node. The failure lived in an assumption nobody wrote down: that a key acknowledged by the primary is safe across a failover. With async replication it is not — the acknowledgment and the durability are separate events, and a failover lands precisely in the gap between them. The team learned, the expensive way, that Redis locks are a performance optimization, not a correctness guarantee, and that anything requiring real mutual exclusion across failures needs a consensus system underneath (ZooKeeper or a fencing token from a real source of truth).
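The fencing-token idea mentioned above can be sketched in a few lines. This is a minimal in-memory model, not a Redis or ZooKeeper API: `FencedStore` and its methods are hypothetical names, and it assumes tokens are handed out by a durable, monotonically increasing counter in the real source of truth.

```python
from typing import Optional


class FencedStore:
    """Toy stand-in for a resource that checks fencing tokens on every write."""

    def __init__(self) -> None:
        self._highest_token: dict[str, int] = {}
        self._values: dict[str, str] = {}

    def write(self, resource: str, token: int, value: str) -> bool:
        # A token lower than one already seen means the caller's lock is stale,
        # for example because a failover let someone else acquire it afterwards.
        if token < self._highest_token.get(resource, 0):
            return False
        self._highest_token[resource] = token
        self._values[resource] = value
        return True

    def read(self, resource: str) -> Optional[str]:
        return self._values.get(resource)
```

The point of the design is that the check lives in the resource being protected, not in the lock service. A worker that lost its lock without noticing is rejected when it finally writes, regardless of what Redis believed at the time.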

The one-sentence mental model

Redis runs your commands one at a time, on one core, against data that lives in RAM, and only writes to disk and replicas after it has already told you “OK.”

It is fast because it never waits — not for disk, not for locks, not for other cores. The flip side is that a single slow command (KEYS *, a giant SMEMBERS, a Lua script that loops a million times) stalls everything, because there is no second thread to pick up the slack. “Single-threaded” is not a performance footnote; it is the scheduling model you design around.

flowchart LR
  C1[Client A] --> Q[Socket buffers]
  C2[Client B] --> Q
  C3[Client C] --> Q
  Q --> E[Event loop\none cmd\nat a time]
  E --> M[(In-memory\ndataset)]
  M -. fork .-> R[(RDB / AOF\non disk)]
  E -. async .-> RP[(Replica\nstream)]

Unpack the sentence clause by clause, because each is a constraint:

One command at a time → atomicity is free (INCR, SETNX, MULTI/EXEC, Lua all run with nothing else interleaved), but a slow command blocks every other client for its full duration.
On one core → vertical scaling tops out at one CPU’s worth of command execution. A 64-core box does not make a single Redis instance faster at running commands.
Data lives in RAM → your dataset must fit in memory, plus headroom, and when it doesn’t you hit a cliff, not a gentle slope.
Writes to disk and replicas after the OK → the acknowledgment is not a durability guarantee. The window between “OK” and “persisted/replicated” is configurable but never zero.

Modern Redis (6.0+) added io-threads to parallelize network read/write across cores, and Redis 7+ keeps refining it. That helps a connection-bound workload move bytes faster. But command execution stays single-threaded — the I/O threads only handle parsing and socket writes. The atomicity you rely on comes directly from that single execution thread. Nothing else runs while your command does.
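To make the "one slow command stalls everyone" point concrete, here is a small arithmetic model of a single execution thread. It is a sketch, not a measurement: it assumes every command is already queued at time zero, and the durations are made-up numbers.

```python
def completion_times_us(durations_us: list[int]) -> list[int]:
    """Finish time of each command when one thread runs them in arrival order."""
    finished = []
    clock = 0
    for duration in durations_us:
        clock += duration          # nothing else runs while this command does
        finished.append(clock)
    return finished
```

Calling `completion_times_us([100, 50_000, 100, 100])` models three 100 µs point reads queued around one 50 ms O(n) command. The two reads behind the slow command finish after roughly 50 ms even though each needs only 100 µs of work, which is the p99 spike other clients see.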

How it actually works

Data structures, not just strings

The reason to reach for Redis over a plain GET/SET cache is the typed commands. They push computation into the data store, so you move less data over the wire and get atomicity for free. Picking the wrong structure is how an O(1) server-side operation becomes “ship a megabyte to the client and recompute it.”

Strings — counters (INCR), feature flags, serialized blobs. INCR is atomic; never do read-modify-write on a counter from the app.
Hashes — objects with fields (HSET user:42 name ada plan pro). Cheaper than N separate keys when the fields share a lifetime, and small hashes get a compact memory encoding.
Sorted sets (ZSET) — leaderboards, sliding-window rate limiters, time-ordered queues. O(log n) inserts, range queries by score. The workhorse for anything ranked or time-bounded.
Sets — uniqueness, tags, “have I already seen this id,” set intersection for simple recommendations.
Streams — append-only logs with consumer groups, acks, and replay. The closest Redis gets to a broker, good at modest scale before you graduate to Kafka.
Bitmaps / HyperLogLog — presence and cardinality at a fraction of the memory. HLL counts millions of uniques in ~12 KB with a small error bound.

A concrete example of the leverage: a sliding-window rate limiter in a ZSET is one ZADD, one ZREMRANGEBYSCORE, one ZCARD, wrapped in a Lua script so the whole thing is atomic. Done client-side it would be a race condition waiting to happen. This is the pattern behind most production rate limiting.
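The sliding-window logic can be sketched as a pure-Python model, with a sorted list standing in for the ZSET. This is not the Lua script itself: `SlidingWindowLimiter` is a hypothetical name, and the comments only indicate which Redis command each step would correspond to.

```python
import bisect


class SlidingWindowLimiter:
    """In-memory model of a ZSET-based sliding-window rate limiter."""

    def __init__(self, limit: int, window_ms: int) -> None:
        self.limit = limit
        self.window_ms = window_ms
        self.hits: dict[str, list[int]] = {}   # key -> sorted request timestamps

    def allow(self, key: str, now_ms: int) -> bool:
        hits = self.hits.setdefault(key, [])
        # ZREMRANGEBYSCORE: drop entries that have slid out of the window.
        cutoff = now_ms - self.window_ms
        del hits[:bisect.bisect_right(hits, cutoff)]
        # ZCARD: count what is still inside the window.
        if len(hits) >= self.limit:
            return False
        # ZADD: record this request, scored by its timestamp.
        bisect.insort(hits, now_ms)
        return True
```

This version trims and counts before adding, so rejected requests do not consume quota; adding first, as in the command order listed above, is an equally valid choice that penalizes clients who keep hammering. In Redis the three steps only stay correct under concurrency because the script runs them as one uninterrupted unit, and a real ZSET would also need a unique member per request so that two hits in the same millisecond do not collapse into one.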

The write path and where “OK” comes from

sequenceDiagram
  participant App
  participant Redis
  participant AOF as AOF buffer
  participant Disk
  App->>Redis: SET k v
  Redis->>Redis: apply in memory
  Redis-->>App: OK
  Redis->>AOF: append command
  AOF->>Disk: fsync per policy
  Note over AOF,Disk: durability window\n= fsync interval

A write is acknowledged the instant it lands in memory. Everything durable happens after: the command is appended to an in-memory AOF buffer, and that buffer is flushed to disk on a schedule governed by appendfsync. So the “OK” your client sees and the bytes-on-disk are two distinct events separated by a window you choose. Most people never think about that window until a node dies inside it.

Persistence: RDB vs AOF

People assume Redis is durable because it can write to disk. It is durable only as much as you configure it to be, and the defaults favor speed.

| Mechanism | What it does | Failure cost | Restart speed |
| --- | --- | --- | --- |
| RDB | Point-in-time snapshot via `fork()` | Lose everything since the last snapshot | Fast (load one file) |
| AOF | Append every write to a log; `appendfsync everysec` default | ~1s of writes on crash | Slow (replay the log) |
| Both | AOF for the durability window, RDB for fast restarts | ~1s window, best of both | Fast (RDB + tail of AOF) |

The three appendfsync settings are the actual durability dial:

always — fsync every write. Strongest, but throughput collapses to disk-sync speed; you lose the reason you picked Redis.
everysec — fsync once a second. The default and the sane choice: at most ~1s of writes at risk.
no — let the OS decide (often ~30s). Fast, and a data-loss footgun.
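A toy model of the gap between "OK" and bytes on disk, as a sketch only. `ToyAOF` is a hypothetical class, it models just `always` and `everysec`, and it checks the one-second timer on each write, whereas real Redis drives the `everysec` fsync from a background thread.

```python
class ToyAOF:
    """Models which acknowledged writes survive a crash under an fsync policy."""

    def __init__(self, policy: str) -> None:
        if policy not in ("always", "everysec"):
            raise ValueError("this sketch models only 'always' and 'everysec'")
        self.policy = policy
        self.buffer: list[str] = []   # acknowledged, not yet on disk
        self.disk: list[str] = []     # survives a crash
        self.last_fsync_ms = 0

    def write(self, now_ms: int, command: str) -> str:
        self.buffer.append(command)
        if self.policy == "always" or now_ms - self.last_fsync_ms >= 1000:
            self._fsync(now_ms)
        return "OK"                   # the client sees OK either way

    def _fsync(self, now_ms: int) -> None:
        self.disk.extend(self.buffer)
        self.buffer.clear()
        self.last_fsync_ms = now_ms

    def crash(self) -> list[str]:
        """Return the acknowledged writes that never reached disk."""
        lost = list(self.buffer)
        self.buffer.clear()
        return lost
```

Under `everysec`, whatever `crash()` returns is the set of writes a client was told succeeded and that no longer exist after restart. Under `always` that list is empty, and the price is an fsync on every write.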

The sleeper problem in both RDB snapshots and AOF rewrites is fork(). Redis forks a child process to write the snapshot while the parent keeps serving traffic. The child shares the parent’s memory via copy-on-write. Under a write-heavy load, pages diverge as the parent mutates them, and memory usage can climb toward 2× the dataset size during the save. Size a box to exactly hold the dataset and it will OOM the moment a background save collides with a write spike.

flowchart TD
  S[BGSAVE\ntriggered] --> F[fork child\nshares pages COW]
  F --> W{parent keeps\nwriting?}
  W -->|heavy writes| D[pages diverge\nmemory climbs]
  W -->|light writes| OK[COW cheap\nsmall overhead]
  D --> MEM[RSS approaches\n2x dataset]
  MEM --> OOM[OOM killer\nor swap death]
  style OOM fill:#e11d48,color:#fff
  style MEM fill:#171717,color:#fff
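The headroom arithmetic behind that diagram can be written down as a rough estimate. This is a simplification under stated assumptions: it treats copy-on-write cost as the fraction of the dataset's pages the parent dirties during the save, and it ignores client buffers, replication buffers, and fragmentation. The function names are illustrative.

```python
def peak_memory_gb(dataset_gb: float, dirtied_fraction: float) -> float:
    """Rough peak memory during a background save.

    dirtied_fraction is the share of the dataset's pages the parent rewrites
    while the child is still running (0.0 = idle, 1.0 = everything touched).
    """
    if not 0.0 <= dirtied_fraction <= 1.0:
        raise ValueError("dirtied_fraction must be between 0 and 1")
    return dataset_gb * (1.0 + dirtied_fraction)


def save_fits(dataset_gb: float, dirtied_fraction: float, ram_gb: float) -> bool:
    """True if the estimated peak stays within the RAM available to Redis."""
    return peak_memory_gb(dataset_gb, dirtied_fraction) <= ram_gb
```

For example, a 20 GB dataset with a quarter of its pages dirtied during the save peaks at an estimated 25 GB, so a box with 24 GB available to Redis does not fit it even though the dataset alone does.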

How a value is stored and freed

Redis keys live in a single global hash table per database, with incremental rehashing so a resize never stalls the event loop for long. Each value carries a small object header, and expirations are tracked separately. TTL’d keys are reclaimed two ways: lazily when a client touches an expired key, and actively by a background cycle that samples a handful of keys ~10 times a second and deletes the expired ones. The active cycle is probabilistic, so a key with a TTL can sit in memory past its expiry until either someone reads it or the sampler happens to pick it. That matters when you size memory: expired-but-not-yet-collected keys still count against maxmemory.
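The two reclamation paths can be modeled in a small sketch. `ToyKeyspace` is a hypothetical class, not Redis internals: it keeps expirations in a separate dict, expires lazily on read, and runs a sampled active cycle. Real Redis adapts how much work each cycle does; this model does not.

```python
import random
from typing import Optional


class ToyKeyspace:
    """Model of lazy plus sampled active expiry."""

    def __init__(self, seed: int = 0) -> None:
        self.data: dict[str, str] = {}
        self.expires_at: dict[str, int] = {}   # tracked separately from values
        self.rng = random.Random(seed)

    def set(self, key: str, value: str, now_ms: int, ttl_ms: Optional[int] = None) -> None:
        self.data[key] = value
        if ttl_ms is None:
            self.expires_at.pop(key, None)
        else:
            self.expires_at[key] = now_ms + ttl_ms

    def get(self, key: str, now_ms: int) -> Optional[str]:
        # Lazy path: an expired key is removed only when someone touches it.
        if key in self.expires_at and self.expires_at[key] <= now_ms:
            self._delete(key)
            return None
        return self.data.get(key)

    def active_cycle(self, now_ms: int, sample_size: int = 20) -> int:
        # Active path: sample some keys that carry a TTL, delete the expired ones.
        candidates = list(self.expires_at)
        sample = self.rng.sample(candidates, min(sample_size, len(candidates)))
        removed = 0
        for key in sample:
            if self.expires_at[key] <= now_ms:
                self._delete(key)
                removed += 1
        return removed

    def _delete(self, key: str) -> None:
        self.data.pop(key, None)
        self.expires_at.pop(key, None)
```

Between a key's expiry time and the moment either path reaches it, `len(keyspace.data)` still counts it, which is the memory-sizing effect described above.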

The tradeoffs that bite

These are the decisions that look free at design time and bill you later.

| Tradeoff | The free-looking choice | What it actually costs |
| --- | --- | --- |
| Ack vs durability | Trusting “OK” means saved | Up to `appendfsync` interval of writes lost on crash |
| Memory vs save headroom | Sizing RAM to fit the dataset | OOM during COW fork under write load |
| Atomicity vs latency | A “convenient” Lua script | A 50ms script stalls every client 50ms |
| Replica reads vs freshness | Reading from a replica | Stale data from async lag; read-your-writes breaks |
| Convenience vs blast radius | `KEYS`, `FLUSHALL`, `DEBUG SLEEP` | One command = a self-inflicted outage |
| Single key vs sharding | Adding Cluster to “scale” | A hot key still lands on one shard; sharding can’t help |

Two of these deserve emphasis. Atomicity vs latency: every Lua script and MULTI/EXEC block holds the single thread for its entire run. People write a “quick” script that loops over a collection and are shocked when p99 across unrelated keys jumps — the script was correct, it just monopolized the one core. Keep scripts short and bounded; if it iterates an unbounded structure, it is an outage with a delay timer on it.

Convenience vs blast radius: KEYS * walks the entire keyspace synchronously. On a few million keys that is a multi-second freeze for everyone. Use SCAN (cursor-based, incremental) instead, and rename or disable the dangerous commands in production with rename-command KEYS "".
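A cursor-based replacement for `KEYS` might look like the sketch below. It assumes a redis-py style client passed in as `client`, whose `scan(cursor=..., match=..., count=...)` returns a `(next_cursor, keys)` pair and whose `unlink(*names)` returns the number of keys removed. The helper names are illustrative, and the multi-key `unlink` assumes a single node rather than Cluster.

```python
from typing import Iterator


def scan_keys(client, pattern: str, count: int = 500) -> Iterator[str]:
    """Yield keys matching pattern in small pages instead of one blocking call."""
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if cursor == 0:        # the server signals a completed iteration with 0
            break


def unlink_matching(client, pattern: str, batch_size: int = 500) -> int:
    """Remove matching keys in batches, freeing their memory off the main thread."""
    removed = 0
    batch: list[str] = []
    for key in scan_keys(client, pattern):
        batch.append(key)
        if len(batch) >= batch_size:
            removed += client.unlink(*batch)
            batch.clear()
    if batch:
        removed += client.unlink(*batch)
    return removed
```

Each `SCAN` call holds the thread only for one page, so other clients interleave between pages. The tradeoff is weaker guarantees than `KEYS`: a key can be returned more than once, and keys added or removed mid-iteration may or may not appear.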

Read and write performance

Redis is fast in the specific case it was built for and slow in the cases people accidentally create. Reads and writes share the same single thread, so “performance” is really a question of how long each command holds that thread.

What is fast: point operations on small values. GET, SET, INCR, HGET, single-member ZADD, EXPIRE — all O(1) or O(log n), all sub-millisecond on a warm instance. A single node comfortably does 100k–1M+ ops/sec depending on pipelining and value size.

What is slow (and dangerous): anything O(n) over a large collection. KEYS *, SMEMBERS on a million-element set, LRANGE mylist 0 -1, HGETALL on a giant hash, ZRANGE over a huge range, big DEL of a multi-million-element collection. Each holds the thread for its full duration. The DEL case is sneaky: deleting one key that points to a 5M-element set blocks the loop while it frees every element — use UNLINK instead, which reclaims memory in a background thread.

The levers that actually move the needle:

Pipelining. Batch N commands into one round trip. On a 1ms-RTT network, 100 sequential commands take ~100ms; pipelined, they take ~1ms plus execution. This is usually the single biggest win and costs nothing.
Pick the cheap data structure. Server-side ZRANGEBYSCORE beats fetching everything and filtering in the app, every time.
Avoid big keys. A “big key” (a multi-megabyte value or multi-million-element collection) makes every operation on it expensive and every migration a stall. Split it.
io-threads for connection-heavy loads. If you are network-bound on a high-core box, enabling I/O threads parallelizes socket work. It does nothing for command execution time.
Client-side caching (Redis 6+ tracking). For read-heavy hot keys, let the client cache and have Redis invalidate. Cuts both round trips and load on the single thread.
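The pipelining arithmetic from the first lever can be written as two small formulas. This is a back-of-the-envelope model, assuming a fixed round-trip time and a fixed per-command execution cost, with everything in microseconds; the function names are illustrative.

```python
def sequential_us(n_commands: int, rtt_us: int, exec_us: int) -> int:
    """Each command pays a full round trip before the next one is sent."""
    return n_commands * (rtt_us + exec_us)


def pipelined_us(n_commands: int, rtt_us: int, exec_us: int) -> int:
    """One round trip for the whole batch, plus the server-side execution time."""
    return rtt_us + n_commands * exec_us
```

With 100 commands, a 1 ms round trip, and 10 µs of execution each, the sequential form costs 101 ms and the pipelined form 2 ms. The execution time on the single thread is identical in both cases; only the waiting on the network goes away.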

Watch the right metrics. Hit rate tells you about cache effectiveness; it tells you nothing about whether you are about to fall over. Watch latency (the built-in latency monitor), latest_fork_usec (fork pause time), instantaneous_ops_per_sec, mem_fragmentation_ratio, and slowlog (SLOWLOG GET). A creeping latest_fork_usec is an early warning that saves are getting expensive.
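A check over those fields could be sketched like this. It assumes `info` is a dict shaped like the parsed output of `INFO` (for example what a redis-py client returns from `info()`), using the real field names `used_memory`, `maxmemory`, `latest_fork_usec`, and `mem_fragmentation_ratio`. The fork and fragmentation thresholds are placeholder values chosen for illustration, not recommendations from the article.

```python
def health_warnings(
    info: dict,
    fork_usec_limit: int = 100_000,     # hypothetical threshold
    fragmentation_limit: float = 1.5,   # hypothetical threshold
) -> list[str]:
    """Return human-readable warnings derived from INFO fields."""
    warnings = []
    maxmemory = info.get("maxmemory", 0)
    if maxmemory == 0:
        warnings.append("maxmemory is not set")
    elif info.get("used_memory", 0) / maxmemory > 0.8:
        warnings.append("used_memory is above 80% of maxmemory")
    if info.get("latest_fork_usec", 0) > fork_usec_limit:
        warnings.append("last fork was slow; saves are getting expensive")
    if info.get("mem_fragmentation_ratio", 1.0) > fragmentation_limit:
        warnings.append("memory fragmentation is high")
    return warnings
```

Note what is absent: hit rate does not appear, because it describes cache effectiveness rather than how close the instance is to falling over.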

Failure modes

The recurring ones, in rough order of how often they page people. Each is symptom → root cause → prevention.

Memory cliff / eviction thrash. Symptom: latency spikes, hit rate craters, CPU climbs. Root cause: the working set exceeds maxmemory, so eviction runs on the hot path, evicting keys you are about to need, which forces recomputation, which adds writes, which triggers more eviction. The cache starts fighting itself. Prevention: alarm on used_memory / maxmemory > 0.8, size for the working set plus fork headroom, and choose an eviction policy that matches usage (allkeys-lru for a pure cache) rather than silently inheriting the default, noeviction.

Slow-command stall. Symptom: p99 across all keys spikes simultaneously for no obvious reason. Root cause: one O(n) command (KEYS *, big HGETALL, an unbounded Lua loop) froze the event loop. Prevention: rename dangerous commands, use SCAN/UNLINK, keep scripts bounded, watch the slowlog.

Failover data loss. Symptom: writes that were acknowledged vanish after a promotion (the opening story). Root cause: async replication plus a failover landing in the un-replicated window. Prevention: understand the loss window, use WAIT where it helps, and never use Redis as the source of truth for data you can’t lose.

Fork-induced OOM or latency. Symptom: periodic memory and latency spikes correlated with save times. Root cause: COW divergence during BGSAVE/AOF-rewrite under write load. Prevention: leave ~30% RAM headroom, disable transparent huge pages (THP), schedule saves off-peak, watch latest_fork_usec.

Hot key. Symptom: one shard is pegged while the rest of the Cluster is idle. Root cause: a single key (a viral post’s view counter) concentrates all traffic, and sharding distributes keys, not load within a key. Prevention: client-side caching, request coalescing, or split the key (counter:{shard} and sum on read).

Connection storm. Symptom: ERR max number of clients reached, climbing memory, event-loop time spent on connection churn. Root cause: every client holds a connection; a deploy or pool reset reconnects thousands at once. Prevention: connection pooling, a proxy, and maxclients set deliberately.

If you set maxmemory-policy noeviction on something you treat as a cache, a full instance stops accepting writes entirely — it returns OOM command not allowed on every write. The failure mode flips from latency to hard write errors, and it does so the instant you cross the ceiling, not gradually. Know which policy you have configured before you hit the wall, because the wrong default turns a capacity problem into an outage.
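The difference between the two policies at the ceiling can be shown with a toy cache. This is a sketch under simplifying assumptions: `ToyCache` is a hypothetical class, it uses a key count as a stand-in for bytes, and it evicts by exact LRU, whereas Redis approximates LRU by sampling.

```python
from collections import OrderedDict
from typing import Optional


class ToyCache:
    """Shows how 'noeviction' and 'allkeys-lru' behave once the cache is full."""

    def __init__(self, max_keys: int, policy: str) -> None:
        if policy not in ("noeviction", "allkeys-lru"):
            raise ValueError("this sketch models only two policies")
        self.max_keys = max_keys
        self.policy = policy
        self.items: OrderedDict[str, str] = OrderedDict()

    def set(self, key: str, value: str) -> None:
        if key not in self.items and len(self.items) >= self.max_keys:
            if self.policy == "noeviction":
                # Mirrors the hard write error a full noeviction instance returns.
                raise MemoryError("OOM command not allowed when used memory > 'maxmemory'")
            self.items.popitem(last=False)     # drop the least recently used key
        self.items[key] = value
        self.items.move_to_end(key)

    def get(self, key: str) -> Optional[str]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)            # a read counts as a use
        return self.items[key]
```

Under `allkeys-lru` the full cache degrades into misses; under `noeviction` the same traffic turns into write errors the moment the ceiling is reached, which is the flip described above.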

Replication and the loss window in detail

Redis replication is asynchronous by default. The primary streams its command stream to replicas; a replica applies writes when it receives them and acknowledges nothing about durability back to the client.

sequenceDiagram
  participant App
  participant Primary
  participant Replica
  participant Sentinel
  App->>Primary: SET k v
  Primary-->>App: OK
  Primary->>Replica: async replicate
  Note over Primary,Replica: write exists\nonly on primary
  Primary--xReplica: primary dies here
  Sentinel->>Replica: promote replica
  Note over Replica: un-replicated\nwrite is gone

That gap is real, quantifiable data loss on failover, and its size equals your replication lag at the moment of failure. The WAIT numreplicas timeout command lets a client block until N replicas acknowledge a write, shrinking the window — but WAIT is not a consensus protocol. It does not prevent loss if all acked replicas also die, and it does not make Redis a system of record. Treat it as a knob that tightens the odds, not a guarantee. For anything that must survive failure, the durable truth lives in PostgreSQL or DynamoDB, and Redis holds a fast, rebuildable copy.
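Using `WAIT` from application code might look like the sketch below. It assumes a redis-py style client passed in as `client`, with `set(key, value)` and `wait(num_replicas, timeout)` where the timeout is in milliseconds and the return value is the number of replicas that acknowledged. `set_and_wait` is an illustrative helper name.

```python
def set_and_wait(client, key: str, value: str,
                 min_replicas: int = 1, timeout_ms: int = 100) -> bool:
    """Write, then block until enough replicas acknowledge or the timeout passes.

    Returns True if at least min_replicas acknowledged in time.
    """
    client.set(key, value)
    acked = client.wait(min_replicas, timeout_ms)
    return acked >= min_replicas
```

A `False` result does not mean the write was rolled back: it is still on the primary and may still replicate later. The caller only learns that the loss window was not confirmed closed, and has to decide whether to retry, alert, or accept the risk.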

Scaling it

The honest progression, in order. Skipping steps is how people end up with a distributed system they didn’t need.

Vertical first. One Redis core goes a very long way. Before anything else, kill slow commands, right-size memory, enable pipelining, and add read replicas if the load is read-heavy. Most “we need Cluster” conversations end here once someone removes a KEYS * from a cron job.
Read replicas. Scale reads by routing read-only traffic to replicas. The cost is staleness: replicas serve data behind the primary by the replication lag, so any read-your-writes path must go to the primary. Replicas do nothing for write throughput.
Redis Cluster. Shards the keyspace across 16,384 hash slots spread over multiple primaries, each with its own replicas. This buys horizontal write scale. The price: multi-key operations must land in the same slot (force it with a hash tag, {user123}:profile and {user123}:settings share a slot), cross-slot transactions and scripts are gone, and resharding moves slots live while clients follow MOVED/ASK redirects.

flowchart TD
  K[key name] --> H[CRC16 mod\n16384]
  H --> SL[hash slot]
  SL --> A[Primary A\nslots 0-5460]
  SL --> B[Primary B\nslots 5461-10922]
  SL --> C[Primary C\nslots 10923-16383]
  A --> AR[(replica)]
  B --> BR[(replica)]
  C --> CR[(replica)]
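The key-to-slot mapping in the diagram can be reproduced in a few lines. This sketch follows the published Redis Cluster rule (CRC16 in its XMODEM variant, modulo 16384, hashing only the hash tag when a non-empty `{...}` is present); the function names are my own.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot for a key, honoring the {hash tag} rule."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag narrows what gets hashed.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384
```

With this, `key_slot("{user123}:profile")` and `key_slot("{user123}:settings")` return the same slot because only `user123` is hashed, which is what makes multi-key operations on them legal in Cluster.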

The hot-key wall. Cluster distributes keys across slots, not load within a single key. A viral counter is one key, one slot, one primary, and no amount of sharding moves that. This is the wall that surprises teams who assumed Cluster meant “infinite scale.” The fixes are application-level: client-side caching, request coalescing (one fetch fans out to many waiters), or splitting the key into counter:{0..N} and summing on read. Note that the slot mechanism is not consistent hashing in the strict sense: keys hash into a fixed set of 16,384 slots and slots are assigned to nodes, which serves the same goal as the consistent hashing you see across other distributed stores.
Connections. At thousands of clients, each connection costs memory and event-loop attention. Put a pool in the app and a proxy (Envoy, or historically twemproxy) in front. This is the same pressure that makes connection pooling mandatory in PostgreSQL, for a different underlying reason.
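The key-splitting fix for the hot-key wall can be sketched as a sharded counter. It assumes a redis-py style client passed in as `client` with `incr(key)` and `get(key)`; the helper names and the shard count are illustrative. The sub-keys are written without literal braces, because in Cluster braces mark a hash tag and would change which part of the key is hashed.

```python
import random


def shard_key(base: str, shard: int) -> str:
    return f"{base}:{shard}"


def incr_sharded(client, base: str, n_shards: int, rng=random) -> int:
    """Increment one randomly chosen sub-counter; returns that shard's new value."""
    return client.incr(shard_key(base, rng.randrange(n_shards)))


def read_sharded(client, base: str, n_shards: int) -> int:
    """Sum all sub-counters. Uses per-key GETs so the keys may live in different slots."""
    total = 0
    for shard in range(n_shards):
        value = client.get(shard_key(base, shard))
        if value is not None:
            total += int(value)
    return total
```

Writes now spread over `n_shards` keys, and in Cluster over several slots, at the cost of a read that touches every shard and is not an atomic snapshot. That trade usually suits a view counter, where writes vastly outnumber reads and a slightly stale total is fine.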

When to reach for it (and when not to)

Reach for Redis when you need sub-millisecond reads on hot data, atomic counters, ephemeral state (sessions, feature flags, rate-limit windows), leaderboards, request deduplication, pub/sub fan-out, or a fast queue at modest scale. It is the default answer for a cache in front of a slower system of record, and the default answer for rate limiting and short-lived coordination. The whole point is that the data is fast to access and cheap to lose, because you can rebuild it from the source of truth.

Don’t reach for it as the durable home for data you cannot afford to lose on an async failover — the opening story is the cautionary tale. Don’t use it as a high-scale durable message log; that is Kafka’s job and Streams will hurt you at volume. Don’t lean on Redis locks for true mutual exclusion across failures — use a fencing token from a real source of truth or a consensus system. And don’t let memory grow unbounded because “it’s just a cache”; a cache without an eviction policy is a time bomb.

When to consider alternatives

Durable source of truth with transactions → PostgreSQL or DynamoDB.
High-scale durable event log / streaming backbone → Kafka.
General message queue with delivery guarantees and retries → Message Queues or a task system like Celery.
Search, relevance, and full-text ranking → Elasticsearch.
Strong distributed coordination / leader election / locks that must be correct → ZooKeeper.
Massive-scale write-heavy storage with tunable consistency → Cassandra.

The pattern: Redis is the fast, lossy, in-memory layer. The moment the requirement becomes “and it must survive failure / scale writes durably / guarantee delivery,” the right tool is something purpose-built for durability, and Redis sits in front of it as the speed layer.

Operational checklist

Set maxmemory and an explicit maxmemory-policy. The default policy, noeviction, makes a full instance return hard write errors instead of evicting.
Alert on used_memory / maxmemory > 0.8. The configured ceiling is invisible to host-level memory dashboards, so a node can be “fine” on the OS view while it’s about to evict everything.
Leave ~30% RAM headroom for the COW fork during saves, and disable transparent huge pages (THP) to keep fork latency sane.
Rename or disable dangerous commands in production: KEYS, FLUSHALL, FLUSHDB, DEBUG. Use SCAN and UNLINK instead of KEYS and big DEL.
Pick appendfsync everysec plus RDB for most stateful uses; reserve always for the rare case that justifies the throughput hit.
Watch latency, latest_fork_usec, SLOWLOG, mem_fragmentation_ratio, and replication offset lag — not just hit rate.
Keep Lua scripts and MULTI/EXEC blocks short and bounded; an unbounded loop is an outage on a timer.
Document your failover loss window (it equals replication lag) and use WAIT where tightening it is worth the latency.
Pool connections in the app and set maxclients deliberately; put a proxy in front at high connection counts.
In Cluster, design keys with hash tags so co-accessed keys share a slot, and have a hot-key mitigation plan before you launch anything viral.

Summary

Redis is the best in-memory data structure server there is, and almost all of its sharp edges trace back to the same four facts: it runs one command at a time on one core (so a slow command stalls everyone), it keeps data in RAM (so you fall off a cliff at maxmemory, not a slope), it acknowledges before it persists (so durability lags the OK by your appendfsync interval), and it replicates asynchronously (so failover loses the un-replicated tail). Treat it as a fast, rebuildable speed layer in front of a durable source of truth, set memory and eviction explicitly, keep commands and scripts cheap, leave headroom for the fork, and never trust it with data or locks you can’t afford to lose. Do that and Redis is the most pleasant dependency in your stack. Forget one of those four facts and it’s the one that pages you.

Appendix: caching patterns refresher

If the body assumed caching fundamentals, here is the quick version. The deeper treatment is in Caching Strategies.

Cache-aside (lazy loading) — the app checks Redis, and on a miss reads the source of truth and populates Redis. Simplest and most common; the risk is a thundering herd when a popular key expires and every request misses at once. Mitigate with request coalescing or jittered TTLs.
Write-through — write to Redis and the source of truth together, keeping the cache fresh at the cost of write latency.
Write-behind — write to Redis, flush to the source of truth asynchronously. Fast, but you’ve reintroduced the durability gap and risk losing writes on a crash.
TTL discipline — every cached key should have a TTL unless you have a deliberate invalidation strategy. Add jitter so a batch of keys written together doesn’t all expire in the same second.
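Cache-aside with jittered TTLs can be sketched as follows. It assumes a redis-py style client passed in as `client` with `get(key)` and `set(key, value, ex=seconds)`, string values, and a caller-supplied `loader` that reads the source of truth; the helper names and the default TTL are illustrative.

```python
import random
from typing import Callable


def jittered_ttl(base_ttl_s: int, jitter_fraction: float, rng=random) -> int:
    """Base TTL plus or minus a random spread, never below one second."""
    spread = int(base_ttl_s * jitter_fraction)
    return max(1, base_ttl_s + rng.randint(-spread, spread))


def get_or_load(client, key: str, loader: Callable[[str], str],
                base_ttl_s: int = 300, jitter_fraction: float = 0.1, rng=random) -> str:
    """Cache-aside read: try the cache, fall back to the source of truth on a miss."""
    cached = client.get(key)
    if cached is not None:
        return cached
    value = loader(key)                      # the slow path
    client.set(key, value, ex=jittered_ttl(base_ttl_s, jitter_fraction, rng))
    return value
```

The jitter spreads the expiry of keys written together, which softens the thundering herd. It does not stop many concurrent misses on the same key from all calling `loader`; that still needs request coalescing.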

The unifying idea: a cache is a bet that recomputing or re-fetching is more expensive than storing a copy, and that the copy going stale or vanishing is survivable. Redis is the engine; the pattern is yours to choose.
