feat(cache): Leasable primitive to close the cache-aside read-after-write race#75
Conversation
…er-write race Under concurrency a reader that fetched a row from the database just before a concurrent purge could re-populate the cache with that now-stale row *after* the purge ran, so later reads served stale data. Purging alone cannot prevent this: the stale reader's write lands after the purge. Adds a Leasable capability. Each key carries a monotonic generation that advances on purge. A reader captures the generation before its DB read and presents it when saving; saveWithLease() persists only if the generation is unchanged (atomic CAS via a Redis Lua script), so a write racing with a purge is rejected rather than caching stale data. The hit path is unchanged — a present value can only be current, since purge atomically drops the value and bumps the generation in one step. Redis implements the capability; Sharding/Pool delegate; the Cache facade exposes getGeneration()/saveWithLease() with a graceful fallback to an unconditional save for non-leasable adapters, so existing behaviour is preserved for everyone who does not opt in. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Greptile SummaryThis PR introduces a
Confidence Score: 4/5The lease feature is correct for the primary purge-based invalidation path; the global flush path remains a gap where pre-flush generation tokens are accepted after flushDB wipes all markers. All major issues raised in the previous review were resolved: Pool/CircuitBreaker delegation is safe, hash-level purges now advance the generation via LUA_PURGE_FIELD, list() hides the internal field, and the reserved-field guards are consistent. The one open issue is that flush() calls flushDB() without bumping or preserving generations, so a reader who captured '0' before a flush can re-populate the cache after it. src/Cache/Adapter/Redis.php — flush() does not invalidate outstanding lease tokens. Important Files Changed
Reviews (7): Last reviewed commit: "fix(cache): advance the generation on fi..." | Re-trigger Greptile |
…op stale Lua docblock Addresses review on #75: - Pool::getGeneration()/saveWithLease() no longer blindly delegate; they guard with `instanceof Leasable` (mirroring Sharding) and fall back to '0' / unconditional save(), so a pool holding a non-Leasable adapter (Filesystem, Memory, Memcached, None, ...) no longer triggers a "Call to undefined method" fatal. Covered by a new PoolTest case (the pool there holds Filesystem). - Remove the stale `ARGV[4]=generation ttl seconds` line from the LUA_SAVE_WITH_LEASE docblock; the save script takes only ARGV[1..3]. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n, reserved key, decorator support Second review round on #75: - Generation marker no longer expires (was 1d TTL). A long-TTL reader whose marker expired before saveWithLease() could have a stale write accepted; the marker must outlive any reader, so it persists. - purge() returns the DEL result again (number of value keys removed), not the INCR, preserving the documented false-for-missing-key semantics. - Generation markers use a reserved key prefix (`_utopia_cache_gen:`) instead of a `:__gen` suffix, so they cannot collide with a real cache entry (a hash) and cause WRONGTYPE errors. - CircuitBreaker now implements Leasable and delegates to the wrapped adapter, so a CircuitBreaker-wrapped Redis keeps lease semantics instead of silently falling back to an unconditional save. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… committed one Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Third review round on #75: the persisted generation markers were counted by getSize() (which returns Redis DBSIZE), so a purge could leave getSize() non-zero after the value was gone, and purging absent keys inflated it. getSize() now subtracts the generation namespace (SCAN over the reserved prefix) so it reports only real cache entries. The generation bump stays unconditional on purge — a reader may have read a row the writer is now deleting before anything was cached, so the generation must advance even when DEL removed nothing (skipping it reintroduced a stale-write window, caught by the existing lease test). Covered by a new size test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…a sidecar key Alternative to the sidecar-key generation in #75, for comparison. The generation lives in a reserved field (`__utopia_gen__`) inside the value's own hash, so every lease op is single-key: - saveWithLease / purge are single-key Lua scripts → one shard → correct under Redis Cluster and multi-threaded backends (Dragonfly) with no cross-shard span. - No sidecar key → deletes the reserved-prefix collision handling and the getSize() prefix-scan exclusion entirely. Interface is unchanged (same getGeneration + saveWithLease). Trade-offs vs the sidecar approach: - getSize() can no longer cleanly exclude the generation: a purged key keeps a tiny gen-only hash until re-cached, so DBSIZE slightly over-counts purged-but-not-yet-recached keys (filtering would need an O(N) scan). - purge() does a small HLEN/HEXISTS dance to keep "true iff a value was removed". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
[spike] single-key lease: generation in-hash instead of sidecar key
…erations Review on #75 (in-hash generation): the generation lives in a reserved field (__utopia_gen__) inside the value's own hash, but callers can address any field via load/save/touch/purge. A caller writing under that field would overwrite the generation with a cache envelope; the next purge then read a non-numeric value, fell back through `tonumber(current) or 0`, and reset the generation — reviving stale lease tokens. Reject the reserved field on every public field path (load/save/saveWithLease/ touch/purge-with-hash) so it can't be read, written, or deleted by callers, and can only be managed by the lease scripts. Covered by a regression test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
list() (hKeys) would surface the internal __utopia_gen__ field as a listable cache field, leaking the lease internals to callers. Filter it out so the reserved field is fully invisible to the public API. Covered by a test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Review on #75: purge($key, $hash) only ran HDEL on the field and left the per-key generation untouched, so the lease was bypassed for hashed entries — a reader could capture the generation, a concurrent field-purge would delete the field without advancing the generation, and the reader's saveWithLease(..., $hash, capturedGeneration) would still pass the CAS and re-cache stale data. Field-level purge now deletes the field and advances the generation in one single-key Lua step (LUA_PURGE_FIELD), mirroring the full-purge bump, while preserving purge()'s deletion-result return. Over-invalidating sibling fields of the same key is safe: it only costs cache misses, never staleness. Covered by a regression test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Problem
Under concurrency, a reader that fetched a row from the database just before a concurrent
purge()can re-populate the cache with that now-stale row after the purge runs. Subsequent reads then serve stale data until the next write. Purging alone can't prevent it — the stale reader'ssave()lands after the purge.This surfaced downstream as an intermittent read-after-write failure (e.g. a just-disabled project service still appearing enabled ~1.7% of the time under load).
Fix: a
LeasablecapabilityEach key carries a monotonic generation that advances on
purge(). A reader captures the generation before its DB read and presents it when saving:getGeneration(key)— current generation ("0"when none recorded).saveWithLease(key, data, hash, generation)— persists only if the key's generation is unchanged (atomic CAS via a Redis Lua script). A write racing with a purge is rejected instead of caching stale data.purge(key)— atomically drops the value and bumps the generation (single Lua script), so an in-flight reader's lease is invalidated.The hit path is unchanged: a present value can only be current, because purge atomically removes it and advances the generation in one step.
Redisimplements the capability;Sharding/Pooldelegate; theCachefacade exposes the two methods with a graceful fallback to an unconditionalsave()for non-leasable adapters — existing behaviour is preserved for anyone who doesn't opt in. Consumers (e.g.utopia-php/databasegetDocument()) opt in by capturing the generation before the DB read.Tests
tests/Cache/E2E/Redis/LeasableTest.php— deterministic regression covering generation default, CAS save, purge-advances-generation, the stale-save-after-purge rejection (the core guarantee), and fresh-save-after-purge.Verification
getDocument().🤖 Generated with Claude Code