Skip to content

feat(cache): Leasable primitive to close the cache-aside read-after-write race#75

Merged
abnegate merged 10 commits into
mainfrom
feat/leasable-cache
Jun 23, 2026
Merged

feat(cache): Leasable primitive to close the cache-aside read-after-write race#75
abnegate merged 10 commits into
mainfrom
feat/leasable-cache

Conversation

@abnegate

Copy link
Copy Markdown
Member

Problem

Under concurrency, a reader that fetched a row from the database just before a concurrent purge() can re-populate the cache with that now-stale row after the purge runs. Subsequent reads then serve stale data until the next write. Purging alone can't prevent it — the stale reader's save() lands after the purge.

This surfaced downstream as an intermittent read-after-write failure (e.g. a just-disabled project service still appearing enabled ~1.7% of the time under load).

Fix: a Leasable capability

Each key carries a monotonic generation that advances on purge(). A reader captures the generation before its DB read and presents it when saving:

  • getGeneration(key) — current generation ("0" when none recorded).
  • saveWithLease(key, data, hash, generation) — persists only if the key's generation is unchanged (atomic CAS via a Redis Lua script). A write racing with a purge is rejected instead of caching stale data.
  • purge(key) — atomically drops the value and bumps the generation (single Lua script), so an in-flight reader's lease is invalidated.

The hit path is unchanged: a present value can only be current, because purge atomically removes it and advances the generation in one step.

Redis implements the capability; Sharding/Pool delegate; the Cache facade exposes the two methods with a graceful fallback to an unconditional save() for non-leasable adapters — existing behaviour is preserved for anyone who doesn't opt in. Consumers (e.g. utopia-php/database getDocument()) opt in by capturing the generation before the DB read.

Tests

tests/Cache/E2E/Redis/LeasableTest.php — deterministic regression covering generation default, CAS save, purge-advances-generation, the stale-save-after-purge rejection (the core guarantee), and fresh-save-after-purge.

Verification

  • Pint: clean. PHPStan (level max): no errors. New tests: 5/5 pass.
  • End-to-end against appwrite: a service-disable read-after-write that flaked 20/1200 (~1.7%) under 64 concurrent readers drops to 0/1200 with the lease wired into getDocument().

🤖 Generated with Claude Code

…er-write race

Under concurrency a reader that fetched a row from the database just before a
concurrent purge could re-populate the cache with that now-stale row *after*
the purge ran, so later reads served stale data. Purging alone cannot prevent
this: the stale reader's write lands after the purge.

Adds a Leasable capability. Each key carries a monotonic generation that
advances on purge. A reader captures the generation before its DB read and
presents it when saving; saveWithLease() persists only if the generation is
unchanged (atomic CAS via a Redis Lua script), so a write racing with a purge
is rejected rather than caching stale data. The hit path is unchanged — a
present value can only be current, since purge atomically drops the value and
bumps the generation in one step.

Redis implements the capability; Sharding/Pool delegate; the Cache facade
exposes getGeneration()/saveWithLease() with a graceful fallback to an
unconditional save for non-leasable adapters, so existing behaviour is
preserved for everyone who does not opt in.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 22, 2026 16:02

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@greptile-apps

greptile-apps Bot commented Jun 22, 2026

Copy link
Copy Markdown

Greptile Summary

This PR introduces a Leasable capability to close the cache-aside read-after-write race: each key carries a monotonic generation that is captured by a reader before its DB read and compared atomically on write, so a stale post-purge save is rejected instead of repopulating the cache. The implementation is well-structured and resolves all the major correctness issues flagged in the previous review round.

  • Redis stores the generation in-band as a reserved hash field (__utopia_gen__), eliminating sidecar-key collisions; all three Lua scripts (LUA_SAVE_WITH_LEASE, LUA_PURGE_BUMP, LUA_PURGE_FIELD) are single-key and therefore shard-safe; all public API methods guard the reserved field.
  • CircuitBreaker, Pool, and Sharding each implement Leasable with correct instanceof guards and safe fallbacks to save() for non-Leasable inner adapters; list() filters the internal generation field.
  • The flush() path (noted in the prior review) still calls flushDB() without advancing or preserving any generation state, so a reader who captured '0' before a global flush can write stale data after it — that gap remains open.

Confidence Score: 4/5

The lease feature is correct for the primary purge-based invalidation path; the global flush path remains a gap where pre-flush generation tokens are accepted after flushDB wipes all markers.

All major issues raised in the previous review were resolved: Pool/CircuitBreaker delegation is safe, hash-level purges now advance the generation via LUA_PURGE_FIELD, list() hides the internal field, and the reserved-field guards are consistent. The one open issue is that flush() calls flushDB() without bumping or preserving generations, so a reader who captured '0' before a flush can re-populate the cache after it.

src/Cache/Adapter/Redis.php — flush() does not invalidate outstanding lease tokens.

Important Files Changed

Filename Overview
src/Cache/Feature/Leasable.php New interface defining getGeneration() and saveWithLease(); well-documented contract with no issues.
src/Cache/Adapter/Redis.php Core lease implementation using single-key Lua scripts (LUA_SAVE_WITH_LEASE, LUA_PURGE_BUMP, LUA_PURGE_FIELD). Generation stored in-hash; reserved-field guards protect GENERATION_FIELD; list() filters internal field. Hash-level purge now correctly bumps generation. flush() still does not invalidate outstanding leases (pre-existing unfixed P1 from prior review).
src/Cache/Adapter/CircuitBreaker.php Now implements Leasable; both new methods correctly guard on inner adapter instanceof Leasable and delegate through the circuit breaker, fixing the previously reported wrappers-disable-leases issue.
src/Cache/Adapter/Pool.php Implements Leasable using pool->use() with instanceof guard, correctly falling back to save() for non-Leasable adapters; fixes the previously reported unsafe delegation fatal.
src/Cache/Adapter/Sharding.php Implements Leasable with per-shard instanceof guard on both methods; clean and consistent with Pool.
src/Cache/Cache.php Adds getGeneration() and saveWithLease() facades with Leasable check; correctly falls back to unconditional save() for non-Leasable adapters; case-sensitivity normalization applied consistently.
tests/Cache/E2E/Redis/LeasableTest.php New E2E test suite covering generation defaults, CAS save, purge semantics, hash-level purge, stale-save rejection, list() hiding, and reserved-field protection; all assertions look correct.
tests/Cache/E2E/PoolTest.php Adds fallback test for non-Leasable Pool adapter; verifies no fatal and correct degraded behavior.

Reviews (7): Last reviewed commit: "fix(cache): advance the generation on fi..." | Re-trigger Greptile

Comment thread src/Cache/Adapter/Pool.php
Comment thread src/Cache/Adapter/Redis.php
…op stale Lua docblock

Addresses review on #75:
- Pool::getGeneration()/saveWithLease() no longer blindly delegate; they guard
  with `instanceof Leasable` (mirroring Sharding) and fall back to '0' /
  unconditional save(), so a pool holding a non-Leasable adapter (Filesystem,
  Memory, Memcached, None, ...) no longer triggers a "Call to undefined method"
  fatal. Covered by a new PoolTest case (the pool there holds Filesystem).
- Remove the stale `ARGV[4]=generation ttl seconds` line from the
  LUA_SAVE_WITH_LEASE docblock; the save script takes only ARGV[1..3].

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/Cache/Adapter/Redis.php Outdated
Comment thread src/Cache/Adapter/Redis.php
Comment thread src/Cache/Adapter/Redis.php Outdated
Comment thread src/Cache/Cache.php
abnegate and others added 2 commits June 23, 2026 17:39
…n, reserved key, decorator support

Second review round on #75:
- Generation marker no longer expires (was 1d TTL). A long-TTL reader whose
  marker expired before saveWithLease() could have a stale write accepted; the
  marker must outlive any reader, so it persists.
- purge() returns the DEL result again (number of value keys removed), not the
  INCR, preserving the documented false-for-missing-key semantics.
- Generation markers use a reserved key prefix (`_utopia_cache_gen:`) instead of
  a `:__gen` suffix, so they cannot collide with a real cache entry (a hash) and
  cause WRONGTYPE errors.
- CircuitBreaker now implements Leasable and delegates to the wrapped adapter,
  so a CircuitBreaker-wrapped Redis keeps lease semantics instead of silently
  falling back to an unconditional save.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… committed one

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/Cache/Adapter/Redis.php
Third review round on #75: the persisted generation markers were counted by
getSize() (which returns Redis DBSIZE), so a purge could leave getSize()
non-zero after the value was gone, and purging absent keys inflated it.

getSize() now subtracts the generation namespace (SCAN over the reserved
prefix) so it reports only real cache entries. The generation bump stays
unconditional on purge — a reader may have read a row the writer is now
deleting before anything was cached, so the generation must advance even when
DEL removed nothing (skipping it reintroduced a stale-write window, caught by
the existing lease test). Covered by a new size test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/Cache/Adapter/Redis.php Outdated
…a sidecar key

Alternative to the sidecar-key generation in #75, for comparison. The generation
lives in a reserved field (`__utopia_gen__`) inside the value's own hash, so
every lease op is single-key:
- saveWithLease / purge are single-key Lua scripts → one shard → correct under
  Redis Cluster and multi-threaded backends (Dragonfly) with no cross-shard span.
- No sidecar key → deletes the reserved-prefix collision handling and the
  getSize() prefix-scan exclusion entirely.

Interface is unchanged (same getGeneration + saveWithLease).

Trade-offs vs the sidecar approach:
- getSize() can no longer cleanly exclude the generation: a purged key keeps a
  tiny gen-only hash until re-cached, so DBSIZE slightly over-counts
  purged-but-not-yet-recached keys (filtering would need an O(N) scan).
- purge() does a small HLEN/HEXISTS dance to keep "true iff a value was removed".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
[spike] single-key lease: generation in-hash instead of sidecar key
Comment thread src/Cache/Adapter/Redis.php
abnegate and others added 2 commits June 23, 2026 22:27
…erations

Review on #75 (in-hash generation): the generation lives in a reserved field
(__utopia_gen__) inside the value's own hash, but callers can address any field
via load/save/touch/purge. A caller writing under that field would overwrite the
generation with a cache envelope; the next purge then read a non-numeric value,
fell back through `tonumber(current) or 0`, and reset the generation — reviving
stale lease tokens.

Reject the reserved field on every public field path (load/save/saveWithLease/
touch/purge-with-hash) so it can't be read, written, or deleted by callers, and
can only be managed by the lease scripts. Covered by a regression test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
list() (hKeys) would surface the internal __utopia_gen__ field as a listable
cache field, leaking the lease internals to callers. Filter it out so the
reserved field is fully invisible to the public API. Covered by a test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/Cache/Adapter/Redis.php Outdated
Comment thread src/Cache/Adapter/Redis.php
Review on #75: purge($key, $hash) only ran HDEL on the field and left the
per-key generation untouched, so the lease was bypassed for hashed entries — a
reader could capture the generation, a concurrent field-purge would delete the
field without advancing the generation, and the reader's saveWithLease(...,
$hash, capturedGeneration) would still pass the CAS and re-cache stale data.

Field-level purge now deletes the field and advances the generation in one
single-key Lua step (LUA_PURGE_FIELD), mirroring the full-purge bump, while
preserving purge()'s deletion-result return. Over-invalidating sibling fields of
the same key is safe: it only costs cache misses, never staleness. Covered by a
regression test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@abnegate abnegate merged commit d504840 into main Jun 23, 2026
6 checks passed
@abnegate abnegate deleted the feat/leasable-cache branch June 23, 2026 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants