Current behavior
SemanticCache.check (redisvl/extensions/cache/llm/semantic.py:432) refreshes the TTL of every matched entry on every hit:
    # Refresh TTL on all found keys
    for key in redis_keys:
        self.expire(key)
This is a sliding-window TTL: a popular entry effectively lives as long as it keeps getting hit. There is no constructor flag, per-call argument, or per-entry attribute to disable this.
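To make the effect concrete, here is a toy model of a single entry that is queried at a steady rate. It does not use redisvl or Redis; `SlidingTTLEntry` and `lifetime` are hypothetical names used only for illustration, and time is an integer counter rather than a real clock.

```python
class SlidingTTLEntry:
    """Toy model of one cache entry whose TTL can be reset on every hit."""

    def __init__(self, ttl: int, created_at: int) -> None:
        self.ttl = ttl
        self.expires_at = created_at + ttl

    def hit(self, now: int, refresh: bool = True) -> bool:
        """Return True on a live hit; optionally push the expiry forward."""
        if now >= self.expires_at:
            return False
        if refresh:
            # Sliding window: every hit restarts the countdown.
            self.expires_at = now + self.ttl
        return True


def lifetime(ttl: int, hit_every: int, refresh: bool, horizon: int = 10_000) -> int:
    """Time of the last live hit when the entry is queried every `hit_every` ticks."""
    entry = SlidingTTLEntry(ttl=ttl, created_at=0)
    last_live = 0
    for now in range(hit_every, horizon + 1, hit_every):
        if not entry.hit(now, refresh=refresh):
            break
        last_live = now
    return last_live


# Same entry, same traffic; only the refresh-on-hit behavior differs.
sliding = lifetime(ttl=60, hit_every=30, refresh=True)
fixed = lifetime(ttl=60, hit_every=30, refresh=False)
```

With refresh enabled, the entry is still being served at the end of the simulated horizon; with refresh disabled, it stops being served once the original TTL has elapsed.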
Why this is a problem
Sliding window is the wrong default in any domain where the popularity of an entry correlates with the cost of it being stale:
- a procedure superseded by a regulatory bulletin that the cache has not yet invalidated
- a gameplay answer affected by a balance patch on a live-ops surface
- a pricing or eligibility answer that changed at midnight
In these domains the most-asked question is the most likely to be cached and the most likely to fan out a stale answer. The standard mitigation today — store an absolute expires_at numeric and filter on it at read time — works but requires every customer to discover the pattern independently.
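The expires_at mitigation can be sketched roughly as follows. This is an in-memory stand-in, not redisvl code: `AbsoluteExpiryStore` and `Entry` are hypothetical, exact-match lookup replaces vector search, and the `now` parameter exists only so the behavior can be exercised without waiting on a real clock.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    prompt: str
    response: str
    expires_at: float  # absolute Unix timestamp, fixed at write time


class AbsoluteExpiryStore:
    """In-memory stand-in for a cache; exact-match lookup replaces vector search."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def store(self, prompt: str, response: str, max_age: float,
              now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._entries[prompt] = Entry(prompt, response, expires_at=now + max_age)

    def check(self, prompt: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._entries.get(prompt)
        # Read-time filter: expires_at never moves, so hits cannot extend it.
        if entry is None or entry.expires_at <= now:
            return None
        return entry.response
```

The point of the pattern is that the deadline lives in the entry itself, so it holds regardless of how often the entry is read. In a real deployment this filter only gives absolute expiry if the key-level TTL is either longer than the deadline or not being refreshed on hit.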
Proposed API
    SemanticCache(
        name="cache",
        redis_url=...,
        refresh_ttl_on_hit=True,  # current behavior, default unchanged
    )
    cache.check(
        prompt=...,
        refresh_ttl=None,  # per-call override
    )
refresh_ttl_on_hit=False would make check() a pure read. Customers wanting absolute expiry would set this to False, store a numeric expires_at filterable field, and filter on it.
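One possible way the two settings could interact is sketched below. It assumes that refresh_ttl=None means "defer to the constructor flag" and that any explicit per-call value wins; that precedence is an assumption of this sketch, and `should_refresh` is a hypothetical helper, not an existing redisvl function.

```python
from typing import Optional


def should_refresh(refresh_ttl_on_hit: bool, refresh_ttl: Optional[bool] = None) -> bool:
    """Hypothetical helper: a per-call value wins; None defers to the constructor flag."""
    return refresh_ttl_on_hit if refresh_ttl is None else refresh_ttl


# Constructor default True, no per-call override: keep today's sliding-window behavior.
default_case = should_refresh(refresh_ttl_on_hit=True)

# Constructor default True, but this one call must not extend the entry's lifetime.
pure_read = should_refresh(refresh_ttl_on_hit=True, refresh_ttl=False)
```

Under this reading, check() would run the existing refresh loop only when the resolved value is True, so existing callers would see no change.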
Backwards compatibility
Default remains True. Existing behavior unchanged. Opt-in behavioral change for customers who need it.
Notes
Surfaced while writing a scoped semantic caching architecture spec; came up repeatedly when stress-testing the design against regulated and fast-mutating domains where sliding-window TTL is actively hostile.