Summary
store/posixage/internal/flock can hand out the same exclusive lock to two callers simultaneously. The bug lives in the stale-recovery path: recoverStaleLock unlinks the lock file when its modtime ages past 30s, but on Unix unlink does not invalidate any open fd's flock(2) on the now-orphaned inode. A second caller can create a fresh inode at the same path, flock that, and walk away believing it owns the lock — while the original holder still legitimately holds a kernel-valid flock on the orphan.
The race has two halves. Either is sufficient on its own.
Half 1 — acquirer-side
A caller in the middle of tryLock can lock the orphan inode that recovery just removed:
- Stale lock file exists at the path on inode A. No live
flock holder (e.g. previous holder crashed; or holder is sitting on its own flock without refreshing modtime).
- Caller X calls
openFile → fd₁ → A.
- Caller Y calls
recoverStaleLock → stat, sees stale → Remove(lockFileName) unlinks A. The path is now empty; A still exists in the kernel via fd₁.
- X calls
flock(fd₁) → SUCCESS (orphaned inode, no holders).
- Y calls
openFile again → creates inode B at the path → flock(fd₂) → SUCCESS.
- X and Y both believe they hold the exclusive lock. A race-free critical section is no longer race-free.
Half 2 — holder-side
A caller that already legitimately holds the lock can be hijacked if its critical section exceeds the 30s threshold:
- Caller X acquires the lock cleanly on inode A. Truncate sets modtime to T₀.
- X does work for >30s without refreshing the modtime (the package never re-touched the file during a hold).
- Caller Y calls
tryLock, fails initial acquireOnce, runs recoverStaleLock, sees time.Since(modtime) > 30s → unlinks A.
- Y then calls
openFile → creates inode B → flock(fd_B) → SUCCESS.
- X is still happily inside its critical section on A, Y is now inside its critical section on B. Both think they have it.
Half 3 — recovery TOCTOU (minor)
Two callers reaching recoverStaleLock concurrently both saw a stale modtime, both called Remove, the loser got ENOENT, and tryLock surfaced that as a hard lock-acquisition failure even though recovery had succeeded. Not a correctness bug on its own, but conflates a benign race with a real failure.
Impact
Anything calling flock.TryLock / flock.TryRLock from store/posixage/store.go. Read/write paths that assume exclusive access (the whole point of the lock) can interleave. Symptoms in production would be intermittent corruption of the on-disk secret store under contention with a slow operation or after a process crashed mid-hold.
Why the 30s threshold was always a kludge
flock is kernel-released on process death. A held flock therefore by definition belongs to a live process — there is no need to "recover" from a stale flock, only from a stale file. Once you accept that, the only thing the modtime check tells you is "the path's modtime is old," which says nothing about whether the inode the path points to is locked. Removing the file does nothing to a live holder's flock; it only creates the inode-swap hazard above.
Why the race is hard to reproduce
- Acquirer-side needs
unlink to land between another caller's open and flock. Microseconds of window per attempt.
- Holder-side needs a critical section that exceeds 30s and a concurrent acquirer firing recovery while the holder is mid-work. Easier to construct in a stress test than to observe in normal use.
The race tests in the fix PR cover the building blocks (isCurrentLockFile correctness, concurrent recovery returning success) and a heartbeat smoke test (recovery refuses to fire against a heartbeating holder). A deterministic end-to-end ghost-lock reproducer would require test hooks to delay flock after open — not added.
Affected code
store/posixage/internal/flock/flock.go
store/posixage/internal/flock/recover.go
store/posixage/internal/flock/flock_unix.go (release/unlock split)
store/posixage/internal/flock/flock_windows.go (release/unlock split)
Fix
PR #529 — closes both halves:
- Verify the locked inode matches the path inode after every
flock (acquirer-side).
- Run a background goroutine while the lock is held that re-truncates the lock file every 10s and re-verifies the inode each tick (holder-side).
- Treat
ENOENT as success on both Stat and Remove inside recoverStaleLock (TOCTOU).
Heartbeat starvation past 30s (e.g. SIGSTOP, extreme scheduler pressure) is logged via log/slog when detected at the next tick. It cannot fail the caller's outstanding operation in-band — UnlockFunc is the only callback we hold — so the log line is the safety net for that pathological case.
Summary
store/posixage/internal/flockcan hand out the same exclusive lock to two callers simultaneously. The bug lives in the stale-recovery path:recoverStaleLockunlinks the lock file when its modtime ages past 30s, but on Unix unlink does not invalidate any open fd'sflock(2)on the now-orphaned inode. A second caller can create a fresh inode at the same path, flock that, and walk away believing it owns the lock — while the original holder still legitimately holds a kernel-valid flock on the orphan.The race has two halves. Either is sufficient on its own.
Half 1 — acquirer-side
A caller in the middle of
tryLockcan lock the orphan inode that recovery just removed:flockholder (e.g. previous holder crashed; or holder is sitting on its own flock without refreshing modtime).openFile→ fd₁ → A.recoverStaleLock→stat, sees stale →Remove(lockFileName)unlinks A. The path is now empty; A still exists in the kernel via fd₁.flock(fd₁)→ SUCCESS (orphaned inode, no holders).openFileagain → creates inode B at the path →flock(fd₂)→ SUCCESS.Half 2 — holder-side
A caller that already legitimately holds the lock can be hijacked if its critical section exceeds the 30s threshold:
tryLock, fails initialacquireOnce, runsrecoverStaleLock, seestime.Since(modtime) > 30s→ unlinks A.openFile→ creates inode B →flock(fd_B)→ SUCCESS.Half 3 — recovery TOCTOU (minor)
Two callers reaching
recoverStaleLockconcurrently both saw a stale modtime, both calledRemove, the loser gotENOENT, andtryLocksurfaced that as a hard lock-acquisition failure even though recovery had succeeded. Not a correctness bug on its own, but conflates a benign race with a real failure.Impact
Anything calling
flock.TryLock/flock.TryRLockfromstore/posixage/store.go. Read/write paths that assume exclusive access (the whole point of the lock) can interleave. Symptoms in production would be intermittent corruption of the on-disk secret store under contention with a slow operation or after a process crashed mid-hold.Why the 30s threshold was always a kludge
flockis kernel-released on process death. A heldflocktherefore by definition belongs to a live process — there is no need to "recover" from a stale flock, only from a stale file. Once you accept that, the only thing the modtime check tells you is "the path's modtime is old," which says nothing about whether the inode the path points to is locked. Removing the file does nothing to a live holder's flock; it only creates the inode-swap hazard above.Why the race is hard to reproduce
unlinkto land between another caller'sopenandflock. Microseconds of window per attempt.The race tests in the fix PR cover the building blocks (
isCurrentLockFilecorrectness, concurrent recovery returning success) and a heartbeat smoke test (recovery refuses to fire against a heartbeating holder). A deterministic end-to-end ghost-lock reproducer would require test hooks to delayflockafteropen— not added.Affected code
store/posixage/internal/flock/flock.gostore/posixage/internal/flock/recover.gostore/posixage/internal/flock/flock_unix.go(release/unlock split)store/posixage/internal/flock/flock_windows.go(release/unlock split)Fix
PR #529 — closes both halves:
flock(acquirer-side).ENOENTas success on bothStatandRemoveinsiderecoverStaleLock(TOCTOU).Heartbeat starvation past 30s (e.g. SIGSTOP, extreme scheduler pressure) is logged via
log/slogwhen detected at the next tick. It cannot fail the caller's outstanding operation in-band —UnlockFuncis the only callback we hold — so the log line is the safety net for that pathological case.