Skip to content

fix(posixage/flock): close ghost-lock race via inode verify + heartbeat#529

Merged
Benehiko merged 5 commits into
mainfrom
fix/posixage-flock-stale-recovery-race
May 28, 2026
Merged

fix(posixage/flock): close ghost-lock race via inode verify + heartbeat#529
Benehiko merged 5 commits into
mainfrom
fix/posixage-flock-stale-recovery-race

Conversation

@Benehiko
Copy link
Copy Markdown
Member

@Benehiko Benehiko commented May 27, 2026

Closes #530.

Summary

The posixage flock layer had a race where two callers could simultaneously believe they held the exclusive lock. Stale-recovery unlinked the lock file while another process still held flock(2) on the now-orphaned inode, then created a fresh inode at the same path and flocked that.

The race has two halves:

  • Acquirer side — a caller mid-acquisition can have its fd flock the inode that recovery just unlinked, while the path now points to a fresh inode held by someone else.
  • Holder side — a caller already holding the lock can have its critical section exceed the 30s stale threshold, at which point another process classifies it as abandoned and hijacks it.

This PR closes both halves and fixes a TOCTOU in recoverStaleLock between its Stat and Remove calls. There are now two commits:

  1. Inode verification on acquire — after a successful flock, verify (via os.SameFile) that the locked descriptor still refers to the file at the lock-file path. If not, drop the bad lock and retry. This closes the acquirer-side half and the recovery TOCTOU.
  2. Modtime heartbeat for holders — a background goroutine started by tryLock re-truncates the lock file every 10s and re-verifies the locked inode each tick. This closes the holder-side half by keeping the lock file young enough that concurrent recovery callers never see it as stale.

The bug

recoverStaleLock deletes the lock file based purely on its modification time being >30s old:

  1. Process A holds flock on inode₁ (legitimate, slow operation).
  2. Process B sees a stale modtime, calls root.Remove(lockFileName). On Unix this unlinks the dirent but keeps the inode alive for any open fd — A's flock is still kernel-valid.
  3. Process B opens the path again (gets a brand-new inode₂), flocks it, and returns success.
  4. A and B now both run inside what they think is an exclusive section.

The 30s threshold was always a kludge: flock is automatically released by the kernel on process death, so a held flock by definition belongs to a live holder. Removing the file does nothing to that holder; it only creates the inode-swap hazard.

A secondary bug lived in the recovery TOCTOU: two callers reaching recoverStaleLock concurrently both saw a stale modtime, both called Remove, the loser got ENOENT, and tryLock surfaced that as a hard lock-acquisition failure even though recovery had succeeded.

Fix part 1 — inode verification on acquire

acquireOnce(root, exclusive) runs the full open → flock → truncate → verify sequence atomically:

fl, _ := openFile(root)
if err := lockFile(fl.Fd(), exclusive); err != nil {
    fl.Close(); return nil, err
}
fl.Truncate(0)                              // refresh modtime
same, err := isCurrentLockFile(fl, root)    // os.SameFile(fl.Stat(), root.Stat(lockFileName))
if err != nil || !same {
    releaseLock(fl); fl.Close()
    return nil, errStaleInode               // or err
}
return fl, nil

isCurrentLockFile returns false when the path resolves to a different inode than the locked fd, or when the path no longer exists. Either condition means a concurrent recovery unlinked the file between our open and our flock — we drop the bad lock and let the retry loop start over against whatever is now at the path. retryLock now loops acquireOnce instead of retrying flock against a stale fd, so each retry gets a fresh descriptor.

recoverStaleLock lost its *os.File parameter (it now works purely by path) and treats ENOENT on both Stat and Remove as success — closing the TOCTOU window without a mutex.

To make the release-without-close step possible, releaseLock was split out from unlockFile on both Unix and Windows. unlockFile still releases + closes, so external callers' contracts are unchanged.

Fix part 2 — modtime heartbeat for holders

tryLock now starts a goroutine on successful acquisition that re-truncates the lock file every heartbeatInterval (10s) until unlock. Each tick also re-verifies the locked descriptor's inode against the path; a mismatch indicates the heartbeat itself was starved past staleThreshold and the lock has been hijacked, which is logged via log/slog.

The UnlockFunc returned to the caller cancels the goroutine's context and waits for it to exit before closing the fd, so the heartbeat can never operate on a closed descriptor.

heartbeatInterval (10s) and staleThreshold (30s) are package-private vars rather than consts so tests can shorten them. Production callers cannot tune them; the 3× safety ratio is fixed.

Hijack detection without in-band failure

When the heartbeat detects an inode mismatch, the holder's outstanding operation has no way to be cancelled in-band — UnlockFunc is the only callback we hold. The mismatch is therefore logged but the holder keeps running. This is a deliberate trade-off: the heartbeat fires every 10s vs. a 30s threshold, so on a non-pathological system the hijack window is effectively closed. The log line is the safety net for SIGSTOP / extreme GC pause / scheduler starvation.

Test plan

  • go test ./store/posixage/... -race -count=1 passes
  • go vet ./store/posixage/... clean
  • golangci-lint run ./store/posixage/... reports 0 issues
  • flock_race_test.go covers:
    • isCurrentLockFile correctly identifies an unlinked path
    • isCurrentLockFile correctly identifies inode replacement
    • Two concurrent recoverStaleLock calls both return success (TOCTOU fix)
    • Heartbeat keeps a long-running holder live (recovery refuses to fire after several stale-thresholds with the lock held)
  • Existing TestFlock and TestRecoverLock suites still pass

What was dropped during review

  • Two t.Skip() placeholder tests that documented the now-fixed holder-side limitations.
  • A "two waiters cannot both acquire after recovery" test that used a live raw-flock holder. With a live holder, flock on the unlinked orphan always returns EWOULDBLOCK, so the test never exercised the ghost-lock path it claimed to cover. It gave false confidence and is removed rather than left as theatre.

🤖 Generated with Claude Code

When a stale lock file is recovered, the unlinking process leaves any
existing flock holder pointing at an orphaned inode while the next
caller flocks a fresh inode created at the same path. Both callers then
believe they hold exclusive access. recoverStaleLock also lost the race
between its Stat and Remove calls, surfacing a spurious lock-acquisition
failure when another caller had already unlinked the file.

acquireOnce now opens, flocks, compares the locked fd's inode to the
inode currently at the lock-file path (via os.SameFile), and only then
truncates to refresh modtime. On mismatch it releases the lock and
returns errStaleInode so the retry loop picks up a fresh fd against
whatever file is now at the path. retryLock loops acquireOnce instead of
retrying flock against a stale fd. recoverStaleLock is now path-based
and treats ENOENT on Stat or Remove as success, closing the
stat/remove TOCTOU window.

This does not protect a long-running holder whose modtime ages past 30s
without a heartbeat — that race is documented in skipped tests and would
need a holder-side modtime refresh to fully close.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@docker-agent docker-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🔴 CRITICAL

Note: Verifier returned an empty response (anti-loop rule applied). Findings below are from the drafter, manually reviewed against source. High-confidence issues only are reported.

This PR correctly fixes the core ghost-lock race on the acquiring side using /. However, there is a residual TOCTOU window in itself and a silently swallowed truncate error that can re-expose a valid lock to stale-recovery removal.

Comment thread store/posixage/internal/flock/flock.go Outdated
Comment thread store/posixage/internal/flock/flock.go Outdated
Close the holder-side half of the ghost-lock race. The prior commit
verified the locked inode at acquisition time, which protected new
acquirers but left long-running holders vulnerable: any caller whose
critical section exceeded the 30s stale threshold would be hijacked
when another process unlinked the lock file and created a fresh
inode at the same path.

A goroutine started by tryLock now re-truncates the lock file every
10s for the duration of the hold, keeping its modtime young enough
that concurrent recovery callers never misclassify it as abandoned.
Each tick also re-verifies that the fd's inode still matches the
path; a mismatch indicates the heartbeat itself was starved past the
stale threshold and we have been hijacked, which is logged via
log/slog. The UnlockFunc stops the goroutine and waits for it to
exit before closing the fd, so the heartbeat can never operate on a
closed descriptor.

Also swap the order of truncate and inode verification in
acquireOnce so the modtime refresh is committed before the inode
check, narrowing the window between a passing check and a concurrent
unlink.

Tests:
- Drop both t.Skip() placeholder tests; the limitations they
  documented are now fixed by the heartbeat.
- Drop the "two waiters cannot both acquire after recovery" test;
  with a live raw-flock holder it never exercised the ghost-lock
  code path, so it gave false confidence.
- Add a heartbeat smoke test that shortens both intervals via the
  package-private vars and verifies recoverStaleLock refuses to
  recover a heartbeating holder after several stale-thresholds of
  wall time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Benehiko Benehiko changed the title fix(posixage/flock): verify locked inode to prevent ghost-lock race fix(posixage/flock): close ghost-lock race via inode verify + heartbeat May 27, 2026
Benehiko and others added 2 commits May 28, 2026 09:57
Document that the returned UnlockFunc owns both the file descriptor and
the background heartbeat goroutine, and that callers must invoke it
exactly once (typically via defer) to avoid leaking either. Add idiomatic
examples on TryLock and TryRLock, and note the heartbeat-stop-before-
close ordering on UnlockFunc itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace log/slog with the project's logging.Logger interface so the
library no longer writes to a default sink. The heartbeat goroutine
resolves its logger via logging.FromContext, falling back to a noop
when the caller hasn't set one. posixage's fileStore plumbs its
configured logger through with logging.WithLogger so the existing
WithLogger option on store.New propagates transparently without any
new parameters on TryLock/TryRLock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joe0BAB
joe0BAB previously approved these changes May 28, 2026
Signed-off-by: Alano Terblanche <18033717+Benehiko@users.noreply.github.com>
@Benehiko Benehiko merged commit 26d7096 into main May 28, 2026
18 checks passed
@Benehiko Benehiko deleted the fix/posixage-flock-stale-recovery-race branch May 28, 2026 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

posixage/flock: ghost-lock race when stale-recovery unlinks a still-flocked file

2 participants