fix(posixage/flock): close ghost-lock race via inode verify + heartbeat by Benehiko · Pull Request #529 · docker/secrets-engine

Benehiko · 2026-05-27T12:00:24Z

Closes #530.

Summary

The posixage flock layer had a race where two callers could simultaneously believe they held the exclusive lock. Stale-recovery unlinked the lock file while another process still held flock(2) on the now-orphaned inode, then created a fresh inode at the same path and flocked that.

The race has two halves:

Acquirer side — a caller mid-acquisition can have its fd flock the inode that recovery just unlinked, while the path now points to a fresh inode held by someone else.
Holder side — a caller already holding the lock can have its critical section exceed the 30s stale threshold, at which point another process classifies it as abandoned and hijacks it.

This PR closes both halves and fixes a TOCTOU in recoverStaleLock between its Stat and Remove calls. There are now two commits:

Inode verification on acquire — after a successful flock, verify (via os.SameFile) that the locked descriptor still refers to the file at the lock-file path. If not, drop the bad lock and retry. This closes the acquirer-side half and the recovery TOCTOU.
Modtime heartbeat for holders — a background goroutine started by tryLock re-truncates the lock file every 10s and re-verifies the locked inode each tick. This closes the holder-side half by keeping the lock file young enough that concurrent recovery callers never see it as stale.

The bug

recoverStaleLock deletes the lock file based purely on its modification time being >30s old:

Process A holds flock on inode₁ (legitimate, slow operation).
Process B sees a stale modtime, calls root.Remove(lockFileName). On Unix this unlinks the dirent but keeps the inode alive for any open fd — A's flock is still kernel-valid.
Process B opens the path again (gets a brand-new inode₂), flocks it, and returns success.
A and B now both run inside what they think is an exclusive section.

The 30s threshold was always a kludge: flock is automatically released by the kernel on process death, so a held flock by definition belongs to a live holder. Removing the file does nothing to that holder; it only creates the inode-swap hazard.

A secondary bug lived in the recovery TOCTOU: two callers reaching recoverStaleLock concurrently both saw a stale modtime, both called Remove, the loser got ENOENT, and tryLock surfaced that as a hard lock-acquisition failure even though recovery had succeeded.

Fix part 1 — inode verification on acquire

acquireOnce(root, exclusive) runs the full open → flock → truncate → verify sequence atomically:

fl, _ := openFile(root)
if err := lockFile(fl.Fd(), exclusive); err != nil {
    fl.Close(); return nil, err
}
fl.Truncate(0)                              // refresh modtime
same, err := isCurrentLockFile(fl, root)    // os.SameFile(fl.Stat(), root.Stat(lockFileName))
if err != nil || !same {
    releaseLock(fl); fl.Close()
    return nil, errStaleInode               // or err
}
return fl, nil

isCurrentLockFile returns false when the path resolves to a different inode than the locked fd, or when the path no longer exists. Either condition means a concurrent recovery unlinked the file between our open and our flock — we drop the bad lock and let the retry loop start over against whatever is now at the path. retryLock now loops acquireOnce instead of retrying flock against a stale fd, so each retry gets a fresh descriptor.

recoverStaleLock lost its *os.File parameter (it now works purely by path) and treats ENOENT on both Stat and Remove as success — closing the TOCTOU window without a mutex.

To make the release-without-close step possible, releaseLock was split out from unlockFile on both Unix and Windows. unlockFile still releases + closes, so external callers' contracts are unchanged.

Fix part 2 — modtime heartbeat for holders

tryLock now starts a goroutine on successful acquisition that re-truncates the lock file every heartbeatInterval (10s) until unlock. Each tick also re-verifies the locked descriptor's inode against the path; a mismatch indicates the heartbeat itself was starved past staleThreshold and the lock has been hijacked, which is logged via log/slog.

The UnlockFunc returned to the caller cancels the goroutine's context and waits for it to exit before closing the fd, so the heartbeat can never operate on a closed descriptor.

heartbeatInterval (10s) and staleThreshold (30s) are package-private vars rather than consts so tests can shorten them. Production callers cannot tune them; the 3× safety ratio is fixed.

Hijack detection without in-band failure

When the heartbeat detects an inode mismatch, the holder's outstanding operation has no way to be cancelled in-band — UnlockFunc is the only callback we hold. The mismatch is therefore logged but the holder keeps running. This is a deliberate trade-off: the heartbeat fires every 10s vs. a 30s threshold, so on a non-pathological system the hijack window is effectively closed. The log line is the safety net for SIGSTOP / extreme GC pause / scheduler starvation.

Test plan

go test ./store/posixage/... -race -count=1 passes
go vet ./store/posixage/... clean
golangci-lint run ./store/posixage/... reports 0 issues
flock_race_test.go covers:
- isCurrentLockFile correctly identifies an unlinked path
- isCurrentLockFile correctly identifies inode replacement
- Two concurrent recoverStaleLock calls both return success (TOCTOU fix)
- Heartbeat keeps a long-running holder live (recovery refuses to fire after several stale-thresholds with the lock held)
Existing TestFlock and TestRecoverLock suites still pass

What was dropped during review

Two t.Skip() placeholder tests that documented the now-fixed holder-side limitations.
A "two waiters cannot both acquire after recovery" test that used a live raw-flock holder. With a live holder, flock on the unlinked orphan always returns EWOULDBLOCK, so the test never exercised the ghost-lock path it claimed to cover. It gave false confidence and is removed rather than left as theatre.

🤖 Generated with Claude Code

When a stale lock file is recovered, the unlinking process leaves any existing flock holder pointing at an orphaned inode while the next caller flocks a fresh inode created at the same path. Both callers then believe they hold exclusive access. recoverStaleLock also lost the race between its Stat and Remove calls, surfacing a spurious lock-acquisition failure when another caller had already unlinked the file. acquireOnce now opens, flocks, compares the locked fd's inode to the inode currently at the lock-file path (via os.SameFile), and only then truncates to refresh modtime. On mismatch it releases the lock and returns errStaleInode so the retry loop picks up a fresh fd against whatever file is now at the path. retryLock loops acquireOnce instead of retrying flock against a stale fd. recoverStaleLock is now path-based and treats ENOENT on Stat or Remove as success, closing the stat/remove TOCTOU window. This does not protect a long-running holder whose modtime ages past 30s without a heartbeat — that race is documented in skipped tests and would need a holder-side modtime refresh to fully close. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docker-agent

Assessment: 🔴 CRITICAL

Note: Verifier returned an empty response (anti-loop rule applied). Findings below are from the drafter, manually reviewed against source. High-confidence issues only are reported.

This PR correctly fixes the core ghost-lock race on the acquiring side using /. However, there is a residual TOCTOU window in itself and a silently swallowed truncate error that can re-expose a valid lock to stale-recovery removal.

Close the holder-side half of the ghost-lock race. The prior commit verified the locked inode at acquisition time, which protected new acquirers but left long-running holders vulnerable: any caller whose critical section exceeded the 30s stale threshold would be hijacked when another process unlinked the lock file and created a fresh inode at the same path. A goroutine started by tryLock now re-truncates the lock file every 10s for the duration of the hold, keeping its modtime young enough that concurrent recovery callers never misclassify it as abandoned. Each tick also re-verifies that the fd's inode still matches the path; a mismatch indicates the heartbeat itself was starved past the stale threshold and we have been hijacked, which is logged via log/slog. The UnlockFunc stops the goroutine and waits for it to exit before closing the fd, so the heartbeat can never operate on a closed descriptor. Also swap the order of truncate and inode verification in acquireOnce so the modtime refresh is committed before the inode check, narrowing the window between a passing check and a concurrent unlink. Tests: - Drop both t.Skip() placeholder tests; the limitations they documented are now fixed by the heartbeat. - Drop the "two waiters cannot both acquire after recovery" test; with a live raw-flock holder it never exercised the ghost-lock code path, so it gave false confidence. - Add a heartbeat smoke test that shortens both intervals via the package-private vars and verifies recoverStaleLock refuses to recover a heartbeating holder after several stale-thresholds of wall time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Document that the returned UnlockFunc owns both the file descriptor and the background heartbeat goroutine, and that callers must invoke it exactly once (typically via defer) to avoid leaking either. Add idiomatic examples on TryLock and TryRLock, and note the heartbeat-stop-before- close ordering on UnlockFunc itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace log/slog with the project's logging.Logger interface so the library no longer writes to a default sink. The heartbeat goroutine resolves its logger via logging.FromContext, falling back to a noop when the caller hasn't set one. posixage's fileStore plumbs its configured logger through with logging.WithLogger so the existing WithLogger option on store.New propagates transparently without any new parameters on TryLock/TryRLock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Alano Terblanche <18033717+Benehiko@users.noreply.github.com>

docker-agent Bot reviewed May 27, 2026

View reviewed changes

Comment thread store/posixage/internal/flock/flock.go Outdated

Comment thread store/posixage/internal/flock/flock.go Outdated

Benehiko changed the title ~~fix(posixage/flock): verify locked inode to prevent ghost-lock race~~ fix(posixage/flock): close ghost-lock race via inode verify + heartbeat May 27, 2026

Benehiko mentioned this pull request May 27, 2026

posixage/flock: ghost-lock race when stale-recovery unlinks a still-flocked file #530

Closed

Benehiko and others added 2 commits May 28, 2026 09:57

joe0BAB previously approved these changes May 28, 2026

View reviewed changes

style: format and lint

5048a3d

Signed-off-by: Alano Terblanche <18033717+Benehiko@users.noreply.github.com>

Benehiko dismissed joe0BAB’s stale review via 5048a3d May 28, 2026 08:29

joe0BAB approved these changes May 28, 2026

View reviewed changes

Benehiko merged commit 26d7096 into main May 28, 2026
18 checks passed

Benehiko deleted the fix/posixage-flock-stale-recovery-race branch May 28, 2026 08:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(posixage/flock): close ghost-lock race via inode verify + heartbeat#529

fix(posixage/flock): close ghost-lock race via inode verify + heartbeat#529
Benehiko merged 5 commits into
mainfrom
fix/posixage-flock-stale-recovery-race

Benehiko commented May 27, 2026 •

edited

Loading

Uh oh!

docker-agent Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Benehiko commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

The bug

Fix part 1 — inode verification on acquire

Fix part 2 — modtime heartbeat for holders

Hijack detection without in-band failure

Test plan

What was dropped during review

Uh oh!

docker-agent Bot left a comment

Choose a reason for hiding this comment

Assessment: 🔴 CRITICAL

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Benehiko commented May 27, 2026 •

edited

Loading