Commit 6fc7b31
fix(fs_lock): always-join heartbeat thread in Drop to prevent post-cleanup writes
`LockGuard::drop` previously used `heartbeat_done.recv_timeout(100ms)`
to wait for the heartbeat thread's acknowledgement, then fell through
to `remove_lock_if_owned` regardless of whether the ack arrived. Under
CI load this created a race:
1. Drop signals shutdown; `recv_timeout` times out before the heartbeat
   acknowledges (heartbeat mid-`atomic_write_lock_metadata` IO).
2. Drop logs a warning, calls `remove_lock_if_owned` → file removed.
3. A different caller acquires the lock, writes its own metadata.
4. Our still-alive heartbeat finishes its in-flight write. Its
   `heartbeat_once` validated our ownership BEFORE the on-disk swap,
   so its rename overwrites the new owner's lockfile with our
   stale metadata.
5. The new owner's heartbeat sees foreign metadata, exits NotOwner.
   The new owner's Drop sees foreign metadata, `remove_lock_if_owned`
   returns Ok(false), the lockfile persists.
The Linux unit test `acquire_serializes_concurrent_callers` then
panics at `assert!(!path.exists())` — and any production lock with
this shape would leak a stale lockfile that the next acquire can't
match to remove.
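The interleaving above can be replayed deterministically in a single thread. The following is a minimal sketch, not the crate's code: `remove_lock_if_owned` is reimplemented here as a hypothetical stand-in, `atomic_write` and `replay_race` are invented for illustration, and the "metadata" is reduced to an owner string.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in: removes the lockfile only if it names `owner`.
fn remove_lock_if_owned(path: &Path, owner: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        Ok(contents) if contents == owner => {
            fs::remove_file(path)?;
            Ok(true)
        }
        Ok(_) => Ok(false),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}

/// Illustrative atomic write: write a temp file, then rename it over the lockfile.
fn atomic_write(path: &Path, owner: &str) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, owner)?;
    fs::rename(&tmp, path)
}

/// Replays steps 1-5 in the order the race produces them.
/// Returns true if the new owner's cleanup leaves the lockfile behind.
fn replay_race(dir: &Path) -> io::Result<bool> {
    let path = dir.join("demo.lock");
    atomic_write(&path, "A")?;

    // Step 1: A's heartbeat validates ownership, then stalls before its rename.
    let a_validated = fs::read_to_string(&path)? == "A";
    // Step 2: A's Drop gives up waiting and removes the lockfile.
    remove_lock_if_owned(&path, "A")?;
    // Step 3: B acquires the lock and writes its own metadata.
    atomic_write(&path, "B")?;
    // Step 4: A's stalled heartbeat completes its write on the stale check.
    if a_validated {
        atomic_write(&path, "A")?;
    }
    // Step 5: B's Drop sees foreign metadata and declines to remove the file.
    let removed_by_b = remove_lock_if_owned(&path, "B")?;
    let leaked = !removed_by_b && path.exists();

    // Tidy up the demo file so repeated runs start clean.
    let _ = fs::remove_file(&path);
    Ok(leaked)
}
```

Calling `replay_race` on a scratch directory reports a leaked lockfile, which is the same condition the `assert!(!path.exists())` in the unit test trips on.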
Fix: Drop now unconditionally `unpark`s and `join()`s the heartbeat
handle. This bounds drop latency to one `park_timeout` iteration
(~25ms) plus the current `heartbeat_once` IO — typically <500ms under
CI load — but guarantees the heartbeat is dead before `remove_lock_if_owned`
runs. The `heartbeat_done` channel field is kept (drained defensively)
for backward compatibility but is no longer used for synchronization.
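A minimal sketch of the signal, `unpark`, then `join` ordering described above. Everything here is an assumption for illustration: the struct layout, the `acquire` constructor, the `demo` helper, and the counters are hypothetical, and a slow counter increment stands in for the real `heartbeat_once` IO.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the real guard; fields are illustrative only.
struct LockGuard {
    shutdown: Arc<AtomicBool>,
    heartbeat: Option<JoinHandle<()>>,
    /// Number of simulated heartbeat writes performed so far.
    writes: Arc<AtomicUsize>,
    /// Snapshot of `writes` taken at the moment cleanup runs.
    writes_at_cleanup: Arc<AtomicUsize>,
}

impl LockGuard {
    fn acquire(writes: Arc<AtomicUsize>, writes_at_cleanup: Arc<AtomicUsize>) -> Self {
        let shutdown = Arc::new(AtomicBool::new(false));
        let (stop, counter) = (Arc::clone(&shutdown), Arc::clone(&writes));
        let heartbeat = thread::spawn(move || {
            while !stop.load(Ordering::Acquire) {
                // Simulated slow metadata write (stands in for `heartbeat_once`).
                thread::sleep(Duration::from_millis(5));
                counter.fetch_add(1, Ordering::SeqCst);
                // Sleep until the next beat, or until Drop unparks this thread.
                thread::park_timeout(Duration::from_millis(25));
            }
        });
        LockGuard { shutdown, heartbeat: Some(heartbeat), writes, writes_at_cleanup }
    }
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        self.shutdown.store(true, Ordering::Release);
        if let Some(handle) = self.heartbeat.take() {
            // Wake the heartbeat if it is parked, then wait with no timeout.
            handle.thread().unpark();
            let _ = handle.join();
        }
        // Cleanup point (the real code calls `remove_lock_if_owned` here).
        let seen = self.writes.load(Ordering::SeqCst);
        self.writes_at_cleanup.store(seen, Ordering::SeqCst);
    }
}

/// Returns (writes seen at cleanup, writes seen well after drop).
fn demo() -> (usize, usize) {
    let writes = Arc::new(AtomicUsize::new(0));
    let at_cleanup = Arc::new(AtomicUsize::new(0));
    let guard = LockGuard::acquire(Arc::clone(&writes), Arc::clone(&at_cleanup));
    thread::sleep(Duration::from_millis(40));
    drop(guard);
    thread::sleep(Duration::from_millis(60));
    (at_cleanup.load(Ordering::SeqCst), writes.load(Ordering::SeqCst))
}
```

Because `join` returns only after the heartbeat thread has exited, the two counts returned by `demo` are equal: no write can land after the cleanup point. The cost is that Drop blocks for as long as the in-flight write takes.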
Verified locally with 5 consecutive `cargo test -p agent-file-tools
--release --lib fs_lock::` runs (10/10 pass each), plus full 850-test
lib suite still green.
1 parent 554cc66 commit 6fc7b31
1 file changed
Lines changed: 29 additions & 16 deletions
@@ -68,25 +68,38 @@