Symptom
relayfile-mount cold mounts slow down past ~600s and hit the client poll cap after ~7+ stop/start remounts on the same box, while non-relayfile workspace modes (git clone-into-/workspace) stay fast and 100% successful on that same box.
Evidence
AgentWorkforce/cloud #1384 post-flip 50-run on reused box b89cef1f-... (cloud-agent box warm; in relayfile mode, relayfile-mount is invoked on every warm):
- relayfile: 3/6 ready. Remounts 1/4/16 were fast (212s/218s/208s); remounts 7/10/13 were slow, reaching ~600s (p50 601s, p95 604s) while still progressing (no hard failure).
- git / git-overlay: 100% ready (~210–255s) on the same box (they don't use the relayfile mount path).
- Intermittent, not strictly monotonic — remount 16 RECOVERED to 208s (fastest), so it's a transient/clustered slowdown in the 7–13 window, not pure linear growth. (A persistent state-accumulation component is still possible and worth checking.)
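The monotonic-vs-clustered reading above can be checked mechanically once per-remount durations are collected. A minimal TypeScript sketch, assuming each sample is a (remount index, seconds-to-ready) pair; RemountSample, isNonDecreasing and slowRemounts are hypothetical names, not part of either repo:

```typescript
// Hypothetical helpers for reading a remount series; names are illustrative only.
interface RemountSample {
  remount: number; // remount index on the reused box (1, 4, 7, ...)
  seconds: number; // time until the workspace reported ready (or hit the poll cap)
}

// True if durations never decrease as the remount index grows: the shape a pure
// state-accumulation slowdown would be expected to produce.
function isNonDecreasing(samples: RemountSample[]): boolean {
  const ordered = [...samples].sort((a, b) => a.remount - b.remount);
  for (let i = 1; i < ordered.length; i++) {
    if (ordered[i].seconds < ordered[i - 1].seconds) return false;
  }
  return true;
}

// Remount indices whose duration meets or exceeds slowThresholdSec, in ascending order.
function slowRemounts(samples: RemountSample[], slowThresholdSec: number): number[] {
  return samples
    .filter((s) => s.seconds >= slowThresholdSec)
    .map((s) => s.remount)
    .sort((a, b) => a - b);
}
```

For the relayfile series above (212s, 218s, three ~600s remounts, then 208s at remount 16), isNonDecreasing returns false because of the drop at remount 16, which is what separates a transient cluster from pure linear growth. It cannot rule out a persistent component masked by noise; that needs the state-dir itself measured.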
Hypothesis
The relayfile-mount binary keeps state in /home/daytona/.relayfile-mount-state (--state-dir) that persists across remounts on the same box. Each successive remount may work against accumulating/un-reset mount state (initial-sync / FUSE setup), making the mount's operations progressively slower / more prone to the 120s upstream-proxy read timeout, especially under transient backend slowness. git carries no equivalent persistent mount state, so it's unaffected.
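One way to test the accumulation hypothesis is to record the size of --state-dir between each stop and the next start, and line it up with the remount durations. A minimal TypeScript (Node) sketch; measureDir and DirFootprint are hypothetical names, and it only counts entries and bytes without assuming anything about the binary's file layout:

```typescript
import { readdirSync, lstatSync } from "node:fs";
import { join } from "node:path";

// Totals for one directory tree (the root itself is not counted in dirs).
interface DirFootprint {
  files: number;
  dirs: number;
  bytes: number;
}

// Iterative walk so a deep state tree cannot overflow the call stack.
function measureDir(root: string): DirFootprint {
  const total: DirFootprint = { files: 0, dirs: 0, bytes: 0 };
  const stack: string[] = [root];
  while (stack.length > 0) {
    const dir = stack.pop()!;
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const full = join(dir, entry.name);
      if (entry.isDirectory()) {
        total.dirs += 1;
        stack.push(full);
      } else {
        // Regular files, symlinks (not followed) and other entries all count as files.
        total.files += 1;
        total.bytes += lstatSync(full).size;
      }
    }
  }
  return total;
}
```

Calling measureDir("/home/daytona/.relayfile-mount-state") after each stop and logging the result next to the remount number would show whether the directory grows across cycles. This assumes nothing is mounted under the state dir at that moment; walking a wedged FUSE mount could block.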
Ask
- Audit the --state-dir lifecycle in the relayfile-mount binary: is state reset/cleaned (or safely reused) when the same --state-dir is remounted, or can it accumulate/wedge across rapid stop/start remount cycles?
- Confirm whether a stale/large --state-dir slows initial-sync/mount, and whether the binary should reset it (or detect and repair it) on remount.
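If the audit shows state can go stale, a detect-then-reset check at startup is one option for the second ask. A minimal TypeScript sketch of such a policy; every field and threshold here is hypothetical (in particular, relayfile-mount is not known to write any clean-shutdown marker), and real limits would have to come from measurements:

```typescript
// Hypothetical remount policy for a persistent --state-dir; not relayfile-mount's behavior.
type StateDirAction = "fresh" | "reuse" | "reset";

interface StateDirObservation {
  exists: boolean;
  cleanShutdown: boolean; // assumed: the previous session recorded an orderly stop
  files: number;
  bytes: number;
}

interface StateDirLimits {
  maxFiles: number;
  maxBytes: number;
}

function decideStateDir(
  obs: StateDirObservation,
  limits: StateDirLimits,
): { action: StateDirAction; reason: string } {
  if (!obs.exists) return { action: "fresh", reason: "no prior state" };
  // A rapid stop/start may leave half-written state; treat it as untrusted.
  if (!obs.cleanShutdown) {
    return { action: "reset", reason: "previous session did not stop cleanly" };
  }
  if (obs.files > limits.maxFiles) {
    return { action: "reset", reason: `file count ${obs.files} > ${limits.maxFiles}` };
  }
  if (obs.bytes > limits.maxBytes) {
    return { action: "reset", reason: `size ${obs.bytes} B > ${limits.maxBytes} B` };
  }
  return { action: "reuse", reason: "clean shutdown and within limits" };
}
```

A reset would ideally move the old directory aside (for example, to a timestamped sibling) rather than delete it, so a wedged state can still be inspected afterwards.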
Severity
Lower — needs ~7+ rapid remounts on one box to manifest, and it's slow-but-progressing (no data loss). Real long-lived agents reattach occasionally, not 50× rapid-fire, but it IS a real reuse-resilience concern for long-lived relayfile agents.
Context / cross-repo
- Consumer: AgentWorkforce/cloud cloud-agent box warm (packages/core/src/relayfile/mount-script.ts only invokes the binary; it doesn't own the state-dir lifecycle).
- Primary owner = this repo (relayfile-mount binary).
- A cloud-side companion will be filed ONLY IF cloud's per-step timing (cloud #1489) shows that the cloud invocation's flush/cleanup (startBoxRelayfileMount/flushBoxRelayfileMount) also contributes.
- Follow-up to the now-resolved cloud #1384 (CF-queue warm cutover).
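For the cloud-side check, per-step timing only needs each awaited step wrapped with a timer. A minimal TypeScript sketch; timeStep and StepTiming are hypothetical names, and the real signatures of startBoxRelayfileMount/flushBoxRelayfileMount are not assumed:

```typescript
// Hypothetical per-step timer: records wall-clock duration and outcome of one async step.
interface StepTiming {
  step: string;
  ms: number;
  ok: boolean;
}

async function timeStep<T>(
  timings: StepTiming[],
  step: string,
  run: () => Promise<T>,
): Promise<T> {
  const started = Date.now();
  try {
    const result = await run();
    timings.push({ step, ms: Date.now() - started, ok: true });
    return result;
  } catch (err) {
    // Failed steps are still timed, then the original error is rethrown unchanged.
    timings.push({ step, ms: Date.now() - started, ok: false });
    throw err;
  }
}
```

Wrapping the flush and start calls of each warm this way, and recording the remount index alongside, would show whether flush/cleanup time grows in the same 7 to 13 window.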