relayfile-mount --state-dir state not reset/cleaned across remounts → degrades on repeated mount on the same box #219

@khaliqgant

Description

Symptom

relayfile-mount cold mounts slow past ~600s and hit the client poll cap after ~7+ stop/start remounts on the same box. Non-relayfile workspace modes (git clone-into-/workspace) stay fast and 100% successful on that same box.

Evidence — AgentWorkforce/cloud #1384 post-flip 50-run on reused box b89cef1f-... (cloud-agent box warm, relayfile mode = relayfile-mount invoked per warm):

  • relayfile: 3/6 — remounts 1/4/16 fast (212s/218s/208s), remounts 7/10/13 slow → ~600s (p50 601s, p95 604s), still progressing (no hard failure).
  • git / git-overlay: 100% ready (~210–255s) on the same box (they don't use the relayfile mount path).
  • Intermittent, not strictly monotonic — remount 16 RECOVERED to 208s (fastest), so it's a transient/clustered slowdown in the 7–13 window, not pure linear growth. (A persistent state-accumulation component is still possible and worth checking.)

Hypothesis

The relayfile-mount binary keeps state in /home/daytona/.relayfile-mount-state (--state-dir), and that state persists across remounts on the same box. Each successive remount may therefore run against accumulated, un-reset mount state (initial sync, FUSE setup), which would make the mount's operations progressively slower and more prone to the 120s upstream-proxy read timeout, especially under transient backend slowness. git carries no equivalent persistent mount state, so it is unaffected.
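
One way to test this hypothesis from outside the binary is an A/B run: on some remounts, move the state dir aside before starting the mount, then compare cold-mount times against remounts that reuse it. A minimal sketch follows. The helper name `set_aside_state_dir` and the `.aside-<timestamp>` suffix are hypothetical, and it assumes the mount is fully stopped before the rename and that relayfile-mount tolerates a missing --state-dir at start, which has not been verified.

```python
import os
import time
from typing import Optional


def set_aside_state_dir(state_dir: str, now: Optional[float] = None) -> Optional[str]:
    """Rename state_dir to a timestamped sibling so the next mount starts clean.

    Returns the backup path, or None if there was nothing to move.
    Assumption: the mount is fully stopped before this is called.
    """
    if not os.path.isdir(state_dir):
        return None
    stamp = int(now if now is not None else time.time())
    # Hypothetical naming scheme; keeps the old state around for inspection.
    backup = f"{state_dir.rstrip(os.sep)}.aside-{stamp}"
    os.rename(state_dir, backup)
    return backup
```

Renaming rather than deleting keeps the old state available, so a slow remount's state dir can still be inspected afterwards.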

Ask

  • Audit the --state-dir lifecycle in the relayfile-mount binary: is state reset/cleaned (or safely reused) on remount of the same --state-dir, or can it accumulate/wedge across rapid stop/start remount cycles?
  • Confirm whether a stale/large --state-dir slows initial-sync/mount, and whether the binary should reset it (or detect+repair) on remount.
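
To check the "stale/large --state-dir" question empirically, the state dir could be measured before and after each remount and the numbers correlated with mount time. The sketch below is one possible way to do that. The names `StateDirSnapshot`, `snapshot_state_dir`, and `growth` are hypothetical, and it makes no assumption about what files the binary actually writes there.

```python
import os
from dataclasses import dataclass


@dataclass
class StateDirSnapshot:
    file_count: int
    total_bytes: int
    newest_mtime: float


def snapshot_state_dir(path: str) -> StateDirSnapshot:
    """Walk the state dir and summarize it; a missing dir counts as empty."""
    file_count = 0
    total_bytes = 0
    newest_mtime = 0.0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # file vanished mid-walk (the mount may be active)
            file_count += 1
            total_bytes += st.st_size
            newest_mtime = max(newest_mtime, st.st_mtime)
    return StateDirSnapshot(file_count, total_bytes, newest_mtime)


def growth(before: StateDirSnapshot, after: StateDirSnapshot) -> dict:
    """Difference between two snapshots, e.g. across one remount."""
    return {
        "files": after.file_count - before.file_count,
        "bytes": after.total_bytes - before.total_bytes,
    }
```

Logging `growth(...)` per remount alongside the cold-mount time would show whether the slow remounts in the 7–13 window line up with a larger state dir, or whether size stays flat and the slowdown comes from elsewhere.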

Severity

Lower — needs ~7+ rapid remounts on one box to manifest, and it's slow-but-progressing (no data loss). Real long-lived agents reattach occasionally, not 50× rapid-fire, but it IS a real reuse-resilience concern for long-lived relayfile agents.

Context / cross-repo

  • Consumer: AgentWorkforce/cloud cloud-agent box warm (packages/core/src/relayfile/mount-script.ts only invokes the binary; it doesn't own state-dir lifecycle).
  • Primary owner = this repo (relayfile-mount binary).
  • Cloud-side companion will be filed ONLY IF cloud's per-step timing (cloud #1489) shows the cloud invocation's flush/cleanup (startBoxRelayfileMount/flushBoxRelayfileMount) also contributes.
  • Follow-up to the now-resolved cloud #1384 (CF-queue warm cutover).
