Improve sync durability: health status API, degraded callbacks, flush guarantee #119

@khaliqgant

Problem

Relayfile's WebSocket sync layer has several durability gaps that matter for production agent workloads:

  1. No health visibility — callers have no way to know if the sync connection is degraded (polling fallback, stale, reconnecting) versus healthy. Agents can silently read stale data.

  2. Ping interval too long — default 30s ping interval means a broken connection takes up to 30s to detect. For a real-time coordination layer this is too long.

  3. No degraded/recovered callbacks — no way for the caller to react when the connection enters or exits a degraded state (e.g., to surface a warning in UI or pause agent writes).

  4. No flush guarantee on local-mount: AutoSyncHandle has no way to drain pending debounced writes before shutdown or before a critical operation. A file that is still inside the 50ms debounce window is dropped if the process exits.

  5. No watcher health signal — if the @parcel/watcher subscription fails silently, the mount appears healthy but local changes stop being detected.

Context

These gaps were surfaced during a comparison with mirage's architecture. Mirage is a pull-on-demand VFS library (no daemon, no WebSocket) — it trades real-time push for simplicity. Relayfile's push-first model is a genuine advantage for multi-agent coordination, but only if the push channel is reliable and observable.

Changes (in PR)

  • packages/sdk/typescript/src/sync.ts

    • Halve DEFAULT_PING_INTERVAL_MS from 30000 → 15000
    • Add RelayFileSyncHealthStatus interface with degraded, degradedReason, stateEnteredAt, lastFrameAt, reconnectAttempts
    • Add onDegraded and onRecovered callbacks to RelayFileSyncOptions
    • Add getHealthStatus() public method
    • Fire onDegraded when polling fallback activates; fire onRecovered on successful WebSocket reconnect
  • packages/local-mount/src/auto-sync.ts

    • Add flushPending(opts?) — drains all debounced writes and runs a full reconcile, returns count of files flushed
    • Add watchersHealthy() — returns true only when both mount and project watchers are successfully subscribed
  • packages/sdk/typescript/src/index.ts

    • Export RelayFileSyncHealthStatus type
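
To make the sync.ts changes concrete, here is a minimal sketch of how the health status and callbacks could fit together. Only the field names, `onDegraded`, `onRecovered`, and `getHealthStatus()` come from this issue. The field types, the `SyncHealthTracker` class, its method names, and the reset of `reconnectAttempts` on recovery are hypothetical and may not match the PR.

```typescript
// Field names are taken from the issue; the types are assumptions.
interface RelayFileSyncHealthStatus {
  degraded: boolean;
  degradedReason: string | null;
  stateEnteredAt: number; // epoch milliseconds (assumed)
  lastFrameAt: number | null; // epoch milliseconds (assumed)
  reconnectAttempts: number;
}

interface HealthCallbacks {
  onDegraded?: (status: RelayFileSyncHealthStatus) => void;
  onRecovered?: (status: RelayFileSyncHealthStatus) => void;
}

// Hypothetical helper for illustration; not part of the relayfile SDK.
class SyncHealthTracker {
  private status: RelayFileSyncHealthStatus;
  private callbacks: HealthCallbacks;
  private now: () => number;

  constructor(callbacks: HealthCallbacks = {}, now: () => number = () => Date.now()) {
    this.callbacks = callbacks;
    this.now = now;
    this.status = {
      degraded: false,
      degradedReason: null,
      stateEnteredAt: now(),
      lastFrameAt: null,
      reconnectAttempts: 0,
    };
  }

  // Record that a frame arrived on the socket.
  recordFrame(): void {
    this.status.lastFrameAt = this.now();
  }

  recordReconnectAttempt(): void {
    this.status.reconnectAttempts += 1;
  }

  // Called when the polling fallback activates. Fires onDegraded once per transition.
  enterPollingFallback(reason: string): void {
    if (this.status.degraded) return;
    this.status.degraded = true;
    this.status.degradedReason = reason;
    this.status.stateEnteredAt = this.now();
    this.callbacks.onDegraded?.(this.getHealthStatus());
  }

  // Called after a successful WebSocket reconnect. Fires onRecovered once per transition.
  markReconnected(): void {
    if (!this.status.degraded) return;
    this.status.degraded = false;
    this.status.degradedReason = null;
    this.status.stateEnteredAt = this.now();
    this.status.reconnectAttempts = 0; // assumed: attempts reset on recovery
    this.callbacks.onRecovered?.(this.getHealthStatus());
  }

  // Return a copy so callers cannot mutate internal state.
  getHealthStatus(): RelayFileSyncHealthStatus {
    return { ...this.status };
  }
}
```

Firing each callback only on a state transition keeps a caller that pauses agent writes in `onDegraded` from being re-triggered on every failed reconnect attempt.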

Impact

Agents and orchestrators can now:

  • Query sync health before reading critical files
  • React to degraded state (pause writes, show warning, switch to safe mode)
  • Flush pending writes before shutdown or handoff
  • Verify watcher subscriptions are alive before trusting local state
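
The flush guarantee can be illustrated with a small debounced write queue. This is a sketch under stated assumptions, not the code in auto-sync.ts: `DebouncedWriteQueue`, `schedule`, and the `WriteFn` signature are hypothetical, and the sketch omits both the full reconcile that `flushPending(opts?)` is described as running and any tracking of writes that are already in flight. It only shows how writes still waiting in the debounce window can be drained and counted.

```typescript
type WriteFn = (path: string) => Promise<void>;

// Hypothetical illustration of draining debounced writes; not the local-mount implementation.
class DebouncedWriteQueue {
  private timers = new Map<string, ReturnType<typeof setTimeout>>();
  private write: WriteFn;
  private debounceMs: number;

  constructor(write: WriteFn, debounceMs: number = 50) {
    this.write = write;
    this.debounceMs = debounceMs;
  }

  // Schedule a write for `path`, restarting the debounce window if one is already pending.
  schedule(path: string): void {
    const existing = this.timers.get(path);
    if (existing !== undefined) clearTimeout(existing);
    const timer = setTimeout(() => {
      this.timers.delete(path);
      this.write(path).catch((err) => console.error(`write failed for ${path}`, err));
    }, this.debounceMs);
    this.timers.set(path, timer);
  }

  // Cancel every pending timer, run those writes immediately, and return how many were flushed.
  async flushPending(): Promise<number> {
    const paths = [...this.timers.keys()];
    for (const timer of this.timers.values()) clearTimeout(timer);
    this.timers.clear();
    await Promise.all(paths.map((p) => this.write(p)));
    return paths.length;
  }
}
```

A caller would await `flushPending()` in its shutdown or handoff path, so that a file changed within the last 50ms is written before the process exits instead of being dropped with its timer.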
