[playbook-phase-10] Async, concurrency, and shared state discipline #302

@githubrobbi

Goal

Execute Phase 10 of world_class_rust_workspace_refactor_playbook.md (§1082-1146, local-only): prevent subtle deadlocks, cancellation bugs, and runtime confusion; keep concurrency architecture explainable.

Recon (at SHA ff8b897dd, post-Phase-9 + #301 gap-closure)

  • #[tokio::main] binaries: 4 (uffs-daemon, uffs-mft, uffs-mcp, uffs-mcp http_gateway).
  • async fn + async blocks: 392 across 5 crates (daemon=225, mft=74, mcp=61, client=31, core=1).
  • tokio::spawn( sites: 27 (daemon=23, mft=3, client=1, mcp=0).
  • spawn_blocking sites: 62 (daemon=45, mft=15, client=1, core=1).
  • tokio::sync::* async locks: 6 (daemon=5, mcp=1) — central audit targets.
  • std::sync::Mutex/RwLock: 24 across the workspace.
  • Arc<Mutex<>> patterns: 7 (client=4, daemon=2, mcp=1).
  • Lock-across-await candidates (heuristic grep): 7 sites in crates/uffs-daemon/src/index/{dispatch,drives,forget_drive}.rs — all on ShardRegistry's RwLock<DriveIndex>.
  • Channels: 5 bounded + 4 unbounded (daemon=3, client=1) + 0 oneshot.
  • Timeouts: 24 sites (tokio::time::timeout in daemon=23, client=1).
  • Cancellation/shutdown infra: 0 CancellationToken + 3 ctrl_c + 11 .abort(); shutdown coordinated by await_shutdown_then_force_exit (uffs-daemon/src/lib.rs:368) with 3s graceful + 5s watchdog.
  • #[tokio::test]: 127 sites.
  • parking_lot: 0 — workspace stays on std::sync + tokio::sync only.
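The lock-across-await candidates above are usually resolved by the extract-then-await shape: copy what is needed while the guard is held, let the guard drop, and only then await. The following is a minimal sketch of that shape, not the daemon's code. `DemoRegistry`, `DemoIndex`, `persist`, and `flush_drive` are hypothetical stand-ins, and the small `block_on` helper exists only so the sketch runs without a runtime crate.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, RwLock};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical stand-in for per-drive index data (not the real `DriveIndex`).
#[derive(Clone, Debug, PartialEq)]
struct DemoIndex {
    record_count: u64,
}

/// Hypothetical registry that guards its indexes with a synchronous lock.
struct DemoRegistry {
    drives: RwLock<HashMap<char, DemoIndex>>,
}

/// Stand-in for real async work (IPC, disk, network).
async fn persist(index: &DemoIndex) -> u64 {
    index.record_count
}

/// Extract-then-await: clone under the lock, drop the guard, then await.
async fn flush_drive(registry: &DemoRegistry, letter: char) -> Option<u64> {
    let snapshot = {
        let guard = registry.drives.read().ok()?; // a poisoned lock yields None
        guard.get(&letter).cloned()
    }; // guard is dropped here, before any .await
    let snapshot = snapshot?;
    Some(persist(&snapshot).await)
}

/// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor, for demonstration only.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}
```

Calling `block_on(flush_drive(&registry, 'C'))` drives the future to completion. Because the guard's scope ends before the `.await`, a writer is never blocked for the duration of the async work; the cost is one clone of the extracted data.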

Phase 10 is

A predominantly audit + documentation + targeted-fix phase. The existing posture has been hardened iteratively across Phases 2b-7 (memory tiering, shard registry, journal loops, parity proofs). Phase 10's value is the per-spawn-site justification record, the lock-across-await audit verdict, the backpressure + timeout-coverage map, and the single-page concurrency model doc (playbook §1146 pass criterion).

Sub-phase plan (~11 h total)

| # | Sub-phase | Est. | Deliverable |
|---|---|---|---|
| 10a | Plan doc + baseline tool (scripts/dev/concurrency_audit.sh) | 1.5 h | PR |
| 10b | Lock-across-await audit (7 candidate sites) | 1 h | Findings-only or drift PR |
| 10c | Task ownership inventory (27 spawn sites) | 2 h | Folded into 10g |
| 10d | Backpressure audit (4 unbounded channels) | 1 h | Findings-only or drift PR |
| 10e | Timeout coverage audit | 1 h | Folded into 10g |
| 10f | Blocking-IO-in-async audit | 1.5 h | Findings-only or drift PR |
| 10g | concurrency_policy.md + per-crate # Concurrency rustdoc | 2.5 h | PR |
| 10h | CONTRIBUTING cross-link + final report | 30 min | Folded into 10g |
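For the task ownership inventory (10c), a record per spawn site could take roughly the following shape. This is a sketch under stated assumptions: `std::thread` stands in for `tokio::spawn` so the example runs without external crates, and `spawn_poller`, `run_poller`, and `WorkerHandle` are hypothetical names, not items from the workspace. The rustdoc layout is one possible way to record owner, shutdown mechanism, error observation, and cancellation behavior.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Handle held by whoever owns the background worker.
struct WorkerHandle {
    shutdown_tx: Sender<()>,
    join: JoinHandle<Result<u32, String>>,
}

/// Spawns a hypothetical polling worker.
///
/// # Concurrency
/// - Owner: the caller, through the returned `WorkerHandle`.
/// - Shutdown mechanism: a message on (or the drop of) `shutdown_tx`.
/// - Error observation: `WorkerHandle::stop` joins and returns the worker's `Result`.
/// - Cancellation behavior: shutdown is checked between ticks, so at most
///   one tick of work is in flight when the worker exits.
fn spawn_poller(tick: Duration) -> WorkerHandle {
    let (shutdown_tx, shutdown_rx) = mpsc::channel::<()>();
    let join = thread::spawn(move || run_poller(&shutdown_rx, tick));
    WorkerHandle { shutdown_tx, join }
}

/// Counts ticks until shutdown is requested or the owner disappears.
fn run_poller(shutdown_rx: &Receiver<()>, tick: Duration) -> Result<u32, String> {
    let mut ticks = 0u32;
    loop {
        match shutdown_rx.recv_timeout(tick) {
            // Explicit shutdown, or the owner dropped its sender: exit cleanly.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return Ok(ticks),
            // No shutdown request within one tick: do one unit of work.
            Err(RecvTimeoutError::Timeout) => ticks += 1,
        }
    }
}

impl WorkerHandle {
    /// Requests shutdown, waits for the worker, and surfaces its outcome.
    fn stop(self) -> Result<u32, String> {
        // A send error means the worker already exited; joining still reports why.
        let _ = self.shutdown_tx.send(());
        self.join
            .join()
            .map_err(|_| "worker panicked".to_string())?
    }
}
```

A caller would keep the handle for the worker's lifetime, for example `let worker = spawn_poller(Duration::from_millis(5));` followed later by `worker.stop()`. The point of the shape is that a spawn with no handle and no rustdoc record has no answer to any of the four questions.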

Acceptance (8 criteria — full table in local plan §2)

Key ones from playbook §1142-1146:

  • No lock-across-await hazards remain. All 7 candidate sites either have a verdict comment + rustdoc justification or are refactored to extract-then-await.
  • Task lifecycle is explicit. Every tokio::spawn( has, at call-site or wrapping spawner-function rustdoc: owner / shutdown mechanism / error observation / cancellation behavior.
  • Concurrency model can be explained on one page. concurrency_policy.md §0 opens with ≤ 600 words + ≤ 1 Mermaid diagram covering task graph + shard lifecycle + IPC lifecycle + shutdown sequence.
  • Backpressure + timeout posture is intentional. All 4 unbounded channels + all IPC/IO/network boundaries either have a documented policy or are converted to bounded/timeout-wrapped.

Plus cross-cutting:

  • scripts/dev/concurrency_audit.sh exists + reruns in < 5 s + emits a Markdown report (7 dimensions).
  • docs/architecture/code-quality/concurrency_policy.md exists with all 7 required sections.
  • CONTRIBUTING.md has a 10-15-LOC "Async / concurrency / shared state policy" section.
  • Per-crate # Concurrency rustdoc sections at the roots of uffs-daemon, uffs-mft, uffs-mcp, uffs-client.

Companion docs

Rule-1 adherence

Zero #[allow] / blanket suppressions planned. Targeted clippy expects are used only where justified inline. For example, a documented lock-across-await that is safe because the inner await is a tokio sleep with no shared-state interaction would get a focused #[expect(clippy::await_holding_lock, reason = "…")], with the safety contract in rustdoc.
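A focused expectation of that kind might look like the sketch below. It is illustrative only: `bump_then_pause` is a hypothetical function, `std::future::ready(())` stands in for a timer sleep so the example needs no runtime crate, and the wording of the safety contract is an assumption about what the audit would record. The `#[expect]` attribute requires Rust 1.81 or later.

```rust
use std::future;
use std::sync::{Mutex, PoisonError};

/// Increments `counter`, pauses, and returns the new value.
///
/// # Concurrency
/// The guard is deliberately held across the `.await`. Safety contract
/// (illustrative): the awaited future is a pure pause that never touches
/// `counter` and never takes another lock.
#[expect(
    clippy::await_holding_lock,
    reason = "awaited pause touches no shared state; contract documented in rustdoc"
)]
async fn bump_then_pause(counter: &Mutex<u64>) -> u64 {
    // Recover the data even if a previous holder panicked.
    let mut guard = counter.lock().unwrap_or_else(PoisonError::into_inner);
    *guard += 1;
    // Stand-in for a timer sleep; completes on the first poll.
    future::ready(()).await;
    *guard
}
```

Unlike `#[allow]`, an `#[expect]` is reported by the linter as unfulfilled once the lint stops firing, so the suppression cannot outlive the code it covers. Note that a `std::sync::MutexGuard` held across an `.await` makes the future `!Send`, so a function of this shape cannot be handed to a spawn API that requires `Send` futures.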
