Sunday hack day: TUI cockpit + MM integration + ACT v2 + docs by ScavieFae · Pull Request #1 · ScavieFae/xle-hack

ScavieFae · 2026-05-10T22:06:59Z

Summary

24 commits of Sunday hack-day work toward the 4pm demo:

TUI cockpit (phase 1-5) — four-pane orchestrator wrapper (embodiment / signals / skills / dance floor) with MM read+write side, mode mutex, WS log surface, camera-recovery rails, watcher companion, and hang detection.
MakerMods integration — mm up|down|restart commands, port-lock auto-retry on stale 409s, polling split into 5s health + on-event device updates, MM cleanup hooks, auto-stop watcher.
Skill overlay system — skills_overlay.yaml resolves skill names to local checkpoints (HF as fallback), with skill <name> [<policy_uri>] for ad-hoc inference. Cup currently routed to act_pick_cup_v3.
Camera tooling — cams snap captures + labels canonical previews; split client retries with frame-discard.
ACT v2 training infra — modal/train_act.py (ACT-from-scratch on H100, mirrors train_smolvla.py) + scripts/relax_arm.py for crashed-run torque release.
Operator docs — wiki/operating-docs/running-models.md (per-model copy-paste cheat sheet + crash-mode runbook), wiki/operating-docs/makermods-setup/ (working bindings snapshot with screenshots), wiki/operating-docs/tui-cockpit.md (TUI operator reference), HANDOFF_HOME camera-patch notes.

Test plan

TUI launches via .venv/bin/python -m tui and renders all four panes
mm up|down|restart commands toggle the makermods backend cleanly
skill cup runs inference via the overlay (currently act_pick_cup_v3)
cams snap captures three labeled previews
Operator docs render at localhost:8001/operating-docs/running-models/ after uvx zensical build
modal deploy modal/train_act.py succeeds; modal run --detach modal/train_act.py::main against a tagged dataset trains and pushes to HF

🤖 Generated with Claude Code

…hestrator

Outside-wrap of orchestrator.py: subprocess + stdout-tail + parser + four typed panes (Embodiment, Signals, Skills, Dance Floor). Layout mirrors Hive's 2x2 50/50 grid; role colors match (planner=indigo, verifier=orange, watcher=teal, vla=pop-green).

Stub orchestrator (tui/stub_orchestrator.py) emits canned print lines matching Ryan's exact format so the TUI can be demoed without an arm.

Smoke-tested end-to-end: stub subprocess + TUI in run_test mode populates all four panes; planner/vla/verifier/watcher all update; abort command flows through the input prompt; watcher_state.json polling works.

Files:
- tui/__main__.py: python -m tui entry
- tui/app.py + app.tcss: Textual App + 2x2 grid styling
- tui/events.py: typed event vocabulary + role colors
- tui/parser.py: orchestrator stdout -> events
- tui/panes/embodiment.py: top-left, connection state
- tui/panes/signals.py: top-right, per-role status + vla bar
- tui/panes/skills.py: bottom-left, active/history/available
- tui/panes/dance_floor.py: bottom-right, log + input prompt
- tui/stub_orchestrator.py: canned events for offline dev
- pyproject.toml: + textual, rich, pyyaml
- .gitignore: ignore runtime/

Brief: brief-001-orchestrator-tui

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cameras() periodic poll was opening cv2 devices each tick (LED cycling) and blocking the Textual main thread. Devices poll now runs in a worker thread and only fires on mount + after orchestrator exits / teleop stops / recording stops. Fast poll keeps health+spark fresh at 5s without touching cameras. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MM preview reads one frame; UVC cams sometimes return ret=False on first read, dropping cam 0. Direct cv2 + 5-frame discard + 2 retries captures all canonical cams reliably. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
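The discard-and-retry capture described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `grab_frame` and `open_capture` are hypothetical names, and the capture object is only assumed to expose `read()` and `release()` the way `cv2.VideoCapture` does. "2 retries" is read here as up to three attempts in total.

```python
from typing import Any, Callable, Optional


def grab_frame(
    open_capture: Callable[[], Any],
    discard: int = 5,
    retries: int = 2,
) -> Optional[Any]:
    """Return one frame, or None if every attempt fails.

    open_capture is a hypothetical factory, e.g. lambda: cv2.VideoCapture(0).
    """
    for _attempt in range(1 + retries):  # first try plus `retries` retries
        cap = open_capture()
        try:
            # Throw away warm-up frames; UVC cameras may fail the first reads.
            for _ in range(discard):
                cap.read()
            ok, frame = cap.read()
            if ok:
                return frame
        finally:
            # Always release the device so the next attempt can reopen it.
            cap.release()
    return None
```

Passing a factory instead of a camera index keeps the sketch testable without a camera attached; the real command presumably opens the devices directly.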

Static was sized to 4 lines on first render (before MM polled), and set_mm_state called plain refresh() — which repaints but does not recompute layout. Subsequent renders produced 11+ lines, all clipped to the original 4-row box. Pass layout=True on the first MM-state arrival so the pane grows to fit. Also wrapped the entire _poll_mm_fast body in try/except (a raise inside set_interval cancels the timer for the rest of the session) and added a one-shot dance-floor log on first state population so future failures surface immediately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
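The guard around the poll body can be illustrated in plain Python, independent of Textual. This is a sketch only: `guarded`, `poll_once`, and the `report` callback are hypothetical stand-ins (the commit wraps the body of `_poll_mm_fast` directly, and `report` plays the role of a dance-floor log call).

```python
import functools
from typing import Callable, List


def guarded(report: Callable[[str], None]):
    """Decorator: swallow exceptions from a periodic callback and report them."""

    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                # Never let the exception escape into the timer machinery.
                report(f"{fn.__name__} failed: {exc!r}")
                return None

        return inner

    return wrap


messages: List[str] = []


@guarded(messages.append)
def poll_once(healthy: bool) -> str:
    """Stand-in for a poll body that may raise when the backend is down."""
    if not healthy:
        raise ConnectionError("MM down")
    return "ok"
```

With this shape a failing tick produces a visible log line and the next tick still runs, which is the behavior the fix is after.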

Bare 'skill <name>' now resolves to local act_<skill>/checkpoints/last, then smolvla_<skill>/checkpoints/last, then skills.yaml HF URI. Avoids the SmolVLAConfig compile_model field-mismatch when loading the team's HF SmolVLAs against an older lerobot venv. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per-skill policy paths/URIs + default + fallback live in skills_overlay.yaml at xle-hack root. _cmd_skill resolves through overlay first, falls back to arbitrary path/URI, then to the team skills.yaml vla.uri. Adds get_policy and get_fallback to tui.skills. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
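One way to picture the resolution order is the sketch below. The dict layout (`policies`, `default`, `fallback` keys) is an assumed schema for illustration, not the actual contents of skills_overlay.yaml, and `resolve_policy` is a hypothetical name rather than the real `_cmd_skill` logic. The sketch also assumes the optional second argument is tried as an overlay alias before being treated as a raw path or URI.

```python
from typing import Any, Mapping, Optional


def resolve_policy(
    name: str,
    arg: Optional[str],
    overlay: Mapping[str, Any],
    team_skills: Mapping[str, Any],
) -> Optional[str]:
    """Resolve a skill name (plus optional argument) to a policy path or URI."""
    entry = overlay.get(name) or {}
    policies = entry.get("policies") or {}
    if arg is not None:
        # An argument is either an overlay alias or an arbitrary path/URI.
        return policies.get(arg, arg)
    # No argument: overlay default first, then the overlay fallback.
    for key in (entry.get("default"), entry.get("fallback")):
        if key in policies:
            return policies[key]
    # Last resort: the team skills.yaml entry (vla.uri).
    return (team_skills.get(name) or {}).get("vla", {}).get("uri")
```

Returning `None` when nothing matches leaves it to the caller to print a usage error instead of starting inference with an empty policy.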

Spawn MM uvicorn backend from the cockpit when it's down. Tracks the subprocess in self._mm_proc; mm restart kills + relaunches. Detached session so MM survives if the TUI exits — operator must kill it explicitly via mm down (or another terminal). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
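The detached-session launch can be sketched with the standard library alone. The helper names `spawn_detached` and `stop` are hypothetical, the command is left to the caller because the actual uvicorn invocation is not shown in this PR text, and `start_new_session=True` is POSIX-only.

```python
import subprocess
import sys
from typing import Sequence


def spawn_detached(cmd: Sequence[str]) -> subprocess.Popen:
    """Start cmd in its own session so it is not tied to the parent's terminal."""
    return subprocess.Popen(
        list(cmd),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child calls setsid() before exec
    )


def stop(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Terminate a tracked child, escalating to kill if it ignores SIGTERM."""
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Placeholder child process; a real caller would pass the backend command.
    child = spawn_detached([sys.executable, "-c", "import time; time.sleep(2)"])
    print("spawned pid", child.pid)
    stop(child)
```

Keeping the `Popen` handle around is what makes an explicit stop or restart possible later; without it the only way to stop the backend is from another terminal.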

Local act_pick_*/checkpoints/last were trained with upstream lerobot 0.5.1; MM's fork is on 0.3.4 and can't parse them. The Mattie-NT HF repos are verified working in MM GUI. Now `skill pick_cup` resolves to Mattie-NT/act_pick_cup; `act_local` alias still available for the local copies if/when MM ever runs newer lerobot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lerobot fork's sanity_check_dataset_name (control_utils.py:186) rejects repo_ids that don't start with eval_ when a policy is provided. Match MM GUI's convention: Mattie-NT/eval_<skill>_smoke. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
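The naming convention can be captured in a small helper. Both function names are hypothetical, and `has_eval_prefix` only approximates the rule described above; it is not the fork's actual check.

```python
def eval_repo_id(skill: str, owner: str = "Mattie-NT", suffix: str = "smoke") -> str:
    """Build a dataset repo_id whose dataset name starts with eval_."""
    return f"{owner}/eval_{skill}_{suffix}"


def has_eval_prefix(repo_id: str) -> bool:
    """True when the dataset name (the part after the owner) starts with eval_."""
    return repo_id.split("/")[-1].startswith("eval_")
```

For example, `eval_repo_id("pick_cup")` yields `Mattie-NT/eval_pick_cup_smoke`, which follows the `Mattie-NT/eval_<skill>_smoke` pattern from the commit message.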

Active section now reflects TUI commands (skill/teleop/record), not just the orchestrator's invocations. History section removed; doesn't add information once each finished op is in the dance floor anyway. Commands section lists the input grammar for at-a-glance reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

skill subprocess exit auto-stops INFERRING mode after surfacing exit info to dance floor. tasks/<name>.md is a numbered markdown list of TUI commands; runner dispatches each, awaits mode → IDLE, advances. abort cancels the task in flight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
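A task file and its runner might look like the sketch below. It assumes steps are lines such as `1. mm up` and that anything else in the file is ignored; `parse_task`, `run_task`, and the three callbacks are hypothetical names. The real runner lives inside a Textual app and is presumably asynchronous, so the blocking `wait_idle` here is a simplification.

```python
import re
from typing import Callable, List, Sequence

# A numbered-list line: "1. command" or "1) command".
_STEP = re.compile(r"^\s*\d+[.)]\s+(.*\S)\s*$")


def parse_task(markdown: str) -> List[str]:
    """Extract commands from the numbered-list lines of a task file."""
    steps = []
    for line in markdown.splitlines():
        match = _STEP.match(line)
        if match:
            # Allow commands written as inline code in the markdown.
            steps.append(match.group(1).strip("`"))
    return steps


def run_task(
    steps: Sequence[str],
    dispatch: Callable[[str], None],
    wait_idle: Callable[[], None],
    aborted: Callable[[], bool],
) -> int:
    """Dispatch steps in order; stop early on abort. Returns steps completed."""
    done = 0
    for cmd in steps:
        if aborted():
            break
        dispatch(cmd)
        wait_idle()  # block until the mode is back to IDLE
        done += 1
    return done
```

Checking `aborted()` before each dispatch means an abort never starts a new step, while the step already in flight is left to the normal stop path.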

- modal/train_act.py: ACT-from-scratch training app on H100 (parallels train_smolvla.py)
- scripts/relax_arm.py: connect+disconnect to release torque after a crashed inference run
- wiki/operating-docs/running-models.md: per-model copy-paste cheat sheet + crash-mode runbook
- wiki/operating-docs/makermods-setup/: working bindings snapshot (ports, cal files, camera roles)
- HANDOFF_HOME.md: inference-affecting patches (fps=15 override, async_read timeout 1000ms) + additional Mattie-NT checkpoints
- tui/app.tcss: grid-columns 1fr 2fr (Warp live-test fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ScavieFae and others added 24 commits May 10, 2026 10:37

[tui] phase 1 — cockpit foundation

1a44ce8

defer auto-launch, fix orchestrator cwd, mode state machine, drop fake stdin commands. brief-002 phase 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[tui] phase 2 — MM read-side

ad972ea

mm_client polling, embodiment surfaces MM/Spark/cal/ports/cams, run preflight refuses on MM down or missing cal. brief-002 phase 2.

[tui] phase 3 — MM write-side + mode mutex

eb6c6dd

record/teleop/cal-status commands; new TELEOP and RECORDING modes with strict mutex. brief-002 phase 3.
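A strict mode mutex of this kind can be sketched as a tiny state holder. `ModeMutex`, `ModeError`, and the method names are hypothetical; only the mode names IDLE, TELEOP, and RECORDING come from the commit message.

```python
from enum import Enum


class Mode(Enum):
    IDLE = "idle"
    TELEOP = "teleop"
    RECORDING = "recording"


class ModeError(RuntimeError):
    """Raised when a command tries to start while another mode is active."""


class ModeMutex:
    """At most one non-IDLE mode at a time; every start must begin from IDLE."""

    def __init__(self) -> None:
        self.mode = Mode.IDLE

    def enter(self, target: Mode) -> None:
        if target is Mode.IDLE:
            raise ValueError("use release() to return to IDLE")
        if self.mode is not Mode.IDLE:
            raise ModeError(f"cannot start {target.value} while {self.mode.value}")
        self.mode = target

    def release(self) -> None:
        self.mode = Mode.IDLE
```

Refusing the transition outright, rather than queueing it, keeps two commands from ever holding the arm at the same time.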

[tui] phase 4 — camera-recovery rails + watcher companion

30acf90

stderr/stdout pattern match → recovering mode + retry/abort, watcher.py spawned alongside orchestrator when WATCHER_CAMERA_INDEX set, subprocess death watchdog with last-5-line dump. brief-002 phase 4.
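The last-5-line dump can be approximated with a bounded deque over the child's merged output. `run_with_tail` is a hypothetical helper that blocks until the child exits; the real watchdog presumably tails the subprocess while the TUI keeps running.

```python
import subprocess
import sys
from collections import deque
from typing import Deque, List, Sequence, Tuple


def run_with_tail(cmd: Sequence[str], keep: int = 5) -> Tuple[int, List[str]]:
    """Run cmd to completion; return its exit code and its last `keep` lines."""
    tail: Deque[str] = deque(maxlen=keep)  # older lines fall off automatically
    proc = subprocess.Popen(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the dump shows both streams
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        tail.append(line.rstrip("\n"))
    return proc.wait(), list(tail)


if __name__ == "__main__":
    code, last = run_with_tail(
        [sys.executable, "-c", "print('boot'); raise SystemExit(2)"]
    )
    print("exit", code, "tail", last)
```

A fixed-size buffer keeps memory flat however chatty the orchestrator is, and the tail is available as soon as the process dies.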

[tui] phase 5 — WS log surface + MM hang detection

26a9414

per-process MM WS subscribe routes [mm] lines to dance floor; hang banner if no log in 10s while RECORDING/TELEOP active. brief-002 phase 5 (final).
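The hang check reduces to comparing the time since the last log line against a threshold while a mode is active. In this sketch `HangDetector` is a hypothetical name and the injectable clock exists only to make the logic easy to exercise; the 10-second threshold is the one stated above.

```python
import time
from typing import Callable


class HangDetector:
    """Flag a hang when no log line has arrived for `timeout_s` seconds."""

    def __init__(
        self,
        timeout_s: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_log = clock()

    def on_log(self) -> None:
        """Call for every log line received from the backend."""
        self.last_log = self.clock()

    def is_hung(self, mode_active: bool) -> bool:
        """Only report a hang while RECORDING/TELEOP (or similar) is active."""
        return mode_active and (self.clock() - self.last_log) > self.timeout_s
```

Using a monotonic clock avoids false banners when the wall clock jumps, for example after an NTP adjustment.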

[docs] tui-cockpit operator reference page

7d49a1e

inference-focused. mode mutex, command grammar, dance floor tags, common failures. brief-002 follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[tui] add cams snap — capture + label canonical camera previews

d9ddce9

idle-only, mm /api/setup/cameras/preview (or whatever mm.py uses), saves to ~/.cache/xle-hack/cam-previews/cam<idx>_<role>_<ts>.jpg, auto-opens via macOS `open`. brief-002 follow-up.

[tui] add skill <name> [<policy_uri>] for direct inference

5c26f42

new INFERRING mode + MM /api/inference/start. skill alias resolves "act" to local_models/.../act_<skill>/checkpoints/last. mode mutex same as recording/teleop. ws log subscribe on start.

[overlay] pick_cup → act_pick_cup_v3 (just uploaded)

11bafe3

v3 trained on newer data; v2 and v1 retained as alias variants for quick A/B. act_local points at the on-disk source we just pushed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[overlay] pick_cup: v3 only

a584ddc

v3 verified working at the rig. Drop v1/v2 aliases — clean overlay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[tui] mm cleanup + auto-retry stale-lock 409s

f5337c8

new mm cleanup command (and auto-fires once on 409 from skill / record / teleop start), iterates /api/system/port-locks and hits the matching owner-typed stop endpoint per held lock.
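The auto-retry behavior can be sketched with injected callables so that no endpoint names have to be guessed. `start_with_cleanup` is a hypothetical helper: `start` stands in for a skill/record/teleop start request that returns an HTTP status code, and `cleanup` stands in for the `mm cleanup` routine.

```python
from typing import Callable, Tuple


def start_with_cleanup(
    start: Callable[[], int],
    cleanup: Callable[[], None],
) -> Tuple[int, bool]:
    """Try to start; on 409, run cleanup once and retry exactly once.

    Returns (final_status, retried).
    """
    status = start()
    if status != 409:
        return status, False
    cleanup()  # release stale port locks before trying again
    return start(), True  # a second 409 is returned to the caller, not retried
```

Limiting the retry to one attempt means a lock that is genuinely held by a live process surfaces as an error instead of looping.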

[docs] log auto-stop risks + mm cleanup open item

3d952d6

Risks accepted going into the auto-stop watcher work (silent failure surfacing, race idempotency, polling scope). Plus the unresolved mm cleanup "didn't work" report left open for repro.

ScavieFae wants to merge 24 commits into main from sunday-hack-day
