
Sunday hack day: TUI cockpit + MM integration + ACT v2 + docs #1

Open: ScavieFae wants to merge 24 commits into main from sunday-hack-day

@ScavieFae
Owner

Summary

24 commits of Sunday hack-day work toward the 4pm demo:

  • TUI cockpit (phases 1-5): four-pane orchestrator wrapper (embodiment / signals / skills / dance floor) with MM read+write side, mode mutex, WS log surface, camera-recovery rails, watcher companion, and hang detection.
  • MakerMods integration: mm up|down|restart commands, port-lock auto-retry on stale 409s, polling split into 5s health + on-event device updates, MM cleanup hooks, auto-stop watcher.
  • Skill overlay system: skills_overlay.yaml resolves skill names to local checkpoints (HF as fallback), with skill <name> [<policy_uri>] for ad-hoc inference. Cup is currently routed to act_pick_cup_v3.
  • Camera tooling: cams snap captures + labels canonical previews; split client retries with frame-discard.
  • ACT v2 training infra: modal/train_act.py (ACT from scratch on H100, mirroring train_smolvla.py) + scripts/relax_arm.py to release torque after a crashed run.
  • Operator docs: wiki/operating-docs/running-models.md (per-model copy-paste cheat sheet + crash-mode runbook), wiki/operating-docs/makermods-setup/ (working bindings snapshot with screenshots), wiki/operating-docs/tui-cockpit.md (TUI operator reference), HANDOFF_HOME.md camera-patch notes.

Test plan

  • TUI launches via .venv/bin/python -m tui and renders all four panes
  • mm up|down|restart commands toggle the makermods backend cleanly
  • skill cup runs inference via the overlay (currently act_pick_cup_v3)
  • cams snap captures three labeled previews
  • Operator docs render at localhost:8001/operating-docs/running-models/ after uvx zensical build
  • modal deploy modal/train_act.py succeeds; modal run --detach modal/train_act.py::main against a tagged dataset trains and pushes to HF

🤖 Generated with Claude Code

ScavieFae and others added 24 commits May 10, 2026 10:37
…hestrator

Outside-wrap of orchestrator.py: subprocess + stdout-tail + parser + four
typed panes (Embodiment, Signals, Skills, Dance Floor). Layout mirrors
Hive's 2x2 50/50 grid; role colors match (planner=indigo, verifier=orange,
watcher=teal, vla=pop-green). Stub orchestrator (tui/stub_orchestrator.py)
emits canned print lines matching Ryan's exact format so the TUI can be
demoed without an arm.

Smoke-tested end-to-end: stub subprocess + TUI in run_test mode populates
all four panes; planner/vla/verifier/watcher all update; abort command
flows through the input prompt; watcher_state.json polling works.

Files:
  tui/__main__.py            python -m tui entry
  tui/app.py + app.tcss      Textual App + 2x2 grid styling
  tui/events.py              typed event vocabulary + role colors
  tui/parser.py              orchestrator stdout -> events
  tui/panes/embodiment.py    top-left: connection state
  tui/panes/signals.py       top-right: per-role status + vla bar
  tui/panes/skills.py        bottom-left: active/history/available
  tui/panes/dance_floor.py   bottom-right: log + input prompt
  tui/stub_orchestrator.py   canned events for offline dev
  pyproject.toml             + textual, rich, pyyaml
  .gitignore                 ignore runtime/

Brief: brief-001-orchestrator-tui

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
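The parser's job is to turn each orchestrator stdout line into a typed event tagged with a role color. A minimal sketch, assuming a hypothetical `[role] message` line format (Ryan's actual print format is not reproduced in this PR); `Event`, `LINE_RE`, and `parse_line` are illustrative names, and only the role-to-color pairing comes from the commit message above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Role colors as listed in the commit message.
ROLE_COLORS = {
    "planner": "indigo",
    "verifier": "orange",
    "watcher": "teal",
    "vla": "pop-green",
}

# Hypothetical line format, e.g. "[planner] picking cup".
# The orchestrator's real output format is not shown in this PR.
LINE_RE = re.compile(r"^\[(?P<role>\w+)\]\s*(?P<text>.*)$")


@dataclass(frozen=True)
class Event:
    role: str
    text: str

    @property
    def color(self) -> str:
        # parse_line only emits known roles, so a plain lookup is safe.
        return ROLE_COLORS[self.role]


def parse_line(line: str) -> Optional[Event]:
    """Map one stdout line to an Event, or None if it is not role-tagged."""
    match = LINE_RE.match(line.strip())
    if match is None or match["role"] not in ROLE_COLORS:
        return None
    return Event(role=match["role"], text=match["text"])
```

Returning `None` for untagged lines lets the dance floor fall back to showing them as raw log text.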
defer auto-launch, fix orchestrator cwd, mode state machine,
drop fake stdin commands. brief-002 phase 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mm_client polling, embodiment surfaces MM/Spark/cal/ports/cams,
run preflight refuses on MM down or missing cal. brief-002 phase 2.
record/teleop/cal-status commands; new TELEOP and RECORDING modes
with strict mutex. brief-002 phase 3.
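The strict mutex can be pictured as a small state machine where any non-idle mode must hand control back to IDLE before another can start. A sketch under that assumption; `Mode` and `ModeMutex` are hypothetical names, not the cockpit's actual classes, and INFERRING (added in a later commit) would slot in as another member.

```python
import threading
from enum import Enum, auto


class Mode(Enum):
    IDLE = auto()
    TELEOP = auto()
    RECORDING = auto()


class ModeMutex:
    """Only one non-IDLE mode may be active at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.mode = Mode.IDLE

    def try_enter(self, mode: Mode) -> bool:
        # Refuse to start teleop/recording while another mode holds the arm.
        with self._lock:
            if self.mode is not Mode.IDLE:
                return False
            self.mode = mode
            return True

    def release(self, mode: Mode) -> None:
        # Only the mode that currently holds the mutex can release it.
        with self._lock:
            if self.mode is mode:
                self.mode = Mode.IDLE
```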
stderr/stdout pattern match → recovering mode + retry/abort,
watcher.py spawned alongside orchestrator when WATCHER_CAMERA_INDEX
set, subprocess death watchdog with last-5-line dump. brief-002 phase 4.
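The death watchdog only needs the last few lines of output once the subprocess exits, which a bounded `collections.deque` provides directly. A minimal sketch, assuming hypothetical failure patterns (the cockpit's real match list is not in this PR); `run_with_tail` and `RECOVERY_PATTERNS` are illustrative names, and the blocking loop would run off the UI thread in practice.

```python
from __future__ import annotations

import re
import subprocess
from collections import deque
from typing import Optional

# Hypothetical failure signatures; the real patterns are not listed in this PR.
RECOVERY_PATTERNS = [
    re.compile(r"failed to (open|read) camera", re.IGNORECASE),
    re.compile(r"timed out", re.IGNORECASE),
]


def run_with_tail(cmd: list[str], tail_len: int = 5) -> tuple[int, list[str], Optional[str]]:
    """Run cmd, keep its last tail_len output lines, and report the first recovery match."""
    tail: deque[str] = deque(maxlen=tail_len)  # older lines fall off automatically
    first_match: Optional[str] = None
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    assert proc.stdout is not None
    for raw in proc.stdout:
        line = raw.rstrip("\n")
        tail.append(line)
        if first_match is None and any(p.search(line) for p in RECOVERY_PATTERNS):
            first_match = line  # the TUI would switch to a recovering mode here
    returncode = proc.wait()
    return returncode, list(tail), first_match
```

On a nonzero exit, the returned tail is what gets dumped to the dance floor.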
per-process MM WS subscribe routes [mm] lines to dance floor;
hang banner if no log in 10s while RECORDING/TELEOP active.
brief-002 phase 5 (final).
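The hang banner reduces to comparing the time since the last log line against a 10 s budget while RECORDING or TELEOP is active. A sketch with an injectable clock so it can be tested; `HangDetector` and its methods are hypothetical, not the TUI's API.

```python
import time
from typing import Callable, Optional


class HangDetector:
    """Flag a hang when no log line arrives within `timeout` seconds of an active mode."""

    def __init__(self, timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self._clock = clock
        self._last_log: Optional[float] = None  # None means no mode is being watched

    def start(self) -> None:
        # Call when RECORDING/TELEOP becomes active.
        self._last_log = self._clock()

    def stop(self) -> None:
        self._last_log = None

    def on_log(self) -> None:
        # Any [mm] log line resets the countdown.
        if self._last_log is not None:
            self._last_log = self._clock()

    def is_hung(self) -> bool:
        # Polled periodically by the UI to decide whether to show the banner.
        return (self._last_log is not None
                and self._clock() - self._last_log > self.timeout)
```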
inference-focused. mode mutex, command grammar, dance floor tags,
common failures. brief-002 follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
idle-only, mm /api/setup/cameras/preview (or whatever mm.py uses),
saves to ~/.cache/xle-hack/cam-previews/cam<idx>_<role>_<ts>.jpg,
auto-opens via macOS `open`. brief-002 follow-up.
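The preview filename follows the `cam<idx>_<role>_<ts>.jpg` pattern under `~/.cache/xle-hack/cam-previews/`. A sketch of building that path and handing it to `open`; the timestamp format, the `preview_path`/`open_preview` names, and the `darwin` guard are assumptions.

```python
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional

PREVIEW_DIR = Path.home() / ".cache" / "xle-hack" / "cam-previews"


def preview_path(idx: int, role: str, when: Optional[datetime] = None,
                 root: Path = PREVIEW_DIR) -> Path:
    """Build cam<idx>_<role>_<ts>.jpg; the timestamp format here is an assumption."""
    ts = (when or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return root / f"cam{idx}_{role}_{ts}.jpg"


def open_preview(path: Path) -> None:
    # macOS only: let `open` show the file in the default image viewer.
    if sys.platform == "darwin":
        subprocess.run(["open", str(path)], check=False)
```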
cameras() periodic poll was opening cv2 devices each tick (LED
cycling) and blocking the Textual main thread. Devices poll now
runs in a worker thread and only fires on mount + after orchestrator
exits / teleop stops / recording stops. Fast poll keeps health+spark
fresh at 5s without touching cameras.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
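Moving the device scan off the Textual main thread and firing it only on events can be sketched with a plain worker thread and a non-blocking lock that drops overlapping triggers. `DevicePoller` is a hypothetical name; in the TUI, `on_result` would need to hand results back to the UI thread (for example via Textual's `App.call_from_thread`), since here it runs on the worker thread.

```python
from __future__ import annotations

import threading
from typing import Callable, Optional


class DevicePoller:
    """Run the slow device scan on a worker thread, only when triggered."""

    def __init__(self, scan: Callable[[], dict],
                 on_result: Callable[[dict], None]) -> None:
        self._scan = scan            # e.g. the call that opens cv2 devices
        self._on_result = on_result  # invoked on the worker thread
        self._busy = threading.Lock()

    def trigger(self) -> Optional[threading.Thread]:
        """Start a scan unless one is already running; return the thread or None."""
        if not self._busy.acquire(blocking=False):
            return None

        def work() -> None:
            try:
                self._on_result(self._scan())
            finally:
                self._busy.release()

        thread = threading.Thread(target=work, daemon=True)
        thread.start()
        return thread
```

Per the commit, `trigger()` would be called on mount and after the orchestrator exits or teleop/recording stops, while the 5 s health poll stays on its own timer.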
MM preview reads one frame; UVC cams sometimes return ret=False on
first read, dropping cam 0. Direct cv2 + 5-frame discard + 2 retries
captures all canonical cams reliably.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
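The frame-discard-plus-retry capture can be written against anything with the `read()`/`release()` shape of `cv2.VideoCapture`, which also makes it testable without a camera. A minimal sketch; `grab_frame`, the `Capture` protocol, and the 0.2 s retry delay are assumptions, while the 5 discarded frames and 2 retries come from the commit message.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Optional, Protocol


class Capture(Protocol):
    # The subset of cv2.VideoCapture used here.
    def read(self) -> tuple[bool, Any]: ...
    def release(self) -> None: ...


def grab_frame(open_cap: Callable[[], Capture], discard: int = 5,
               retries: int = 2, delay: float = 0.2) -> Optional[Any]:
    """Open the camera, throw away warm-up frames, and retry on a failed read."""
    for attempt in range(1 + retries):
        cap = open_cap()
        try:
            for _ in range(discard):
                cap.read()  # UVC cams can return ret=False on early reads
            ok, frame = cap.read()
            if ok and frame is not None:
                return frame
        finally:
            cap.release()  # always free the device, even on success
        if attempt < retries:
            time.sleep(delay)
    return None
```

With OpenCV installed, `grab_frame(lambda: cv2.VideoCapture(0))` returns a frame, or `None` after three failed attempts.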
Static was sized to 4 lines on first render (before MM polled), and
set_mm_state called plain refresh() — which repaints but does not
recompute layout. Subsequent renders produced 11+ lines, all clipped
to the original 4-row box. Pass layout=True on the first MM-state
arrival so the pane grows to fit.

Also wrapped the entire _poll_mm_fast body in try/except (a raise
inside set_interval cancels the timer for the rest of the session)
and added a one-shot dance-floor log on first state population so
future failures surface immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
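Because an exception escaping a `set_interval` callback stops that timer for the session, one way to generalize the try/except fix is a decorator that logs and swallows errors. A sketch, assuming a synchronous callback (an `async` one would need an async wrapper); `guarded` is a hypothetical helper, not what the commit adds. The separate layout fix is Textual's `refresh(layout=True)`, which recomputes the widget's size instead of only repainting it.

```python
import functools
import logging
from typing import Any, Callable

log = logging.getLogger("tui")


def guarded(fn: Callable[..., Any]) -> Callable[..., None]:
    """Log and swallow exceptions so a periodic timer keeps firing."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> None:
        try:
            fn(*args, **kwargs)
        except Exception:
            log.exception("%s failed; timer stays alive", fn.__name__)
    return wrapper
```

Usage would look like `self.set_interval(5, guarded(self._poll_mm_fast))`.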
new INFERRING mode + MM /api/inference/start. skill alias resolves
"act" to local_models/.../act_<skill>/checkpoints/last.
mode mutex same as recording/teleop. ws log subscribe on start.
Bare 'skill <name>' now resolves to local act_<skill>/checkpoints/last,
then smolvla_<skill>/checkpoints/last, then skills.yaml HF URI.
Avoids the SmolVLAConfig compile_model field-mismatch when loading
the team's HF SmolVLAs against an older lerobot venv.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
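The lookup order above (local ACT, then local SmolVLA, then the HF URI from skills.yaml) maps onto a short chain of directory checks. A sketch, assuming a hypothetical `models_root` that stands in for the elided `local_models/...` directory and a plain dict in place of parsed skills.yaml; `resolve_skill` is illustrative.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def resolve_skill(skill: str, models_root: Path,
                  hf_uris: dict[str, str]) -> Optional[str]:
    """Prefer a local ACT checkpoint, then local SmolVLA, then the HF URI."""
    for prefix in ("act", "smolvla"):
        ckpt = models_root / f"{prefix}_{skill}" / "checkpoints" / "last"
        if ckpt.is_dir():
            return str(ckpt)
    # No local checkpoint: fall back to the team's HF URI, if any.
    return hf_uris.get(skill)
```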
Per-skill policy paths/URIs + default + fallback live in
skills_overlay.yaml at xle-hack root. _cmd_skill resolves through
overlay first, falls back to arbitrary path/URI, then to the team
skills.yaml vla.uri. Adds get_policy and get_fallback to tui.skills.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
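The overlay-first chain can be sketched as a lookup with two fallbacks. This assumes the overlay's per-skill entries are already loaded into a plain dict (the real skills_overlay.yaml schema, including its default and fallback keys, is not shown here) and reads "arbitrary path/URI" as a name that already contains a `/` or starts with `.`; `resolve_policy` is hypothetical.

```python
from typing import Mapping, Optional


def resolve_policy(name: str, overlay_skills: Mapping[str, str],
                   team_uri: Optional[str]) -> Optional[str]:
    """Overlay entry first, then treat name as a path/URI, then the team fallback."""
    if name in overlay_skills:
        return overlay_skills[name]
    # Heuristic (an assumption): anything path- or repo-shaped is used as-is.
    if "/" in name or name.startswith("."):
        return name
    return team_uri
```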
Spawn MM uvicorn backend from the cockpit when it's down. Tracks the
subprocess in self._mm_proc; mm restart kills + relaunches. Detached
session so MM survives if the TUI exits — operator must kill it
explicitly via mm down (or another terminal).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
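Launching the backend in its own session is what lets it outlive the TUI, and it also means `mm down` should signal the whole process group. A POSIX-only sketch using `subprocess.Popen(start_new_session=True)`; `mm_up`, `mm_down`, and `mm_restart` are hypothetical names, and since the actual uvicorn command line is not shown in this PR, `cmd` is left to the caller.

```python
import os
import signal
import subprocess
from typing import Optional


def mm_up(cmd: list) -> subprocess.Popen:
    """Start the backend in its own session so it survives the TUI exiting."""
    return subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # setsid(): the TUI's Ctrl-C/SIGHUP won't reach it
    )


def mm_down(proc: Optional[subprocess.Popen], timeout: float = 5.0) -> None:
    """Terminate the backend's process group, escalating to SIGKILL if needed."""
    if proc is None or proc.poll() is not None:
        return
    pgid = os.getpgid(proc.pid)  # equals proc.pid after setsid
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()


def mm_restart(proc: Optional[subprocess.Popen], cmd: list) -> subprocess.Popen:
    mm_down(proc)
    return mm_up(cmd)
```

Signaling the group rather than the single PID also catches any workers uvicorn spawns.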
Local act_pick_*/checkpoints/last were trained with upstream lerobot
0.5.1; MM's fork is on 0.3.4 and can't parse them. The Mattie-NT HF
repos are verified working in MM GUI. Now `skill pick_cup` resolves
to Mattie-NT/act_pick_cup; `act_local` alias still available for the
local copies if/when MM ever runs newer lerobot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lerobot fork's sanity_check_dataset_name (control_utils.py:186)
rejects repo_ids whose dataset name (the part after the owner) doesn't
start with eval_ when a policy is provided. Match MM GUI's convention:
Mattie-NT/eval_<skill>_smoke.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
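The naming rule can be captured in two small helpers. `eval_repo_id` and `passes_eval_prefix_check` are hypothetical; the check restates the rule described above and is not lerobot's implementation.

```python
def eval_repo_id(owner: str, skill: str, suffix: str = "smoke") -> str:
    """Build the eval_-prefixed dataset id used for policy runs."""
    return f"{owner}/eval_{skill}_{suffix}"


def passes_eval_prefix_check(repo_id: str) -> bool:
    # Local restatement of the rule (not lerobot's code): with a policy
    # attached, the dataset name after the owner must start with "eval_".
    _owner, _sep, name = repo_id.partition("/")
    return name.startswith("eval_")
```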
Active section now reflects TUI commands (skill/teleop/record), not
just the orchestrator's invocations. History section removed; doesn't
add information once each finished op is in the dance floor anyway.
Commands section lists the input grammar for at-a-glance reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v3 trained on newer data; v2 and v1 retained as alias variants for
quick A/B. act_local points at the on-disk source we just pushed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v3 verified working at the rig. Drop v1/v2 aliases — clean overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
new mm cleanup command (and auto-fires once on 409 from skill /
record / teleop start), iterates /api/system/port-locks and hits
the matching owner-typed stop endpoint per held lock.
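The cleanup loop pairs each held lock with the stop endpoint for its owner type. A sketch, assuming each lock record from `/api/system/port-locks` carries an `owner_type` field and that the caller supplies the owner-to-endpoint map and an HTTP `post` function; the field name, the mapping, and `mm_cleanup` itself are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable, Mapping


def mm_cleanup(
    locks: Iterable[Mapping[str, str]],
    stop_endpoints: Mapping[str, str],
    post: Callable[[str], None],
) -> list[str]:
    """Hit the owner-typed stop endpoint once per held port lock."""
    hit: list[str] = []
    for lock in locks:
        endpoint = stop_endpoints.get(lock.get("owner_type", ""))
        if endpoint is None:
            # Unknown owner type: leave the lock alone rather than guess.
            continue
        post(endpoint)
        hit.append(endpoint)
    return hit
```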
Risks accepted going into the auto-stop watcher work (silent-failure
surfacing, race idempotency, polling scope), plus the unresolved
mm cleanup "didn't work" report, left open for repro.
skill subprocess exit auto-stops INFERRING mode after surfacing exit
info to dance floor. tasks/<name>.md is a numbered markdown list of
TUI commands; runner dispatches each, awaits mode → IDLE, advances.
abort cancels the task in flight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
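The task runner reduces to: parse the numbered list, dispatch one command, block until the mode returns to IDLE, repeat, and bail out if abort fires. A sketch with the dispatch and wait steps injected as callables; `load_task`, `run_task`, and the `wait_idle` contract (return False when aborted) are assumptions, not the cockpit's code.

```python
import re
import threading
from pathlib import Path
from typing import Callable, List

# Numbered markdown list item, e.g. "1. skill cup" or "2) mm restart".
ITEM_RE = re.compile(r"^\s*\d+[.)]\s+(?P<cmd>.+?)\s*$")


def load_task(path: Path) -> List[str]:
    """Read tasks/<name>.md and return its numbered commands in order."""
    return [m["cmd"] for line in path.read_text().splitlines()
            if (m := ITEM_RE.match(line))]


def run_task(
    commands: List[str],
    dispatch: Callable[[str], None],
    wait_idle: Callable[[threading.Event], bool],
    abort: threading.Event,
) -> int:
    """Dispatch each command, wait for mode -> IDLE, stop early on abort."""
    done = 0
    for cmd in commands:
        if abort.is_set():
            break
        dispatch(cmd)
        if not wait_idle(abort):  # False means aborted while waiting
            break
        done += 1
    return done
```

Lines that are not numbered items (headings, notes) are ignored, so a task file can carry prose alongside its commands.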
- modal/train_act.py: ACT-from-scratch training app on H100 (parallels train_smolvla.py)
- scripts/relax_arm.py: connect+disconnect to release torque after a crashed inference run
- wiki/operating-docs/running-models.md: per-model copy-paste cheat sheet + crash-mode runbook
- wiki/operating-docs/makermods-setup/: working bindings snapshot (ports, cal files, camera roles)
- HANDOFF_HOME.md: inference-affecting patches (fps=15 override, async_read timeout 1000ms) + additional Mattie-NT checkpoints
- tui/app.tcss: grid-columns 1fr 2fr (Warp live-test fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>