Sunday hack day: TUI cockpit + MM integration + ACT v2 + docs#1
Open
ScavieFae wants to merge 24 commits into
…hestrator

Outside-wrap of orchestrator.py: subprocess + stdout-tail + parser + four typed panes (Embodiment, Signals, Skills, Dance Floor). Layout mirrors Hive's 2x2 50/50 grid; role colors match (planner=indigo, verifier=orange, watcher=teal, vla=pop-green).

Stub orchestrator (tui/stub_orchestrator.py) emits canned print lines matching Ryan's exact format so the TUI can be demoed without an arm.

Smoke-tested end-to-end: stub subprocess + TUI in run_test mode populates all four panes; planner/vla/verifier/watcher all update; abort command flows through the input prompt; watcher_state.json polling works.

Files:
- tui/__main__.py: python -m tui entry
- tui/app.py + app.tcss: Textual App + 2x2 grid styling
- tui/events.py: typed event vocabulary + role colors
- tui/parser.py: orchestrator stdout -> events
- tui/panes/embodiment.py: top-left: connection state
- tui/panes/signals.py: top-right: per-role status + vla bar
- tui/panes/skills.py: bottom-left: active/history/available
- tui/panes/dance_floor.py: bottom-right: log + input prompt
- tui/stub_orchestrator.py: canned events for offline dev
- pyproject.toml: + textual, rich, pyyaml
- .gitignore: ignore runtime/

Brief: brief-001-orchestrator-tui
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
defer auto-launch, fix orchestrator cwd, mode state machine, drop fake stdin commands. brief-002 phase 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mm_client polling, embodiment surfaces MM/Spark/cal/ports/cams, run preflight refuses on MM down or missing cal. brief-002 phase 2.
record/teleop/cal-status commands; new TELEOP and RECORDING modes with strict mutex. brief-002 phase 3.
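The strict mode mutex described in this commit could look roughly like the sketch below. The class and method names (`ModeMachine`, `enter`, `leave`, `ModeError`) are hypothetical and not taken from the TUI source; only the mode names come from the commit message.

```python
from enum import Enum, auto


class Mode(Enum):
    IDLE = auto()
    TELEOP = auto()
    RECORDING = auto()


class ModeError(RuntimeError):
    """Raised when a command would violate the mode mutex."""


class ModeMachine:
    """Hypothetical mutex: an active mode can only be entered from IDLE."""

    def __init__(self) -> None:
        self.mode = Mode.IDLE

    def enter(self, target: Mode) -> None:
        if target is Mode.IDLE:
            raise ValueError("use leave() to return to IDLE")
        if self.mode is not Mode.IDLE:
            # Refuse rather than silently pre-empting the active mode.
            raise ModeError(f"cannot start {target.name} while {self.mode.name} is active")
        self.mode = target

    def leave(self) -> None:
        self.mode = Mode.IDLE
```

Under this sketch, starting a recording while teleop is active raises `ModeError` until `leave()` has been called.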
stderr/stdout pattern match → recovering mode + retry/abort, watcher.py spawned alongside orchestrator when WATCHER_CAMERA_INDEX set, subprocess death watchdog with last-5-line dump. brief-002 phase 4.
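A minimal sketch of the "last-5-line dump" idea, assuming a bounded `collections.deque` is used to keep the tail of the child's output. The function name `run_with_tail` is hypothetical, and this version blocks until the child exits, whereas the real watchdog presumably runs alongside the TUI event loop.

```python
import subprocess
from collections import deque


def run_with_tail(cmd, tail_lines=5):
    """Run cmd, keep only the last `tail_lines` lines of merged stdout/stderr.

    Returns (exit_code, tail) so a caller can dump the tail when the
    process dies unexpectedly.
    """
    tail = deque(maxlen=tail_lines)  # older lines fall off automatically
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so failures land in the tail
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        tail.append(line.rstrip("\n"))
    return proc.wait(), list(tail)
```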
per-process MM WS subscribe routes [mm] lines to dance floor; hang banner if no log in 10s while RECORDING/TELEOP active. brief-002 phase 5 (final).
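The hang-banner condition can be expressed as a small pure function. This is a sketch only: the function name, the string mode values, and the use of monotonic timestamps are assumptions; the 10-second threshold and the two watched modes come from the commit message.

```python
HANG_TIMEOUT_S = 10.0
WATCHED_MODES = {"RECORDING", "TELEOP"}


def should_show_hang_banner(mode: str, last_log_at: float, now: float) -> bool:
    """True when a watched mode is active and no log line arrived recently.

    `last_log_at` and `now` are assumed to come from time.monotonic().
    """
    return mode in WATCHED_MODES and (now - last_log_at) >= HANG_TIMEOUT_S
```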
inference-focused. mode mutex, command grammar, dance floor tags, common failures. brief-002 follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
idle-only, mm /api/setup/cameras/preview (or whatever mm.py uses), saves to ~/.cache/xle-hack/cam-previews/cam<idx>_<role>_<ts>.jpg, auto-opens via macOS `open`. brief-002 follow-up.
cameras() periodic poll was opening cv2 devices each tick (LED cycling) and blocking the Textual main thread. Devices poll now runs in a worker thread and only fires on mount + after orchestrator exits / teleop stops / recording stops. Fast poll keeps health+spark fresh at 5s without touching cameras. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
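One way to keep a slow device probe off the UI thread is sketched below with plain `threading`. The `DevicePoller` class is hypothetical, and the "skip if a probe is already in flight" guard is an addition not mentioned in the commit. Textual also ships its own worker API, which the actual change may use instead.

```python
import threading


class DevicePoller:
    """Runs a blocking probe in a background thread, on request only."""

    def __init__(self, probe):
        self._probe = probe          # blocking callable, e.g. a camera scan
        self._lock = threading.Lock()
        self._running = False
        self.latest = None           # most recent probe result

    def request(self):
        """Start a probe unless one is already in flight.

        Returns the started thread, or None if the request was skipped.
        """
        with self._lock:
            if self._running:
                return None
            self._running = True
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def _run(self):
        try:
            self.latest = self._probe()
        finally:
            with self._lock:
                self._running = False
```

A caller would invoke `request()` on mount and after the orchestrator, teleop, or recording stops, while the 5s health poll never touches the probe.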
MM preview reads one frame; UVC cams sometimes return ret=False on first read, dropping cam 0. Direct cv2 + 5-frame discard + 2 retries captures all canonical cams reliably. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
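The discard-then-retry capture can be sketched as follows. The helper name `grab_frame` is hypothetical, and "2 retries" is read here as up to three attempts in total, which is an assumption. The capture object only needs `read()` and `release()`, which matches the interface of `cv2.VideoCapture`.

```python
def grab_frame(open_capture, discard=5, retries=2):
    """Return one frame, or None if every attempt fails.

    open_capture: zero-argument callable returning an object with
    read() -> (ok, frame) and release(), e.g. lambda: cv2.VideoCapture(0).
    """
    for _attempt in range(retries + 1):
        cap = open_capture()
        try:
            for _ in range(discard):
                cap.read()  # throw away warm-up frames; early reads may fail
            ok, frame = cap.read()
            if ok:
                return frame
        finally:
            cap.release()  # always free the device before retrying
    return None
```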
Static was sized to 4 lines on first render (before MM polled), and set_mm_state called plain refresh() — which repaints but does not recompute layout. Subsequent renders produced 11+ lines, all clipped to the original 4-row box. Pass layout=True on the first MM-state arrival so the pane grows to fit. Also wrapped the entire _poll_mm_fast body in try/except (a raise inside set_interval cancels the timer for the rest of the session) and added a one-shot dance-floor log on first state population so future failures surface immediately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
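The try/except wrapping described above can be factored into a decorator so that a periodic callback never propagates an exception to its timer. This is a sketch under assumptions: the decorator name `guarded` and the `on_error` sink are hypothetical; in the TUI the sink would presumably write to the dance floor.

```python
import functools


def guarded(on_error):
    """Wrap a periodic callback so exceptions are reported, not raised."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:  # keep the timer alive on any failure
                on_error(f"{fn.__name__} failed: {exc!r}")
                return None

        return wrapper

    return decorator
```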
new INFERRING mode + MM /api/inference/start. skill alias resolves "act" to local_models/.../act_<skill>/checkpoints/last. mode mutex same as recording/teleop. ws log subscribe on start.
Bare 'skill <name>' now resolves to local act_<skill>/checkpoints/last, then smolvla_<skill>/checkpoints/last, then skills.yaml HF URI. Avoids the SmolVLAConfig compile_model field-mismatch when loading the team's HF SmolVLAs against an older lerobot venv. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
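The resolution order in this commit (local ACT, then local SmolVLA, then the HF URI from skills.yaml) could be sketched like this. The function name, the `models_root` parameter, and the `hf_uris` mapping are hypothetical stand-ins; the real local-models path is elided in the commit messages and is not guessed here.

```python
from pathlib import Path


def resolve_skill(name, models_root, hf_uris):
    """Return a policy location for `name`, preferring local checkpoints.

    models_root: directory assumed to hold act_<skill>/ and smolvla_<skill>/.
    hf_uris: mapping of skill name -> HF URI (stand-in for skills.yaml).
    """
    for prefix in ("act", "smolvla"):  # order matters: ACT wins over SmolVLA
        candidate = Path(models_root) / f"{prefix}_{name}" / "checkpoints" / "last"
        if candidate.is_dir():
            return str(candidate)
    return hf_uris.get(name)  # None if the skill is unknown everywhere
```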
Per-skill policy paths/URIs + default + fallback live in skills_overlay.yaml at xle-hack root. _cmd_skill resolves through overlay first, falls back to arbitrary path/URI, then to the team skills.yaml vla.uri. Adds get_policy and get_fallback to tui.skills. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spawn MM uvicorn backend from the cockpit when it's down. Tracks the subprocess in self._mm_proc; mm restart kills + relaunches. Detached session so MM survives if the TUI exits — operator must kill it explicitly via mm down (or another terminal). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
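A detached spawn of this kind is commonly done with `subprocess.Popen(..., start_new_session=True)`, which on POSIX puts the child in its own session so it is not tied to the parent's terminal. The sketch below assumes that approach; the helper name `spawn_detached` is hypothetical and the actual uvicorn command line is not shown in the commit, so it is left to the caller.

```python
import subprocess


def spawn_detached(cmd):
    """Start cmd in its own session so it can outlive the parent process.

    The returned Popen handle can be kept (as the commit does with
    self._mm_proc) for a later explicit kill or restart.
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX: child calls setsid() before exec
    )
```

Because the child survives the parent, something must stop it explicitly, which is the role `mm down` plays in the cockpit.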
Local act_pick_*/checkpoints/last were trained with upstream lerobot 0.5.1; MM's fork is on 0.3.4 and can't parse them. The Mattie-NT HF repos are verified working in MM GUI. Now `skill pick_cup` resolves to Mattie-NT/act_pick_cup; `act_local` alias still available for the local copies if/when MM ever runs newer lerobot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lerobot fork's sanity_check_dataset_name (control_utils.py:186) rejects repo_ids that don't start with eval_ when a policy is provided. Match MM GUI's convention: Mattie-NT/eval_<skill>_smoke. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Active section now reflects TUI commands (skill/teleop/record), not just the orchestrator's invocations. History section removed; doesn't add information once each finished op is in the dance floor anyway. Commands section lists the input grammar for at-a-glance reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v3 trained on newer data; v2 and v1 retained as alias variants for quick A/B. act_local points at the on-disk source we just pushed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v3 verified working at the rig. Drop v1/v2 aliases — clean overlay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
new mm cleanup command (and auto-fires once on 409 from skill / record / teleop start), iterates /api/system/port-locks and hits the matching owner-typed stop endpoint per held lock.
Risks accepted going into the auto-stop watcher work (silent failure surfacing, race idempotency, polling scope). Plus the unresolved mm cleanup "didn't work" report left open for repro.
skill subprocess exit auto-stops INFERRING mode after surfacing exit info to dance floor. tasks/<name>.md is a numbered markdown list of TUI commands; runner dispatches each, awaits mode → IDLE, advances. abort cancels the task in flight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
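A task file in this format could be parsed and stepped through roughly as below. This is a sketch, not the runner from the commit: `parse_task`, `run_task`, and the injected `dispatch`, `wait_for_idle`, and `aborted` callables are hypothetical, and stripping surrounding backticks from a step is an assumption about how commands might be written.

```python
import re

# Matches "1. command" or "1) command"; other markdown lines are ignored.
_STEP = re.compile(r"^\s*\d+[.)]\s+(.*\S)\s*$")


def parse_task(markdown_text):
    """Extract TUI commands from a numbered markdown list, in order."""
    commands = []
    for line in markdown_text.splitlines():
        match = _STEP.match(line)
        if match:
            commands.append(match.group(1).strip("`"))
    return commands


def run_task(commands, dispatch, wait_for_idle, aborted=lambda: False):
    """Dispatch each command, wait for IDLE, then advance; stop on abort."""
    done = []
    for command in commands:
        if aborted():
            break
        dispatch(command)
        wait_for_idle()
        done.append(command)
    return done
```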
- modal/train_act.py: ACT-from-scratch training app on H100 (parallels train_smolvla.py)
- scripts/relax_arm.py: connect+disconnect to release torque after a crashed inference run
- wiki/operating-docs/running-models.md: per-model copy-paste cheat sheet + crash-mode runbook
- wiki/operating-docs/makermods-setup/: working bindings snapshot (ports, cal files, camera roles)
- HANDOFF_HOME.md: inference-affecting patches (fps=15 override, async_read timeout 1000ms) + additional Mattie-NT checkpoints
- tui/app.tcss: grid-columns 1fr 2fr (Warp live-test fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
24 commits of Sunday hack-day work toward the 4pm demo:
- `mm up|down|restart` commands, port-lock auto-retry on stale 409s, polling split into 5s health + on-event device updates, MM cleanup hooks, auto-stop watcher.
- `skills_overlay.yaml` resolves skill names to local checkpoints (HF as fallback), with `skill <name> [<policy_uri>]` for ad-hoc inference. Cup currently routed to `act_pick_cup_v3`.
- `cams snap` captures + labels canonical previews; split client retries with frame-discard.
- `modal/train_act.py` (ACT-from-scratch on H100, mirrors `train_smolvla.py`) + `scripts/relax_arm.py` for crashed-run torque release.
- `wiki/operating-docs/running-models.md` (per-model copy-paste cheat sheet + crash-mode runbook), `wiki/operating-docs/makermods-setup/` (working bindings snapshot with screenshots), `wiki/operating-docs/tui-cockpit.md` (TUI operator reference), HANDOFF_HOME camera-patch notes.

Test plan
- `.venv/bin/python -m tui` and renders all four panes
- `mm up|down|restart` commands toggle the makermods backend cleanly
- `skill cup` runs inference via the overlay (currently `act_pick_cup_v3`)
- `cams snap` captures three labeled previews
- `localhost:8001/operating-docs/running-models/` after `uvx zensical build`
- `modal deploy modal/train_act.py` succeeds; `modal run --detach modal/train_act.py::main` against a tagged dataset trains and pushes to HF

🤖 Generated with Claude Code