feat(smoke): VHS tape harness for interactive CLI smoke tests #933
Draft
Aaronontheweb wants to merge 3 commits into netclaw-dev:dev from
The existing smoke sandbox (scripts/smoke/check.sh) covers only the
non-interactive surface — daemon REST API, JSON output shapes, flag-driven
provider/model setup. Spectre-prompt flows like `netclaw init` were entirely
untested in CI, which is how the recent interactive-wizard regression slipped
through.
This change adds a tape-driven harness that runs alongside check.sh:
* scripts/smoke/install-vhs.sh — pinned VHS install (Linux/x86_64) with
optional SHA256 verification and apt-installed ttyd/ffmpeg deps.
* scripts/smoke/run-tape.sh — per-tape wrapper that prepends preamble.tape
(docker exec session into netclaw-sandbox with isolated NETCLAW_HOME),
runs vhs with a hard timeout, dispatches to the tape's assertion script,
and collects PNG/log/config artifacts on failure.
* scripts/smoke/run-tapes.sh — light/full/single-tape driver shared by
CI and local devs. Supports --keep-stack and --no-up for inner-loop
iteration.
* tests/smoke-interactive/tapes/init-wizard.tape — full Personal-posture
wizard walkthrough using only Wait+Screen synchronization (no Sleep).
Anchors are harvested from src/Netclaw.Cli/Tui/Wizard/Steps/*StepView.cs.
* tests/smoke-interactive/assertions/init-wizard.sh — schema-validates the
produced config.json by running `netclaw doctor` and asserts the expected
provider type / identity fields.
* tests/smoke-interactive/tapes/help.tape — minimal harness self-test.
* tests/smoke-interactive/tapes/README.md — authoring conventions
(no Sleep, no screenshots, anchor every step, pair with assertion).
* .github/workflows/smoke_sandbox.yml — new steps after check.sh:
install vhs, then run the light tape suite (--no-up --keep-stack so the
existing teardown still runs). The existing collect-logs/upload/teardown
steps pick up the new tapes.log and per-tape artifact directories.
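For orientation, the run-tape.sh flow described above (prepend the preamble, run vhs under a hard timeout, dispatch to the paired assertion script, keep artifacts on failure) can be sketched roughly as follows. This is not the script from this PR: the `TAPE_DIR`/`ASSERT_DIR`/`ARTIFACT_DIR`/`VHS_BIN`/`TAPE_TIMEOUT` variables, the 300-second default, and the artifact layout are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical per-tape wrapper in the spirit of run-tape.sh (not the PR's script).
set -euo pipefail

# Illustrative defaults; every variable here is an assumption.
TAPE_DIR=${TAPE_DIR:-tests/smoke-interactive/tapes}
ASSERT_DIR=${ASSERT_DIR:-tests/smoke-interactive/assertions}
ARTIFACT_DIR=${ARTIFACT_DIR:-smoke-logs}
VHS_BIN=${VHS_BIN:-vhs}
TAPE_TIMEOUT=${TAPE_TIMEOUT:-300}   # seconds before timeout(1) kills vhs

run_tape() {
  local name=$1
  local combined
  combined=$(mktemp --suffix=.tape)

  # Prepend the shared preamble so every tape starts in the sandbox shell.
  cat "$TAPE_DIR/preamble.tape" "$TAPE_DIR/$name.tape" > "$combined"

  local status=0
  # Hard ceiling on the whole run, in case vhs or the sandbox session hangs.
  timeout "$TAPE_TIMEOUT" "$VHS_BIN" "$combined" || status=$?

  # Run the paired assertion script only if the tape itself succeeded.
  if [[ $status -eq 0 && -x "$ASSERT_DIR/$name.sh" ]]; then
    "$ASSERT_DIR/$name.sh" || status=$?
  fi

  if [[ $status -ne 0 ]]; then
    mkdir -p "$ARTIFACT_DIR/$name"
    cp "$combined" "$ARTIFACT_DIR/$name/"
    # Keep any PNGs left in the working directory; finding none is not an error.
    cp ./*.png "$ARTIFACT_DIR/$name/" 2>/dev/null || true
    echo "tape $name failed (exit $status); artifacts in $ARTIFACT_DIR/$name" >&2
  fi

  rm -f "$combined"
  return "$status"
}
```

Calling `run_tape init-wizard` would run `preamble.tape` + `init-wizard.tape` and then `assertions/init-wizard.sh`. Overriding `VHS_BIN` (for example with `true`) lets the control flow be exercised without VHS installed.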
check.sh is unchanged. The tape harness augments it; it does not replace it.
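Returning to install-vhs.sh from the list above: a pinned install with optional checksum verification might look like the sketch below. The release-asset URL pattern, the `VHS_SHA256` variable, and the use of `sudo` are assumptions rather than details taken from the PR. The helpers are defined but `install_vhs` is not invoked, so the file can be sourced safely.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a pinned VHS install (Linux/x86_64), not the PR's script.
set -euo pipefail

VHS_VERSION=${VHS_VERSION:-0.11.0}

# Assumed goreleaser-style asset name; check the release page before relying on it.
vhs_asset_url() {
  local version=$1
  printf 'https://github.com/charmbracelet/vhs/releases/download/v%s/vhs_%s_Linux_x86_64.tar.gz\n' \
    "$version" "$version"
}

# Verify FILE against an expected hex SHA256; an empty EXPECTED skips the check.
verify_sha256() {
  local file=$1 expected=${2:-}
  if [[ -z $expected ]]; then
    return 0
  fi
  printf '%s  %s\n' "$expected" "$file" | sha256sum --check --status
}

install_vhs() {
  # noninteractive keeps debconf/whiptail prompts from hanging CI.
  sudo DEBIAN_FRONTEND=noninteractive apt-get update -q
  sudo DEBIAN_FRONTEND=noninteractive apt-get install -y -q ttyd ffmpeg

  local workdir
  workdir=$(mktemp -d)
  curl -fsSL "$(vhs_asset_url "$VHS_VERSION")" -o "$workdir/vhs.tar.gz"
  # Under set -e a checksum mismatch aborts the install here.
  verify_sha256 "$workdir/vhs.tar.gz" "${VHS_SHA256:-}"

  tar -xzf "$workdir/vhs.tar.gz" -C "$workdir"
  # Locate the binary without assuming the archive's internal layout.
  local bin
  bin=$(find "$workdir" -type f -name vhs | head -n 1)
  sudo install -m 0755 "$bin" /usr/local/bin/vhs
  command -v vhs
}
```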
Notes for the first CI run:
* The init-wizard Wait+Screen anchors and provider list ordering are
derived from source-grepping; expect to iterate on 1-2 patterns once
a real run produces a partial PNG.
* The smoke compose mounts netclaw-home:/root/.netclaw but the image
USER netclaw has home /home/netclaw — the volume is effectively dead.
Tapes sidestep this via per-tape NETCLAW_HOME=/tmp/tape-<name>. The
underlying mismatch is worth fixing in a separate PR.
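The per-tape isolation can be made concrete with two small helpers. They are hypothetical (the PR does not show how preamble.tape derives the path) and only illustrate the `/tmp/tape-<name>` convention and the `docker exec -e` flag, which sets the variable for the exec'd shell only.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating NETCLAW_HOME=/tmp/tape-<name> per tape.
set -euo pipefail

# Map a tape file (any path) to its isolated NETCLAW_HOME.
tape_netclaw_home() {
  local name
  name=$(basename "$1" .tape)
  printf '/tmp/tape-%s\n' "$name"
}

# Print the docker exec command a preamble could type to open the session.
tape_exec_cmd() {
  local tape=$1 container=${2:-netclaw-sandbox}
  printf 'docker exec -it -e NETCLAW_HOME=%s %s bash\n' \
    "$(tape_netclaw_home "$tape")" "$container"
}
```

Because every tape gets its own directory under `/tmp`, two tapes never share config even if the dead `/root/.netclaw` volume is still mounted.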
The smoke compose previously mounted a named volume `netclaw-home` at
`/root/.netclaw`, but the smoke image's USER directive is `netclaw`
(uid 1654) with home at `/home/netclaw`. The daemon writes its state
to `~/.netclaw` which resolves to `/home/netclaw/.netclaw` — meaning
the named volume was effectively dead and `collect-logs.sh` was
fishing for daemon logs at a path that never existed.
Switch to a tmpfs mount at `/home/netclaw/.netclaw` so:
* the path matches where the daemon actually writes
* every `up` starts with guaranteed-empty netclaw state, removing
any chance of stale config bleeding between smoke runs
* the mount can never collide with a host machine's `~/.netclaw`,
even hypothetically — tmpfs is a kernel-backed in-RAM filesystem
Update `collect-logs.sh` to read daemon log / pid / home directory
from the corrected path.
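To confirm the new mount took effect, the filesystem type at the state directory can be read from `/proc/mounts` inside the container. The helper names below are hypothetical and not part of collect-logs.sh; this is a sketch that assumes `awk` is available in the image.

```shell
#!/usr/bin/env bash
# Hypothetical check: report the filesystem type mounted at PATH by scanning
# /proc/mounts (fields: device, mountpoint, fstype, options, ...).
set -euo pipefail

mount_fstype() {
  # Keep the last match: a later mount at the same point shadows earlier ones.
  awk -v p="$1" '$2 == p { fstype = $3 } END { if (fstype != "") print fstype }' /proc/mounts
}

assert_tmpfs() {
  local path=$1 fstype
  fstype=$(mount_fstype "$path")
  if [[ $fstype != tmpfs ]]; then
    echo "expected tmpfs at $path, found '${fstype:-not a mountpoint}'" >&2
    return 1
  fi
}
```

Run inside `netclaw-sandbox`, `assert_tmpfs /home/netclaw/.netclaw` should pass once the compose change is in place.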
Iterating against a live smoke stack (qwen2:0.5b on Ollama) surfaced a
handful of incorrect assumptions in the first cut. With these fixes,
`scripts/smoke/run-tapes.sh light` runs help.tape and init-wizard.tape
green against the smoke compose stack, including a full Personal-posture
wizard walkthrough with all assertions on the produced state passing.
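The light/full/single-tape selection and the `--keep-stack`/`--no-up` flags can be sketched as below. The tape lists are assumptions: "light" is taken to be the help.tape + init-wizard.tape pair mentioned above, and "full" every tape in the directory except the shared preamble.

```shell
#!/usr/bin/env bash
# Hypothetical tape selection + flag parsing in the spirit of run-tapes.sh.
set -euo pipefail

TAPE_DIR=${TAPE_DIR:-tests/smoke-interactive/tapes}
MODE=light
KEEP_STACK=0   # 1 = leave the compose stack running afterwards
NO_UP=0        # 1 = assume the stack is already up; skip bringing it up

parse_args() {
  while (( $# > 0 )); do
    case $1 in
      --keep-stack) KEEP_STACK=1 ;;
      --no-up)      NO_UP=1 ;;
      *)            MODE=$1 ;;
    esac
    shift
  done
}

# Print one tape name per line for the requested mode.
select_tapes() {
  local mode=$1 f name
  case $mode in
    light)
      # Assumed light set: the two tapes reported green above.
      printf '%s\n' help init-wizard
      ;;
    full)
      for f in "$TAPE_DIR"/*.tape; do
        [[ -e $f ]] || continue            # empty directory: glob stays literal
        name=$(basename "$f" .tape)
        if [[ $name != preamble ]]; then   # preamble is prepended, never run alone
          printf '%s\n' "$name"
        fi
      done
      ;;
    *)
      # Anything else is treated as a single tape name or path.
      basename "$mode" .tape
      ;;
  esac
}
```

A driver would then loop over `select_tapes "$MODE"`, skip `docker compose up` when `NO_UP=1`, and skip teardown when `KEEP_STACK=1`; the CI step would correspond to `scripts/smoke/run-tapes.sh light --no-up --keep-stack`.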
Changes:
* install-vhs.sh: bump pinned VHS version 0.8.0 → 0.11.0. Earlier
versions don't recognise `Wait+Screen /pattern/` and parse the entire
directive as garbage. Force apt to non-interactive mode so kernel-
upgrade whiptail prompts can't hang the install.
* preamble.tape: VHS interprets `Set Width` / `Set Height` as pixel
dimensions (minimum 120×120), not character columns. Set 1400×800 at
FontSize 14 (~95 cols × 50 rows). Drop the `bash --noprofile --norc`
flags — they leave PS1 empty so there's no prompt to wait on; instead
Sleep briefly after the docker-exec attaches and set `PS1=TAPE$ `
ourselves.
* init-wizard.tape:
- Fix provider list ordering: alphabetical by TypeKey, so Ollama is
one Down from Anthropic, not two.
- Skip the External Skills sub-step entirely. ExternalSkillsStep.IsApplicable
returns false when no external skill sources are detected (the smoke
container has no Claude Code, etc.).
- Skill Feeds prompt defaults to "Yes, connect"; Down + Enter to pick
the "No — skip" option.
- Replace Ctrl+U for clearing the endpoint default with `Right N` +
`Backspace N` since the cursor lands at column 0 on field focus.
- After the Health Check step the wizard auto-launches into the chat
TUI rather than returning to the shell — wait on the chat status
bar (qwen2:0.5b ready), then Ctrl+Q to drop back to the prompt.
- Tighten per-step timeouts: 10s for prompt transitions, 45s for the
Ollama probe, 120s for the wizard health check (cold model load).
* assertions/init-wizard.sh:
- Config lives at `${NETCLAW_HOME}/config/netclaw.json`, not the
`config.json` I originally guessed.
- Schema uses PascalCase keys (`Providers`, `Models`, `Security`).
- Identity user name is written to `identity/SOUL.md`, not into
netclaw.json.
- Doctor exit codes per DoctorRunner: 0 = all PASS, 1 = at least one
FAIL (assertion failure), 2 = WARNs only (acceptable for the
Personal-posture flow which intentionally prints a "HostAllowed
shell" warning).
* tapes/README.md: correct the dimensions guidance to reflect that VHS
uses pixels, and document the default 1400×800 surface.
* .gitignore: ignore `smoke-logs/` so failure artefacts (PNGs, container
logs, NETCLAW_HOME tarballs) from local tape runs aren't committed.
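The assertion rules listed above (config at `${NETCLAW_HOME}/config/netclaw.json` with PascalCase sections, the user name in `identity/SOUL.md`, doctor exit codes 0 and 2 accepted, 1 rejected) could be expressed roughly as follows. This is a sketch, not assertions/init-wizard.sh itself: `EXPECTED_USER` and the function names are hypothetical, it assumes `netclaw doctor` honours an exported `NETCLAW_HOME`, and it only checks top-level key presence because the nested schema is not spelled out here.

```shell
#!/usr/bin/env bash
# Hypothetical post-tape assertions, loosely modelled on assertions/init-wizard.sh.
set -euo pipefail

# DoctorRunner exit codes: 0 = all PASS, 1 = at least one FAIL, 2 = WARNs only.
doctor_exit_ok() {
  case $1 in
    0|2) return 0 ;;
    *)   return 1 ;;
  esac
}

# Top-level PascalCase sections; jq -e derives its exit status from the result.
check_config_keys() {
  jq -e 'has("Providers") and has("Models") and has("Security")' "$1" > /dev/null
}

# The identity user name lives in identity/SOUL.md, not netclaw.json.
check_identity_user() {
  local home=$1 user=$2
  grep -qF -- "$user" "$home/identity/SOUL.md"
}

assert_init_wizard() {
  local home=${NETCLAW_HOME:?NETCLAW_HOME must be set}
  local user=${EXPECTED_USER:?EXPECTED_USER must be set}
  local status=0

  netclaw doctor || status=$?
  if ! doctor_exit_ok "$status"; then
    echo "netclaw doctor failed (exit $status)" >&2
    return 1
  fi
  if ! check_config_keys "$home/config/netclaw.json"; then
    echo "netclaw.json is missing a Providers/Models/Security section" >&2
    return 1
  fi
  if ! check_identity_user "$home" "$user"; then
    echo "user '$user' not found in identity/SOUL.md" >&2
    return 1
  fi
}
```

Accepting exit 2 keeps the Personal-posture "HostAllowed shell" WARN from failing the tape while a genuine FAIL (exit 1) still turns it red.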
Summary
* Adds a VHS tape harness alongside the existing smoke sandbox (`scripts/smoke/check.sh`), covering the interactive Termina/TUI surface that `check.sh` cannot reach, which is the layer where the recent init-wizard regression lived.
* Adds an `init-wizard` Personal-posture walkthrough with post-tape assertions on the produced state (`netclaw doctor` exit code, schema-relevant fields in `netclaw.json`, identity in `SOUL.md`).
* Moves the smoke compose's netclaw state mount from `/root/.netclaw` to a tmpfs at `/home/netclaw/.netclaw` (the image's actual `USER netclaw` home), so smoke runs always start with empty netclaw state and can never collide with a host's `~/.netclaw`.

This is a foundation PR: only `help.tape` and `init-wizard.tape` land here. Follow-ups will add `model-add`, `provider-add`, `webhooks-add`, `reminder-create`, and `mcp-permissions` tapes plus a nightly workflow.

What's new
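| File | Purpose |
| --- | --- |
| `scripts/smoke/install-vhs.sh` | Pinned VHS install (Linux/x86_64), optional SHA256 verification, apt-installed ttyd/ffmpeg deps |
| `scripts/smoke/run-tape.sh` | Per-tape wrapper: prepends the preamble, runs vhs under a hard timeout, dispatches to the assertion script, collects artifacts on failure |
| `scripts/smoke/run-tapes.sh` | `light` / `full` / single-tape driver shared by CI and local dev (`--keep-stack` / `--no-up`) |
| `tests/smoke-interactive/tapes/preamble.tape` | `docker exec` session into `netclaw-sandbox` with per-tape `NETCLAW_HOME=/tmp/tape-<name>` |
| `tests/smoke-interactive/tapes/help.tape` | Minimal harness self-test |
| `tests/smoke-interactive/tapes/init-wizard.tape` | Full Personal-posture wizard walkthrough, `Wait+Screen` only |
| `tests/smoke-interactive/assertions/init-wizard.sh` | `netclaw doctor` + jq field checks + `identity/SOUL.md` user check |
| `tests/smoke-interactive/tapes/README.md` | Authoring conventions (no `Sleep`, no screenshots, anchor every step) |
| `.github/workflows/smoke_sandbox.yml` | New steps after `check.sh`: install vhs → run light tapes → existing collect/upload still capture artifacts |

`check.sh` is unchanged. The tape harness augments it; it does not replace it. The two cover different slices of the CLI surface (interactive Termina TUI vs HTTP/REST/JSON).

Verified locally and in CI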
CI confirmed green on this branch — the new "Install vhs" + "Run interactive tapes (light)" steps inside the Smoke Sandbox job complete in ~80 seconds end-to-end.
Notes for reviewers
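* VHS is pinned to 0.11.0 (earlier versions don't recognise `Wait+Screen /pattern/`).
* `netclaw doctor` exits 2 in this configuration because Personal posture + HostAllowed shell intentionally trips a WARN. The assertion treats 0 and 2 as success and only fails on 1.
* The tape skips the External Skills sub-step (`ExternalSkillsStep.IsApplicable` returns false when no external skill sources are detected); if the smoke image starts seeding Claude Code or similar, `init-wizard.tape` will need an additional branch.
* After the Health Check step the wizard auto-launches into the chat TUI, so the tape waits on the chat status bar and sends `Ctrl+Q` to drop back to the shell before assertions run.

Test plan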
* `init-wizard.tape` fails red with a `Wait+Screen` timeout and an attached PNG of the hung screen. (Not done in this PR.)