feat(smoke): VHS tape harness for interactive CLI smoke tests #933

Draft
Aaronontheweb wants to merge 3 commits into netclaw-dev:dev from Aaronontheweb:claude-wt-cli-test-tape

Collaborator

Aaronontheweb commented May 8, 2026

Summary

  • Adds a VHS tape-driven smoke test harness that runs alongside the existing non-interactive smoke (scripts/smoke/check.sh), covering the interactive Termina/TUI surface that check.sh cannot reach — the layer where the recent init-wizard regression lived.
  • Includes a full init-wizard Personal-posture walkthrough with post-tape assertions on the produced state (netclaw doctor exit code, schema-relevant fields in netclaw.json, identity in SOUL.md).
  • Switches the smoke compose from a misplaced named-volume mount at /root/.netclaw to a tmpfs at /home/netclaw/.netclaw (the image's actual USER netclaw home), so smoke runs always start with empty netclaw state and can never collide with a host's ~/.netclaw.

This is a foundation PR — only help.tape and init-wizard.tape land here. Follow-ups will add model-add, provider-add, webhooks-add, reminder-create, and mcp-permissions tapes plus a nightly workflow.

What's new

| File | Purpose |
| --- | --- |
| scripts/smoke/install-vhs.sh | Pinned VHS install (Linux/x86_64 + apt deps) |
| scripts/smoke/run-tape.sh | Per-tape wrapper: preamble substitution → vhs run → assertion → artefact collection |
| scripts/smoke/run-tapes.sh | light / full / single-tape driver shared by CI and local dev (--keep-stack / --no-up) |
| tests/smoke-interactive/tapes/preamble.tape | Establishes a docker exec session with per-tape NETCLAW_HOME=/tmp/tape-<name> |
| tests/smoke-interactive/tapes/help.tape | Harness self-test |
| tests/smoke-interactive/tapes/init-wizard.tape | Full Personal-posture wizard, Wait+Screen only |
| tests/smoke-interactive/assertions/init-wizard.sh | netclaw doctor + jq field checks + identity/SOUL.md user check |
| tests/smoke-interactive/tapes/README.md | Authoring conventions (no Sleep, no screenshots, anchor every step) |
| .github/workflows/smoke_sandbox.yml | New steps after check.sh: install vhs → run light tapes → existing collect/upload still capture artifacts |

check.sh is unchanged. The tape harness augments it; it does not replace it. The two cover different slices of the CLI surface (interactive Termina TUI vs HTTP/REST/JSON).
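
The per-tape flow described for run-tape.sh (preamble substitution, vhs run, assertion) might look roughly like the sketch below. The `__TAPE_NAME__` placeholder, the `render_tape` and `run_tape` function names, the `TAPE_NAME` variable, and the 300-second default are illustrative assumptions and are not taken from the PR; only the directory layout comes from the description above. Artefact collection on failure is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a per-tape wrapper; names and placeholder are assumptions.
set -euo pipefail

TAPES_DIR="${TAPES_DIR:-tests/smoke-interactive/tapes}"
ASSERT_DIR="${ASSERT_DIR:-tests/smoke-interactive/assertions}"

# Emit preamble.tape with the (assumed) __TAPE_NAME__ placeholder filled in,
# followed by the body of the named tape. Tape names are assumed to be plain
# identifiers without slashes.
render_tape() {
  local name="$1"
  sed "s/__TAPE_NAME__/${name}/g" "${TAPES_DIR}/preamble.tape"
  cat "${TAPES_DIR}/${name}.tape"
}

# Render the tape, run it under a hard timeout, then dispatch to the matching
# assertion script if one exists. With `set -e`, a vhs failure or timeout
# aborts before the assertion runs.
run_tape() {
  local name="$1" limit="${2:-300}" rendered
  rendered="$(mktemp --suffix=.tape)"
  render_tape "$name" > "$rendered"
  timeout "$limit" vhs "$rendered"
  if [ -x "${ASSERT_DIR}/${name}.sh" ]; then
    TAPE_NAME="$name" "${ASSERT_DIR}/${name}.sh"
  fi
}
```

Under these assumptions, `run_tape init-wizard 600` would render and run init-wizard.tape with a ten-minute ceiling and then call assertions/init-wizard.sh.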

Verified locally and in CI

./scripts/smoke/run-tapes.sh light
# → help: OK
# → init-wizard: OK
#       .Providers.ollama.Type == 'ollama'
#       .Providers.ollama.Endpoint == 'http://ollama:11434'
#       .Models.Main.Provider == 'ollama'
#       .Models.Main.ModelId == 'qwen2:0.5b'
#       .Security.DeploymentPosture == 'Personal'
#       identity/SOUL.md contains 'Name: SmokeTester'
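
Checks like the ones listed above could be expressed with `jq -e` and `grep`, as in the minimal sketch below. The helper names `assert_field` and `assert_contains` are hypothetical; the PR does not show how init-wizard.sh is actually written. The field paths and expected values in the usage note are copied from the output above.

```shell
#!/usr/bin/env bash
# Hypothetical assertion helpers; the real init-wizard.sh may differ.

# Succeeds when the jq path $2 in JSON file $1 equals the string $3.
# `jq -e` exits non-zero when the comparison yields false or null.
assert_field() {
  local file="$1" path="$2" want="$3"
  if jq -e --arg want "$want" "${path} == \$want" "$file" >/dev/null; then
    echo "      ${path} == '${want}'"
  else
    echo "FAIL: ${path} != '${want}'" >&2
    return 1
  fi
}

# Succeeds when file $1 contains the fixed string $2.
assert_contains() {
  local file="$1" needle="$2"
  if ! grep -qF -- "$needle" "$file"; then
    echo "FAIL: ${file} lacks '${needle}'" >&2
    return 1
  fi
}
```

For example, `assert_field "$NETCLAW_HOME/config/netclaw.json" '.Models.Main.ModelId' 'qwen2:0.5b'` and `assert_contains "$NETCLAW_HOME/identity/SOUL.md" 'Name: SmokeTester'`, assuming identity/SOUL.md is resolved relative to NETCLAW_HOME.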

CI confirmed green on this branch — the new "Install vhs" + "Run interactive tapes (light)" steps inside the Smoke Sandbox job complete in ~80 seconds end-to-end.

Notes for reviewers

  • VHS pinned to v0.11.0 (minimum that supports Wait+Screen /pattern/).
  • netclaw doctor exits 2 in this configuration because Personal posture + HostAllowed shell intentionally trips a WARN. The assertion treats 0 and 2 as success and only fails on 1.
  • ExternalSkillsStep is skipped (IsApplicable returns false when no external skill sources are detected); if the smoke image starts seeding Claude Code or similar, init-wizard.tape will need an additional branch.
  • The init wizard auto-launches the chat TUI on success; the tape exits chat with Ctrl+Q to drop back to the shell before assertions run.
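
The exit-code handling noted above (0 and 2 accepted, 1 rejected) can be sketched as a small wrapper. `doctor_ok` is a hypothetical name, and this version is slightly stricter than the note: it treats any code other than 0 or 2 as a failure, on the assumption that an unexpected code should not pass silently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map `netclaw doctor` exit codes to pass/fail
# (0 = all PASS, 2 = WARNs only, 1 = at least one FAIL).

# Runs the given command and returns 0 for exit codes 0 or 2, else 1.
doctor_ok() {
  local rc=0
  "$@" || rc=$?
  case "$rc" in
    0|2) return 0 ;;
    *)   echo "doctor failed with exit code ${rc}" >&2; return 1 ;;
  esac
}
```

An assertion script could then call `doctor_ok netclaw doctor` and rely on the wrapper's return status.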

Test plan

  • CI smoke job is green on this branch (the existing smoke checks plus the new interactive tape steps).
  • Failure-mode reproduction: introduce an intentional break in a wizard prompt on a throwaway branch and confirm init-wizard.tape fails red with a Wait+Screen timeout and an attached PNG of the hung screen. (Not done in this PR.)

The existing smoke sandbox (scripts/smoke/check.sh) covers only the
non-interactive surface — daemon REST API, JSON output shapes, flag-driven
provider/model setup. Spectre-prompt flows like `netclaw init` were entirely
untested in CI, which is how the recent interactive-wizard regression slipped
through.

This change adds a tape-driven harness that runs alongside check.sh:

  * scripts/smoke/install-vhs.sh — pinned VHS install (Linux/x86_64) with
    optional SHA256 verification and apt-installed ttyd/ffmpeg deps.
  * scripts/smoke/run-tape.sh — per-tape wrapper that prepends preamble.tape
    (docker exec session into netclaw-sandbox with isolated NETCLAW_HOME),
    runs vhs with a hard timeout, dispatches to the tape's assertion script,
    and collects PNG/log/config artifacts on failure.
  * scripts/smoke/run-tapes.sh — light/full/single-tape driver shared by
    CI and local devs. Supports --keep-stack and --no-up for inner-loop
    iteration.
  * tests/smoke-interactive/tapes/init-wizard.tape — full Personal-posture
    wizard walkthrough using only Wait+Screen synchronization (no Sleep).
    Anchors are harvested from src/Netclaw.Cli/Tui/Wizard/Steps/*StepView.cs.
  * tests/smoke-interactive/assertions/init-wizard.sh — schema validates the
    produced config.json by running `netclaw doctor` and asserting expected
    provider type / identity fields.
  * tests/smoke-interactive/tapes/help.tape — minimal harness self-test.
  * tests/smoke-interactive/tapes/README.md — authoring conventions
    (no Sleep, no screenshots, anchor every step, pair with assertion).
  * .github/workflows/smoke_sandbox.yml — three new steps after check.sh:
    install vhs, run light tape suite (--no-up --keep-stack so the existing
    teardown still runs), and existing collect-logs/upload/teardown handle
    the new tapes.log + per-tape artifact directories.
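
The optional SHA256 verification mentioned for install-vhs.sh could work along these lines. This is a sketch only: `verify_sha256` is a hypothetical helper, and how the script actually receives the expected digest is not shown in the PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of optional checksum verification for a downloaded file.

# Verifies file $1 against the SHA256 hex digest $2; skips when $2 is empty.
verify_sha256() {
  local file="$1" expected="${2:-}"
  if [ -z "$expected" ]; then
    echo "no checksum supplied; skipping verification" >&2
    return 0
  fi
  # `sha256sum -c` reads "<digest>  <path>" lines; --status suppresses output
  # and reports the result through the exit code only.
  echo "${expected}  ${file}" | sha256sum -c --status -
}
```

Making the digest optional keeps local iteration quick, while CI can supply a pinned value so a tampered or truncated download fails the install step.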

check.sh is unchanged. The tape harness augments it; it does not replace it.

Notes for the first CI run:
  * The init-wizard Wait+Screen anchors and provider list ordering are
    derived from source-grepping; expect to iterate on 1-2 patterns once
    a real run produces a partial PNG.
  * The smoke compose mounts netclaw-home:/root/.netclaw but the image
    USER netclaw has home /home/netclaw — the volume is effectively dead.
    Tapes sidestep this via per-tape NETCLAW_HOME=/tmp/tape-<name>. The
    underlying mismatch is worth fixing in a separate PR.

The smoke compose previously mounted a named volume `netclaw-home` at
`/root/.netclaw`, but the smoke image's USER directive is `netclaw`
(uid 1654) with home at `/home/netclaw`. The daemon writes its state
to `~/.netclaw` which resolves to `/home/netclaw/.netclaw` — meaning
the named volume was effectively dead and `collect-logs.sh` was
fishing for daemon logs at a path that never existed.

Switch to a tmpfs mount at `/home/netclaw/.netclaw` so:

  * the path matches where the daemon actually writes
  * every `up` starts with guaranteed-empty netclaw state, removing
    any chance of stale config bleeding between smoke runs
  * the mount can never collide with a host machine's `~/.netclaw`,
    even hypothetically — tmpfs is a kernel-backed in-RAM filesystem

Update `collect-logs.sh` to read daemon log / pid / home directory
from the corrected path.
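
One way to confirm that the state directory really is tmpfs-backed is to inspect the mounts table from inside the container. The helper below is a hypothetical check and not part of this change; it assumes the `/proc/self/mounts` column layout (device, mount point, filesystem type, options).

```shell
#!/usr/bin/env bash
# Hypothetical check, not part of the PR: report whether a path is a tmpfs
# mount point according to a mounts table in /proc/self/mounts format.

# Usage: is_tmpfs_mount <path> [mounts-file]
is_tmpfs_mount() {
  local path="$1" table="${2:-/proc/self/mounts}"
  # Column 2 is the mount point, column 3 the filesystem type.
  awk -v p="$path" '$2 == p && $3 == "tmpfs" { found = 1 } END { exit !found }' "$table"
}
```

Run inside the sandbox container, `is_tmpfs_mount /home/netclaw/.netclaw` would be expected to succeed after this change, and `is_tmpfs_mount /root/.netclaw` to fail.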

Iterating against a live smoke stack (qwen2:0.5b on Ollama) surfaced a
handful of incorrect assumptions in the first cut. With these fixes,
`scripts/smoke/run-tapes.sh light` runs help.tape and init-wizard.tape
green against the smoke compose stack, including a full Personal-posture
wizard walkthrough with all assertions on the produced state passing.

Changes:

* install-vhs.sh: bump pinned VHS version 0.8.0 → 0.11.0. Earlier
  versions don't recognise `Wait+Screen /pattern/` and parse the entire
  directive as garbage. Force apt to non-interactive mode so kernel-
  upgrade whiptail prompts can't hang the install.

* preamble.tape: VHS interprets `Set Width` / `Set Height` as pixel
  dimensions (minimum 120×120), not character columns. Set 1400×800 at
  FontSize 14 (~95 cols × 50 rows). Drop the `bash --noprofile --norc`
  flags — they leave PS1 empty so there's no prompt to wait on; instead
  Sleep briefly after the docker-exec attaches and set `PS1=TAPE$ `
  ourselves.

* init-wizard.tape:
  - Fix provider list ordering: alphabetical by TypeKey, so Ollama is
    one Down from Anthropic, not two.
  - Skip the External Skills sub-step entirely. ExternalSkillsStep.
    IsApplicable returns false when no external skill sources are
    detected (smoke container has no Claude Code, etc.).
  - Skill Feeds prompt defaults to "Yes, connect"; Down + Enter to pick
    the "No — skip" option.
  - Replace Ctrl+U for clearing the endpoint default with `Right N` +
    `Backspace N` since the cursor lands at column 0 on field focus.
  - After the Health Check step the wizard auto-launches into the chat
    TUI rather than returning to the shell — wait on the chat status
    bar (qwen2:0.5b ready), then Ctrl+Q to drop back to the prompt.
  - Tighten per-step timeouts: 10s for prompt transitions, 45s for the
    Ollama probe, 120s for the wizard health check (cold model load).

* assertions/init-wizard.sh:
  - Config lives at `${NETCLAW_HOME}/config/netclaw.json`, not the
    `config.json` I originally guessed.
  - Schema uses PascalCase keys (`Providers`, `Models`, `Security`).
  - Identity user name is written to `identity/SOUL.md`, not into
    netclaw.json.
  - Doctor exit codes per DoctorRunner: 0 = all PASS, 1 = at least one
    FAIL (assertion failure), 2 = WARNs only (acceptable for the
    Personal-posture flow which intentionally prints a "HostAllowed
    shell" warning).

* tapes/README.md: correct the dimensions guidance to reflect that VHS
  uses pixels, and document the default 1400×800 surface.

* .gitignore: ignore `smoke-logs/` so failure artefacts (PNGs, container
  logs, NETCLAW_HOME tarballs) from local tape runs aren't committed.
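
The non-interactive apt install described for install-vhs.sh above could be sketched as follows. The function name and the `APT_GET` / `SUDO` override variables are assumptions made for illustration; `DEBIAN_FRONTEND=noninteractive` is the standard debconf setting for suppressing prompts, and it is passed through `env` so it still reaches apt when the command is run via sudo.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a prompt-free dependency install on Debian/Ubuntu.

# Installs the given packages without interactive dialogs.
# SUDO may be set to "sudo" on runners that are not root; APT_GET may be
# overridden to point at a different apt-get binary.
apt_install_quiet() {
  local apt_get="${APT_GET:-apt-get}"
  # ${SUDO:-} is intentionally unquoted so an empty value expands to nothing.
  ${SUDO:-} env DEBIAN_FRONTEND=noninteractive "$apt_get" install -y "$@"
}
```

On a CI runner this might be invoked as `SUDO=sudo; apt_install_quiet ttyd ffmpeg`, matching the ttyd/ffmpeg dependencies named earlier.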