feat(heal): Self-Healing toggles + vendored rfheal library#41
Open
raffelino wants to merge 40 commits into
…an open ones
- debug-1-dap-driver-foundation: in_progress → done (merged in v0.9.0)
- 1-1-database-migration-for-phase-4-models: in-progress → done (Phase 4 shipped)
- EE-1-konami-code-robot-parade: draft → planned (ready-for-dev)
- sprint-status.yaml: new "Post-0.9.0 backlog" section tracking the Interactive Debugger epic (DEBUG-2/3 ready-for-dev, DEBUG-1 done), Launch UX polish (LAUNCH-1 ready-for-dev), and Easter Eggs (EE-1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure-frontend EE-1: pressing ↑↑↓↓←→←→BA outside any text-entry element sends an inline-SVG robot marching from left to right along the bottom of the screen. Listener mounts once at the App shell, respects prefers-reduced-motion, and never captures clicks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
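Detecting the code reduces to a sliding window over the last ten `KeyboardEvent.code` values. The following is a minimal Python model of that logic (the PR implements it in Vue/TypeScript). It assumes the caller already knows whether focus sits in a text-entry element, leaves out the DOM listener, and uses a hypothetical class name, `KonamiMatcher`:

```python
from collections import deque

# Layout-independent physical key codes (KeyboardEvent.code values).
KONAMI = (
    "ArrowUp", "ArrowUp", "ArrowDown", "ArrowDown",
    "ArrowLeft", "ArrowRight", "ArrowLeft", "ArrowRight",
    "KeyB", "KeyA",
)


class KonamiMatcher:
    """Hypothetical helper: feed() returns True on the key that completes the code."""

    def __init__(self) -> None:
        # Only the most recent len(KONAMI) keys can ever matter.
        self._recent = deque(maxlen=len(KONAMI))

    def feed(self, code: str, in_text_field: bool = False) -> bool:
        if in_text_field:
            # Keys typed into inputs/textareas never count toward the sequence.
            return False
        self._recent.append(code)
        if tuple(self._recent) == KONAMI:
            self._recent.clear()  # require a fresh sequence for the next parade
            return True
        return False
```

Because the window is bounded, stray keys pressed before the sequence do not prevent a match.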
LAUNCH-1: post-0.9.0 distribution UX polish addressing two pain points reported on Windows (and present on every standalone platform): JSON-formatted boot logs are hostile to humans, and the URL the user needs to open is buried in 30+ log lines.
- LOG_FORMAT=text switches main.py from pythonjsonlogger to a readable `LEVEL logger: message` form. JSON stays the default everywhere else (Docker, make dev, CI). The standalone start scripts now set LOG_FORMAT=text in the environment by default.
- A loud banner showing http://localhost:<PORT> is printed via stdout (NOT the logger) once lifespan startup completes. It is suppressed under PYTEST_CURRENT_TEST so test output stays clean.
- Optional OPEN_BROWSER=1 (.env knob, default OFF) calls webbrowser.open after the banner. Failures are swallowed so a headless host doesn't crash startup.
- On Windows, or with a non-UTF-8 PYTHONIOENCODING, the banner falls back to ASCII box drawing; the Unicode `═` would render as mojibake in Windows cmd under legacy code-page modes.
- 21 new tests in tests/test_main.py cover formatter selection (incl. the AC4 default-JSON regression assertion), banner suppression under pytest, the ASCII fallback, and the OPEN_BROWSER truthy/falsy aliases + crash resistance.
- scripts/dist-README.md documents the banner appearance, the LOG_FORMAT toggle, and the OPEN_BROWSER auto-open flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
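The formatter selection and banner fallback can be sketched with the standard library alone. The sketch below makes several assumptions. `JsonFormatter` is a stand-in for pythonjsonlogger. The truthy alias set is a guess, because the PR tests aliases without listing them. All function names are hypothetical. The `webbrowser.open` call is omitted.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Stdlib stand-in for pythonjsonlogger (the formatter the PR actually uses)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({"level": record.levelname, "logger": record.name,
                           "message": record.getMessage()})


def make_formatter(env: dict) -> logging.Formatter:
    # JSON stays the default; only an explicit LOG_FORMAT=text opts into readable logs.
    if env.get("LOG_FORMAT", "").strip().lower() == "text":
        return logging.Formatter("%(levelname)s %(name)s: %(message)s")
    return JsonFormatter()


def is_truthy(value) -> bool:
    # Assumed alias set for OPEN_BROWSER; anything else counts as off.
    return str(value or "").strip().lower() in {"1", "true", "yes", "on"}


def render_banner(port: int, encoding: str, platform: str = sys.platform) -> str:
    url = f"http://localhost:{port}"
    # PYTHONIOENCODING may be "encoding:errors"; only the encoding part matters here.
    name = encoding.split(":")[0].lower().replace("-", "").replace("_", "")
    unicode_ok = platform != "win32" and name == "utf8"
    h, v, c = ("═", "║", "╔╗╚╝") if unicode_ok else ("=", "|", "++++")
    width = len(url) + 4
    return "\n".join([c[0] + h * width + c[1], f"{v}  {url}  {v}", c[2] + h * width + c[3]])


def print_banner(port: int, env: dict) -> None:
    if "PYTEST_CURRENT_TEST" in env:
        return  # keep test output clean
    encoding = env.get("PYTHONIOENCODING") or sys.stdout.encoding or "ascii"
    print(render_banner(port, encoding))  # stdout, deliberately not the logger
```

Passing `platform` explicitly keeps the Windows branch testable on any host.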
Story DEBUG-2 — RUNNER+ users now get a 🐞 Debug button next to
the existing Retry on a failed run. Click → backend spawns a
RobotCode debug-launch subprocess (DEBUG-1 foundation), parses
the run's output.xml for the deepest failing keyword, sets a
breakpoint there, and the user lands in DebugPanel.vue with live
call-stack + variable scopes pushed via /ws/notifications.
Backend
- src/debug/router.py — `/api/v1/debug/sessions` (start, control,
state, disconnect). RUNNER+ effective-role gate; API tokens
stay role-capped (no team/project elevation).
- src/debug/session_manager.py — in-process registry with 409
dedup on (user_id, run_id), per-session event forwarder, 5-min
idle timeout, 30s `terminated` grace before subprocess kill.
- src/debug/output_xml_walker.py — defusedxml walk for the first
failing keyword's source+line; recurses into child suites and
picks the deepest failure.
- src/debug/state_fetcher.py — pulls stackTrace → scopes →
variables (top frame only) for the post-`stopped` snapshot.
- src/debug/schemas.py — Pydantic models for the API surface.
- src/audit/event_types.py — DEBUG_SESSION_STARTED/_ENDED.
- src/main.py — wires forwarder + state-fetcher in lifespan;
best-effort stop_all() on shutdown.
Frontend
- src/stores/debug.store.ts — Pinia store, 8 tests covering
start/state-replace/output-cap/topic-routing/control/stop.
- src/components/debug/DebugPanel.vue — header + stack + scopes
+ output log; toolbar buttons disabled when not paused.
- src/components/execution/RunDetailPanel.vue — Debug button
gated on run.status === 'failed' && hasMinRole('runner');
inline DebugPanel renders below the heal report.
- src/api/debug.api.ts — REST client + sendBeacon disconnect.
- src/composables/useWebSocket.ts — dispatches `debug_event` to
the store; topic-routed inside the store so events for other
sessions are silently dropped.
- i18n EN/DE/FR/ES under `debug.*` (btn, panel, error groups).
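The store's routing rules amount to one session filter followed by three reducers. The Python model below is only illustrative, since the real store is Pinia/TypeScript. The topic format `debug:session:<uuid>` comes from this PR. The message shape `{topic, kind, payload}` and the class name are assumptions, and the 300-entry cap matches the store tests added later in this PR.

```python
from collections import deque
from typing import Any, Dict, Optional


class DebugStoreSketch:
    """Python model of the debug store's routing rules (names are hypothetical)."""

    OUTPUT_CAP = 300

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.session_id: Optional[str] = None
        self.state: Optional[dict] = None
        self.terminated = False
        self.output = deque(maxlen=self.OUTPUT_CAP)

    def start(self, session_id: str) -> None:
        self.reset()
        self.session_id = session_id

    @property
    def is_active(self) -> bool:
        return self.session_id is not None and not self.terminated

    def on_debug_event(self, message: Dict[str, Any]) -> bool:
        """Apply one WebSocket message; return False when it is dropped."""
        if self.session_id is None or message.get("topic") != f"debug:session:{self.session_id}":
            return False  # other sessions' events (or post-reset stragglers) are ignored
        kind = message.get("kind")
        if kind == "state":
            self.state = message["payload"]  # replaced wholesale: last write wins
        elif kind == "output":
            self.output.append(message["payload"])  # deque drops the oldest past the cap
        elif kind == "terminated":
            self.terminated = True  # session_id stays set so the badge can render
        return True
```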
Tests
- 46 backend tests: walker + router (RBAC, audit, dedup, control,
state, ownership). RobotDebugSession is mocked end-to-end via
the manager's injectable factory — no real `robotcode`.
- 8 frontend store tests + DebugPanel renders cleanly in tsc + prod build.
Out of scope per story: conditional breakpoints, watch
expressions, multi-test debug, optional E2E (runs Chromium).
5-min idle-timeout is heuristic — revisit after first user feedback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DEBUG-3 — extends the DEBUG-2 backend + UI so a user can click any
keyword/assignment/control/return node in the Flow Editor's step
detail panel and press "Run up to here" to launch the test under
robotcode debug-launch with a breakpoint on that line.
Backend:
- POST /api/v1/debug/sessions accepts a second body shape
{file, test_name, line, repo_id} via a single discriminator-free
Pydantic model (mutually exclusive with the existing {run_id}).
- _validate_step_invocation walks the .robot to confirm the line
is inside the named test (and NOT the test-case header — RF won't
break there). Path-traversal guarded against repo root.
- 409 dedup at file+line scope so a same-step click silently
resumes; a different line in the same file produces TWO sessions
(the frontend stops the first via a confirm-modal before issuing
the second POST — the backend treats them as independent).
- Audit DEBUG_SESSION_STARTED now carries `source: "run"|"flow_editor"`.
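The two checks in `_validate_step_invocation` can be sketched as follows. The helper names are hypothetical and the `.robot` scan is deliberately simplified. It treats an unindented line inside `*** Test Cases ***` as a test header and any later non-blank, non-comment indented line as that test's body. It ignores continuation lines and other RF syntax details. `Path.is_relative_to` requires Python 3.9+.

```python
from pathlib import Path
from typing import Optional


def resolve_inside(repo_root: str, relative: str) -> Path:
    root = Path(repo_root).resolve()
    target = (root / relative).resolve()  # collapses ".." and follows symlinks
    if not target.is_relative_to(root):
        raise ValueError(f"{relative!r} escapes the repository root")
    return target


def owning_test(source: str, line: int) -> Optional[str]:
    """Name of the test whose *body* contains `line`; headers and blanks give None."""
    in_tests, current = False, None
    for lineno, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()
        is_body = False
        if text.startswith("***"):
            in_tests = text.strip("* ").lower() in {"test cases", "test case"}
            current = None
        elif in_tests and raw[:1] not in ("", " ", "\t") and not text.startswith("#"):
            current = text  # unindented line = test-case header (RF can't break here)
        elif in_tests and current and text and not text.startswith("#"):
            is_body = True
        if lineno == line:
            return current if is_body else None
    return None


def validate_step_invocation(repo_root: str, file: str, test_name: str, line: int) -> Path:
    path = resolve_inside(repo_root, file)
    if not path.is_file():
        raise FileNotFoundError(file)
    if owning_test(path.read_text(encoding="utf-8"), line) != test_name:
        raise ValueError(f"line {line} is not inside the body of test {test_name!r}")
    return path
```

Resolving before comparing matters: a naive string-prefix check would accept `tests/../../etc/passwd`.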
Frontend:
- RobotStep._lineNumber annotated by parseRobotToForm (1-based).
Added to both RobotStep interfaces (RobotEditor.vue + flowConverter.ts)
and propagated through cloneStep so the deep-clone path covered
by FlowEditorStepIsolation.spec.ts keeps the metadata across the
detail panel re-builds.
- New "Run up to here" button in FlowEditor's step-detail panel,
visible only for stepType in {keyword, assignment, return,
if/else_if/else, for/while, try/except/finally} on a Test Case
node when the user has RUNNER+ and a filePath/repoId pair are in
scope. Hidden in the Keywords section (RF debug needs --test).
- AC6 rapid-fire semantics in store.classifyStepClick:
- 'idle': no session → start
- 'same': resume silently (409 → adopt existing session)
- 'different': render confirm-modal "stop and restart"
- DebugPanel rendered as fixed-position overlay teleported to body
while a session is active; Stop returns to the editor with the
canvas state intact.
- Dirty-buffer surfaces a save-prompt modal instead of debugging an
out-of-sync file; user saves manually then re-clicks.
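The AC6 rules in `classifyStepClick` compare the active session's file and line against the clicked step. The Python rendering below mirrors a TypeScript store action. One mapping is an assumption: the tests list a fourth "terminated" case, and this sketch classifies it as `idle` so that a finished session simply starts fresh.

```python
from typing import Literal, Optional, TypedDict


class ActiveSession(TypedDict):
    file: str
    line: int
    terminated: bool


StepClick = Literal["idle", "same", "different"]


def classify_step_click(active: Optional[ActiveSession], file: str, line: int) -> StepClick:
    if active is None or active["terminated"]:
        return "idle"       # nothing live: start a new session
    if active["file"] == file and active["line"] == line:
        return "same"       # resume silently; a 409 adopts the existing session
    return "different"      # show the "stop and restart" confirm-modal first
```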
i18n:
- flowEditor.debug.* keys in EN/DE/FR/ES (label, title, error
copy, two modal copies). Tour rotation grows from 30 to 31 tips
(tip31 points at this affordance).
Tests:
- backend tests/debug/test_router.py — 11 new tests covering happy
path, test-header-line rejection, unknown test, line-outside-
test, missing file, path-traversal, mixed body shapes, empty
body, RBAC, same-line dedup, different-line non-dedup. 19/19
green; full debug+audit suite 100/100 green.
- frontend tests/stores/debug.store.spec.ts — 8 new tests covering
startFromStep happy path + 409 silent-resume + non-409 rethrow,
classifyStepClick four cases (idle, same, different, terminated),
and a regression pin on cloneStep _lineNumber preservation.
16/16 store tests green; full vitest 514/514 green.
- vue-tsc clean; npm run build clean (4 locale chunks emit fine).
Story spec: _bmad-output/implementation-artifacts/debug-3-run-up-to-selection-action.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-run failing test in interactive DAP session. Builds on DEBUG-1's DAP driver foundation: adds POST /api/v1/debug/sessions, WebSocket debug:session:<uuid> topic, control endpoints, DebugPanel.vue, and DEBUG_SESSION_STARTED/ENDED audit codes. RBAC gated to RUNNER+. 409 dedup per (user, run). 5-min idle timeout + 30s grace on terminated. i18n in EN/DE/FR/ES. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run up to selected step in Flow Editor with same DAP panel. Extends
DEBUG-2's POST /debug/sessions to accept {file, test_name, line,
repo_id} via discriminated Union; adds AC4 header-line guard and
path-traversal validation. FlowEditor step-detail panel gains a
'▶ Bis hier ausführen' ("Run up to here") button gated on saved-buffer + RUNNER+ +
breakable step type. Same DebugPanel.vue reused as teleported
overlay. AC6 multi-tab dedup: silent same-line resume via 409,
confirm-modal restart on different-line. RobotStep gains optional
_lineNumber via parser annotation (extends cloneStep, pinned by
FlowEditorStepIsolation). i18n EN/DE/FR/ES + tipOfTheDay tip31.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Standalone-start UX polish for v0.9.1: LOG_FORMAT=text toggle flips main.py logging from JSON to readable for the bundled distribution start scripts (Docker / make dev / tests stay on JSON). Loud "open this URL" banner via print() (not logger) after FastAPI lifespan startup, with ASCII fallback on Windows or non-utf8 PYTHONIOENCODING. Optional OPEN_BROWSER=1 calls webbrowser.open. Build scripts default LOG_FORMAT=text in the distributed .env.example; dist-README.md documents the new toggles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Konami code easter egg. Pressing ↑↑↓↓←→←→BA outside any text input sends a 4s SVG robot marching across the bottom of the viewport. Layout-independent (event.code), pointer-events: none so it never captures clicks, aria-hidden, fully disabled under prefers-reduced-motion. Single global listener at App.vue. No i18n, no docs, no analytics — discovery is the point. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before DEBUG-2/3 spawn, the router checks for the `robotcode` binary
in the project's venv. Missing → 424 Failed Dependency with detail
{code, repo_id, env_id, package, message}. New endpoint
POST /api/v1/debug/sessions/install-prerequisites runs `uv pip install
robotcode` into that venv via async subprocess (300s timeout, log tail
captured). Frontend dialog (DebugPrereqDialog) catches the 424 from
both entry points (RunDetailPanel 🐞 + FlowEditor ▶ Bis hier ausführen, "Run up to here"),
offers Install/Cancel; on Install the store retries the original start
automatically. Audit code DEBUG_ROBOTCODE_INSTALLED. i18n EN/DE/FR/ES.
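An async install with a timeout and a captured log tail can be sketched with `asyncio.create_subprocess_exec`. The argv builder is hypothetical; it uses uv's `--python` option to target the project's interpreter. It installs plain `robotcode` as in this commit, which a later commit in this PR changes to `robotcode[debugger]`. The 40-line tail length is also an assumption.

```python
import asyncio
from pathlib import Path
from typing import List, Optional, Sequence, Tuple


def install_argv(venv_python: Path) -> List[str]:
    # Hypothetical argv: point uv at the project's venv interpreter explicitly.
    return ["uv", "pip", "install", "--python", str(venv_python), "robotcode"]


async def run_with_tail(argv: Sequence[str], timeout: float = 300.0,
                        tail_lines: int = 40) -> Tuple[Optional[int], List[str]]:
    """Run argv; return (returncode or None on timeout, last lines of combined output)."""
    proc = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.STDOUT)
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()       # don't leave a half-finished install running
        await proc.wait()
        return None, ["<timed out>"]
    return proc.returncode, out.decode(errors="replace").splitlines()[-tail_lines:]
```

Merging stderr into stdout keeps the tail in the order the user would have seen in a terminal.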
Tests: 8 new prereq.py units + 7 router tests (424 paths + install
endpoint happy/already-installed/failure/RBAC) + 5 frontend store
tests (424 catch + retry-after-install + cancel). Pre-existing
event-loop-pollution bug in test_router.py's autouse fixture fixed
on the way (asyncio.run instead of get_event_loop on teardown).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In-dialog RobotCode install on missing prereq. Replaces the generic 502 on first-time debug clicks with a 424 Failed Dependency surfaced as a modal: 'RobotCode is not installed in this project's environment. Install now? [Install] [Cancel]'. On Install the backend runs uv pip install robotcode into the project's venv (audited via DEBUG_ROBOTCODE_INSTALLED) and the frontend retries the original debug start. Both DEBUG-2's 🐞 button and DEBUG-3's ▶ Bis hier ausführen ("Run up to here") button trigger the same flow. i18n complete in EN/DE/FR/ES. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… install) Per UX guidelines: secondary/cancel buttons use BaseButton variant=ghost, primary actions use BaseButton variant=primary with the built-in :loading spinner instead of bare HTML buttons + ad-hoc classes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…imeout
The bare 'robotcode did not announce a TCP port within 15.0 s' error gave users no clue what robotcode was actually doing during boot. Two complementary changes:
1. Capture stdout/stderr during the port wait into a 200-line ring buffer; on timeout or early exit, include the last 20 lines in the DebugSessionStartFailed message so users (and future bug reports) can see the real failure mode.
2. Spawn the subprocess in a fresh session/process group on POSIX (start_new_session=True). On cleanup, SIGKILL the entire process group via os.killpg so any Robot Framework → Browser library → Playwright → Chromium grandchildren are reaped along with the parent. Without this, orphaned grandchildren routinely outlive a session and can block the next port-0 bind by holding shared state under ~/.cache/robotcode/. Closing VS Code 'fixed' the issue for users only because that incidentally tore down those orphans; this is the proper fix.
Also: bump the default port_parse_timeout from 15 s to 30 s (cold robotcode boots can take 20 s+ on slow venvs after a fresh install) and expose ROBOSCOPE_DEBUG_PORT_TIMEOUT for operator override.
Two new tests pin the new error shape (boot output included + returncode surfaced on early exit). 11/11 in tests/debug/test_robot_debug_session.py green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
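Both changes map onto standard-library primitives: `collections.deque(maxlen=...)` for the ring buffer, and `start_new_session=True` plus `os.killpg` for the process-group kill. The sketch below works on POSIX only, and `BootLog`, `spawn_in_own_group` and `kill_process_group` are hypothetical names. Because the child becomes a session leader, its process-group id equals its pid.

```python
import collections
import contextlib
import os
import signal
import subprocess
from typing import List, Sequence


class BootLog:
    """Ring buffer of the most recent boot-output lines."""

    def __init__(self, capacity: int = 200) -> None:
        self._lines = collections.deque(maxlen=capacity)  # oldest lines fall off

    def append(self, line: str) -> None:
        self._lines.append(line.rstrip("\n"))

    def tail(self, n: int = 20) -> List[str]:
        return list(self._lines)[-n:]


def spawn_in_own_group(argv: Sequence[str]) -> subprocess.Popen:
    # New session => new process group with pgid == child pid; grandchildren inherit it.
    return subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            start_new_session=True)


def kill_process_group(proc: subprocess.Popen) -> None:
    with contextlib.suppress(ProcessLookupError):  # group already gone
        os.killpg(proc.pid, signal.SIGKILL)          # parent AND every descendant
    proc.wait()  # reap the parent so it doesn't linger as a zombie
    if proc.stdout is not None:
        proc.stdout.close()
```

Killing only `proc.pid` would leave Chromium and friends running. Signalling the whole group is what makes cleanup reach the grandchildren.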
The umbrella robotcode package alone gives us the CLI shell but NOT the debug-launch subcommand; that subcommand is registered as a click plugin by the robotcode-debugger PyPI package. Without the [debugger] extra, spawn fails at runtime with "No such command 'debug-launch'", which is exactly what the new diagnostic message just surfaced. Two changes:
1. Install package: robotcode → robotcode[debugger]. This pulls in the umbrella package plus the debugger plugin that registers the subcommand.
2. Prereq check: also verify that <venv>/lib/python*/site-packages/robotcode/debugger/ exists. This catches the partial-install state (umbrella alone) BEFORE the spawn explodes, so the user gets the install dialog instead of a runtime error toast.
Tests updated accordingly: fixtures seed the debugger marker alongside the binary, plus a new unit test pins the "binary-without-plugin → False" case (the actual user-visible failure mode that triggered this fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
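The two-part check above can be sketched with `pathlib` under the POSIX venv layout (`bin/`, `lib/pythonX.Y/site-packages/`). Windows venvs use `Scripts/` and `Lib/` instead, which this sketch does not handle. `robotcode_debugger_ready` is a hypothetical name, not the PR's `prereq.py` API.

```python
from pathlib import Path


def robotcode_debugger_ready(venv: Path) -> bool:
    """True only if both the CLI binary and the debugger plugin package exist."""
    binary = venv / "bin" / "robotcode"
    # The glob tolerates any interpreter version under lib/.
    plugin_dirs = venv.glob("lib/python*/site-packages/robotcode/debugger")
    return binary.is_file() and any(p.is_dir() for p in plugin_dirs)
```

The binary-only case returns False, which is the partial-install state the commit is guarding against.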
Production failure: clicking Debug surfaced
"robotcode exited with code 2 during boot. Last output:
Error: No such option: -w."
followed by the test silently never stopping at the breakpoint. The
Story DEBUG-1 foundation was wired against an older robotcode CLI
that has since dropped flags and changed protocol order.
Three fixes:
1. **CLI argv:** modern `robotcode debug-launch` rejects `-w` and the
trailing `<robot_path>` positional. Spawn is now just
`robotcode debug-launch --tcp 127.0.0.1:<port>`. Port is
pre-allocated by us (the launcher rejects port 0 and requires 1<=x<=65535).
2. **Connect strategy:** robotcode prints nothing when it's listening,
so the regex-on-stdout port parser never fired. Replaced with a
poll-connect loop on the pre-allocated port. A background
stdout-pump task drains output continuously into a 200-line ring
buffer (a) so the pipe doesn't fill once Log keywords run, and
(b) so the diagnostic error includes the real robotcode output.
3. **Handshake order:** DAP spec requires the client wait for the
`initialized` event before sending `setBreakpoints`. We were
firing `setBreakpoints` immediately after the initialize response,
which modern RobotCode rejects with `Unknown Command
'setBreakpoints'`. New sequence:
- `initialize` (await response)
- fire-and-don't-await `launch` (servers commonly defer the
launch response until after configurationDone — awaiting
deadlocks)
- wait for `initialized` event
- `setBreakpoints` per file
- `configurationDone`
The launch payload also now includes `python`, `cwd`, `target`,
and `console: "internalConsole"` — without those the launcher
either dispatches `runInTerminal` to us (we don't handle it) or
fails to spawn the child runtime.
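The corrected ordering can be sketched against a minimal client interface. That interface is an assumption: `send()` writes a request immediately and returns a future for its response, and `wait_event()` waits for a named event. `FakeClient` is a toy transport that holds back the launch response until `configurationDone`, and the `adapterID` value is hypothetical. The `setBreakpoints` argument shape (`source.path`, `breakpoints[].line`) follows the DAP specification.

```python
import asyncio
from typing import Any, Dict, List


async def dap_handshake(client, launch_args: Dict[str, Any],
                        breakpoints: Dict[str, List[int]]) -> "asyncio.Future[Any]":
    """Run the DAP start-up sequence; return the still-pending launch response."""
    await client.send("initialize", {"adapterID": "robotcode"})
    # Sent but deliberately NOT awaited: the adapter may hold this response
    # until configurationDone, so awaiting it here would deadlock.
    launch_response = client.send("launch", launch_args)
    await client.wait_event("initialized")  # only now may breakpoints be configured
    for path, lines in breakpoints.items():
        await client.send("setBreakpoints", {
            "source": {"path": path},
            "breakpoints": [{"line": n} for n in lines],
        })
    await client.send("configurationDone", {})
    return launch_response


class FakeClient:
    """Toy transport that defers the launch response the way some adapters do."""

    def __init__(self) -> None:
        self.sent: List[str] = []
        self._initialized = asyncio.Event()
        self._launch = None

    def send(self, command: str, arguments: Dict[str, Any]) -> "asyncio.Future[Any]":
        self.sent.append(command)  # the request is "on the wire" once send() returns
        future = asyncio.get_running_loop().create_future()
        if command == "launch":
            self._launch = future  # answered only after configurationDone
        else:
            future.set_result({"command": command, "success": True})
        if command == "initialize":
            self._initialized.set()
        if command == "configurationDone" and self._launch is not None:
            self._launch.set_result({"command": "launch", "success": True})
        return future

    async def wait_event(self, name: str) -> None:
        if name == "initialized":
            await self._initialized.wait()
```

Against `FakeClient`, awaiting the launch future before `configurationDone` would hang forever. That is the deadlock the commit describes.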
**Integration test:** new `@pytest.mark.integration` class
`TestRealRobotCodeSpawn` exercises the entire pipeline against the
user's installed robotcode in ~/.roboscope/venvs/roboscope-default.
Catches breaking changes in the robotcode CLI surface BEFORE they
hit users.
- `test_real_spawn_handshake_and_test_runs` — passes today, asserts
the RF banner appears on stdout (proves spawn + handshake worked).
- `test_real_breakpoint_pauses_execution` — `xfail(strict=True)`. The
remaining bug: even though the test runs end-to-end, the `stopped`
event never arrives. Tracked as a separate launcher → child proxy
layer issue; the strict-xfail will alert when fixed.
**Process-tree cleanup** retained from earlier: cleanup cancels both
the new stdout-pump task and the launch future before SIGKILL'ing
the process group, so no orphans survive between sessions.
Plus drive-by ruff cleanups in DEBUG-1's dap_client.py
(typing.Callable → collections.abc.Callable, asyncio.TimeoutError
→ TimeoutError, contextlib.suppress in stop()).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
remaining stop-event proxy bug
Layer-isolation diagnostic via a new `pause`-based integration test: both the breakpoint AND the pause flows xfail with the same symptom (the test runs end-to-end, no `stopped` event), proving the remaining bug is NOT in breakpoint path resolution but in the launcher → us event-forwarding chain for the `StoppedEvent` family. Other events (`output`, `initialized`) traverse the same proxy and arrive fine.
Changes:
- `_launch_args` now includes `outputMessages: True` and `outputLog: True` so RF execution messages stream over the DAP `output` channel as the test runs (useful for the run-detail panel even before DEBUG-5 lands).
- New `test_real_pause_request_pauses_execution` integration test marked `xfail(strict=True)` and pinned alongside the breakpoint test. When BOTH turn green at once, the fix addresses the same root cause; if only one does, the spec's hypothesis was wrong.
- Story doc `debug-5-breakpoint-resolution.md` captures the full investigation: what's been verified working (spawn + handshake), what's been ruled out (path resolution, ruled out by the pause test), and concrete next-step instructions for whoever picks up the proxy debugging (patch the venv's `debugger.py::send_event` with a print to see whether `stopped` is being sent at all).
Sprint status: DEBUG-5 → ready-for-dev (pinned by xfail tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…w (DEBUG-5)
Root cause: RobotCode's listener emits robot* events (robotStarted,
robotEnqueued, robotEnded, robotLog, etc.) whose bodies inherit
from SyncedEventBody. Each such event makes the in-process Debugger
synchronously block the listener thread for up to 15 s on
`self.sync_event.wait(15)`, waiting for a `robot/sync` RPC request
back from us. We never sent one. The very first synced event
(`robotEnqueued` from ListenerV3.start_suite) tied up the listener
so RF never reached start_test → start_keyword → process_start_state,
which is where breakpoint matching lives. So breakpoints AND pause
both silently failed — `output` and `initialized` got through
because their bodies don't mix in SyncedEventBody.
Fix: register a handler for every robot* event family that fires a
fire-and-forget `robot/sync` request on the DAP client. That sets
the gating Event in the child, the listener thread returns, RF
proceeds normally, breakpoints fire as expected.
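The fix boils down to one event handler. In this sketch, `send_request` stands in for the DAP client's request method, and the empty `robot/sync` argument object is an assumption, because the commit does not spell out its shape. DAP events arrive as `{"type": "event", "event": <name>, "body": ...}`, and the prefix match covers the robot* family named above (robotStarted, robotEnqueued, robotEnded, robotLog, and so on).

```python
from typing import Any, Callable, Dict

SendRequest = Callable[[str, Dict[str, Any]], object]


def make_robot_sync_acker(send_request: SendRequest) -> Callable[[Dict[str, Any]], None]:
    """Build an event handler that releases RobotCode's blocked listener thread."""

    def on_event(message: Dict[str, Any]) -> None:
        event = message.get("event", "")
        # Every robot* event blocks the child's listener on sync_event.wait(15)
        # until a robot/sync request arrives.
        if event.startswith("robot"):
            send_request("robot/sync", {})  # fire-and-forget; the response is ignored

    return on_event
```

Events such as `output` and `initialized` pass through untouched, matching the observation that they were never blocked.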
Diagnosis: instrumented the user's venv with print()-to-file
tracing (server.py, debugger.py, listeners.py, launcher/client.py),
ran the pause integration test, watched the trace stop dead
between "V2.start_suite ENTER" and the next listener line. Read
on_debugger_send_event source → spotted the synced wait. Reverted
all venv instrumentation; this commit is clean.
Tests:
- 3/3 integration tests now pass against the user's installed
robotcode in ~/.roboscope/venvs/roboscope-default:
- test_real_spawn_handshake_and_test_runs (was passing)
- test_real_breakpoint_pauses_execution (was strict-xfail)
- test_real_pause_request_pauses_execution (was strict-xfail;
uses Sleep 3s so pause races RF mid-execution before
termination)
- 71/71 unit tests in tests/debug/ green; no regressions.
Story spec at _bmad-output/.../debug-5-breakpoint-resolution.md
records the full diagnosis path + the actual fix shape (NOT path
resolution as initially hypothesised — the launch payload, handshake,
and breakpoint paths were all correct; the missing piece was honoring
RobotCode's robot/sync ack contract).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three layers of tests, each catching a different failure mode:
1. **Real-DAP control tests** (`test_robot_debug_session.py
::TestRealControlButtons`) — five `@pytest.mark.integration`
tests that drive `RobotDebugSession.continue_/next_/step_in/
step_out/disconnect` against the user's installed robotcode.
Asserts the matching `stopped`/`terminated` event arrives with
the correct `reason` (breakpoint/step/etc.).
2. **Real-router HTTP tests** (`test_router_integration.py
   ::TestRealRouterControls`): five `@pytest.mark.integration`
tests that drive `POST /api/v1/debug/sessions/{id}/{cmd}`
against a real RobotDebugSession (factory swap reverts the
unit-test fake-session injection). Polls
`GET /sessions/{id}/state` after each control to confirm the
cache advances. Mirrors what the frontend buttons trigger end-
to-end.
3. **Frontend component tests** (`DebugPanel.spec.ts`) — 15 Vitest
tests that mount the actual panel against a fake API and verify:
- Continue / Step Over / Step In / Step Out gate on `paused &
!terminated` (Stop is always enabled — the user must always be
able to abort).
- Each button click → correct `debug.store` action → correct
`postControl` arg.
- Stop emits `closed` even when disconnect throws (the panel
must close even if backend is unreachable).
- State events update `paused_at` line in the header.
- Terminated events surface the badge.
- Output events append to the live log.
- Cross-session events (different `topic`) are ignored.
The user-reported "I can't cleanly step / abort / continue via the
UI" was rooted in DEBUG-5 (missing robot/sync ack at the DAP
layer), not in the buttons themselves. With that fix, the buttons
NOW work; these tests are the regression watchdogs that catch any
future drift in the chain.
Test totals (after this commit):
- backend tests/debug/ unit: 71 ✓
- backend integration (DAP-direct controls): 5 ✓ (covered by 8 in
TestRealRobotCodeSpawn + TestRealControlButtons)
- backend integration (HTTP router): 5 ✓ new
- frontend full vitest: 543 ✓ (was 528, +15 from DebugPanel.spec)
Drive-by lint cleanups (E501 wrap on two helper signatures).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug: `step._lineNumber` is stamped once when `parseRobotToForm` reads the file. The moment the user inserts a new keyword above the selected step, every step below shifts in the source file but its `_lineNumber` doesn't. Clicking the "Bis hier ausführen" ("Run up to here") button then sent the backend a stale line: the test ran past the keyword the user pointed at, or stopped at the wrong line entirely.
Fix: new `computeStepLine(form, isResource, tcIdx, stepIdx)` in `flowConverter.ts` mirrors `serializeFormToRobot` line for line and returns the LIVE source line. `FlowEditor.vue::_stepDebugPayload` now calls it on every click instead of reading the stale field. The function tracks every emitter branch the serializer cares about:
* preamble lines (leading comments)
* Settings section (incl. multi-line `[Documentation]` continuations via `...` continuation lines)
* Variables section
* Test Cases section header + per-test-case metadata (multi-line `[Documentation]`, `[Tags]`, `[Setup]`, `[Teardown]`, `[Timeout]`, `[Template]`) and the trailing blank line between test cases
13 unit tests in `FlowEditorComputeStepLine.spec.ts` pin every emit branch, plus the primary regression: insert/remove a step and the surviving step's reported line shifts correctly. The legacy `step._lineNumber` field stays: it is still set at parse time for diagnostics, but is no longer load-bearing for the debug button.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
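The idea is to count lines exactly as the serializer emits them, so the two can never disagree. The sketch below is heavily simplified. It handles only a Settings section, the Test Cases header, one name line per test and a trailing blank line after each test, and it omits preamble, Variables, metadata rows and continuations. `serialize_form` and `compute_step_line` are hypothetical Python counterparts, not the TypeScript functions.

```python
from typing import Any, Dict


def serialize_form(form: Dict[str, Any]) -> str:
    out = []
    if form["settings"]:
        out += ["*** Settings ***", *form["settings"], ""]
    out.append("*** Test Cases ***")
    for tc in form["test_cases"]:
        out.append(tc["name"])
        out += ["    " + step for step in tc["steps"]]
        out.append("")  # blank line after every test case
    return "\n".join(out)


def compute_step_line(form: Dict[str, Any], tc_idx: int, step_idx: int) -> int:
    """1-based line of a step, counted exactly the way serialize_form emits it."""
    line = 1
    if form["settings"]:
        line += 1 + len(form["settings"]) + 1  # header, rows, blank
    line += 1  # "*** Test Cases ***"
    for tc in form["test_cases"][:tc_idx]:
        line += 1 + len(tc["steps"]) + 1  # name, steps, trailing blank
    return line + 1 + step_idx  # skip the current test's name line
```

Because the line is derived from the current form on every call, inserting a step above shifts the result automatically. A stored `_lineNumber` cannot do that.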
Adds 21 new tests covering corner cases the user asked about
("step through one by one and try every function, including inside
more complex flows"). All 21 pass against real robotcode +
real RobotDebugSession; no production fixes needed — the previous
DEBUG-5 robot/sync ack already addressed every blocker.
Backend real-DAP scenarios (TestComplexDebugScenarios, 8 tests):
- sequential step-through (pause → next × 2 → continue → terminated)
with reason=step asserted between each step. Pins the exact UI
flow the user reported as "doesn't feel clean".
- breakpoint inside a FOR loop fires per iteration (3 stops for 3).
- step into a user-defined keyword descends, step out returns to
the caller scope.
- pause during a long-running keyword (Sleep 5s) produces a
stopped with reason=pause; continue lets it finish.
- multiple breakpoints fire in source order (line 3 then line 5).
- disconnect while RF is mid-execution (no breakpoint, in a Sleep)
terminates cleanly with no orphaned subprocess.
- paused_at line advances after each next — this is what the
run-detail panel header renders.
- control calls after terminated are benign no-ops, not crashes.
Backend HTTP-router scenarios (TestComplexRouterScenarios, 3 tests):
- full walk-through via HTTP: start → pause → next × 2 → continue
→ terminated, with state-cache line assertions per step.
- 409-dedup response carries the existing session_id so the
frontend can silently re-attach.
- control hits AFTER disconnect return 404 (not crash) — guards
against stale browser tabs racing the reap.
Frontend store scenarios (10 tests):
- terminated event arriving DURING a pending control resolves
cleanly — the WS update wins, postControl doesn't reset state.
- multiple state events use last-write-wins.
- output buffer capped at 300 (verified by pumping 350 events).
- events for a different session_id are silently ignored.
- events arriving after reset() are ignored.
- rapid sequential controls all dispatch (no implicit dedup —
the disabled-button contract owns that).
- control without an active session is a silent no-op.
- stop after reset is a no-op (no double-disconnect).
- sessionId/isActive lifecycle: null → active → null after stop.
- terminated event flips isActive=false but keeps sessionId set
so the badge renders.
Test totals after this commit:
- backend tests/debug/ default: 71 passed
- backend tests/debug/ -m integration: 19 passed (11 + 8)
- frontend full vitest: 566 passed (+10 new store edge cases)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the longest-standing rough edge in the Self-Healing opt-in
contract: today the user has to leave the visual editor, open the
Code tab, manually rename `Click` → `Heal Click` for each step, and
also remember to add `Library RoboScopeHeal` to the Settings
section. Two new toggles let the same opt-in happen in-place,
preserving every existing safety invariant.
HEAL-1 — Flow Editor detail panel checkbox:
- Visible only on `keyword` / `assignment` steps whose keyword is
one of the 13 supported names (bare or already `Heal *`).
- Toggling rewrites THAT single step's `keyword` field via the
same `onStepFieldChange` → `updateStepFromNode` →
`rebuildAndReselect` path the rest of the panel uses. The
unsaved-changes badge fires; no runtime mutation.
- Library row is added/removed via the existing settings array
(idempotent helpers — duplicate adds and accidental removals
of user-configured rows are impossible).
HEAL-2 — RobotEditor toolbar toggle:
- Single button next to the `.robot` badge, label reflects state
(`Self-Healing: On / Off`). Hidden when zero heal-able keywords
exist (a Log-only file).
- One click rewrites every heal-able step across all test cases
AND user keywords, plus adds/removes the bare
`Library RoboScopeHeal` row. Toast confirms the count.
- Form mutation uses direct property assignment on the reactive
(settings/testCases/keywords); the existing deep watchers emit
`update:content` to the parent, marking the file unsaved.
Shared utility — `frontend/src/utils/healToggle.ts`:
- `HEAL_VARIANTS` map derived from `backend/src/recording/heal/
library.py`. Adding a new `@keyword("Heal …")` there means one
line here.
- `applyHealToForm(form, mode)` walks the parsed form (NOT raw
text), so `Run Keyword Click selector` is structurally
safe: `Click` is an argument, not a step keyword, and never
gets rewritten.
- Preserves array identity for unchanged sub-trees so Vue
reactivity only flags the parts that actually changed.
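The form walk can be sketched in Python (the PR's helper is TypeScript in `healToggle.ts`). The sketch makes a few assumptions. Steps are dicts with a `keyword` field. Settings rows are plain strings compared after whitespace splitting. Only `Click` → `Heal Click` appears in the map, because the other twelve entries of `HEAL_VARIANTS` are not named in this description.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical excerpt: the real map has 13 entries derived from library.py.
HEAL_VARIANTS: Dict[str, str] = {"Click": "Heal Click"}
_HEAL_TO_BARE = {heal: bare for bare, heal in HEAL_VARIANTS.items()}
_LIB = ["Library", "RoboScopeHeal"]
BARE_LIBRARY_ROW = "Library    RoboScopeHeal"
Step = Dict[str, Any]


def _rewrite_steps(steps: List[Step], mode: str) -> Tuple[List[Step], bool]:
    table = HEAL_VARIANTS if mode == "enable" else _HEAL_TO_BARE
    out, changed = [], False
    for step in steps:
        # Only the step's own keyword is looked up (trimmed, case-sensitive);
        # arguments, e.g. the "Click" in a Run Keyword call, are never read.
        target = table.get(str(step.get("keyword", "")).strip())
        if target is None:
            out.append(step)  # unchanged: keep the very same object
        else:
            out.append({**step, "keyword": target})
            changed = True
    return (out if changed else steps), changed


def apply_heal_to_form(form: Dict[str, Any], mode: str) -> Dict[str, Any]:
    if mode not in ("enable", "disable"):
        raise ValueError(f"unknown mode {mode!r}")
    result = dict(form)  # shallow copy: the input form is never mutated
    for section in ("test_cases", "keywords"):
        blocks = form.get(section, [])
        new_blocks, any_changed = [], False
        for block in blocks:
            steps, changed = _rewrite_steps(block["steps"], mode)
            new_blocks.append({**block, "steps": steps} if changed else block)
            any_changed = any_changed or changed
        result[section] = new_blocks if any_changed else blocks  # identity preserved
    settings = form.get("settings", [])
    if mode == "enable":
        # Idempotent: any RoboScopeHeal row, bare or configured, counts as present.
        if not any(row.split()[:2] == _LIB for row in settings):
            result["settings"] = [*settings, BARE_LIBRARY_ROW]
    else:
        # Only the bare row is removed; rows with arguments belong to the user.
        kept = [row for row in settings if row.split() != _LIB]
        if len(kept) != len(settings):
            result["settings"] = kept
    return result
```

User-defined keywords like `Heal Login` pass through both directions untouched, because neither lookup table contains them.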
Design invariants honoured (per CLAUDE.md "SH-2 opt-in contract"):
1. Explicit per-keyword opt-in — no Browser-library monkey-patch.
2. Source rewrite, never runtime mutation — every Heal swap is a
one-line `.robot` diff the user sees in git.
3. Custom-configured `Library RoboScopeHeal <args>` rows are
preserved across both directions; bare auto-added rows are
removed when the last Heal* keyword leaves the file.
4. User-defined `Heal Login` / `Heal Foo` keywords (anything not
in HEAL_VARIANTS) are NEVER touched — disable doesn't rename
them to bare and enable doesn't try to promote them.
Tests:
- 42 unit tests in `healToggle.spec.ts` cover: the 13-keyword
map (pinned set + Object.freeze), trim/case-sensitivity of
lookups, library-row add/remove idempotence (incl. preserve-
configured), `applyHealToForm` enable/disable across test
cases + user keywords, the keyword-as-argument edge case,
immutability of input forms, array-identity preservation.
- Frontend vitest: 608/608 (+42 new). vue-tsc + production
build clean.
Stories:
- `_bmad-output/implementation-artifacts/heal-1-flow-editor-per-step-toggle.md`
- `_bmad-output/implementation-artifacts/heal-2-explorer-suite-level-toggle.md`
Out of scope: repo-wide bulk toggle, library-arg config UI,
custom-Heal-keyword registry. The `no-heal` Robot tag remains
the per-test runtime escape hatch layered on top of these source
choices.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the silent-recorder gap reported in user testing: after
clicking *Start Recording*, the Live view sat on a `Connecting…`
badge for the entire Chromium boot — sometimes seconds, sometimes
forever — with no feedback about what was happening, and no
recovery affordance when the spawn failed silently (missing
$DISPLAY on a Linux server, Playwright wheel not initialised,
blocked port).
Backend
The in-process command FIFO (`v2_command_queue`) now carries a
heterogeneous `RecordedCommand | LifecycleEvent` stream. A new
`LifecycleEvent` carries one of four phases plus a wall-clock
timestamp captured at enqueue, and an optional human-readable
message for the crash variants. `iterate_events()` yields both
types in insertion order; `iterate_commands()` is preserved as a
filter-only wrapper for any W.2-era caller that doesn't care about
lifecycle.
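The heterogeneous queue can be modelled with two frozen dataclasses and a filtering generator. In this sketch the `RecordedCommand.command` field is a placeholder (the real type carries more), and the queue is a plain iterable rather than the recorder's in-process FIFO. The four phase names come from the list that follows.

```python
import time
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Literal, Optional, Union

Phase = Literal["browser_starting", "browser_ready", "browser_crashed", "browser_restarting"]


@dataclass(frozen=True)
class RecordedCommand:
    command: str  # placeholder payload for this sketch


@dataclass(frozen=True)
class LifecycleEvent:
    phase: Phase
    message: Optional[str] = None  # only set for the crash variants
    ts: float = field(default_factory=time.time)  # wall clock, captured at creation/enqueue


QueueItem = Union[RecordedCommand, LifecycleEvent]


def iterate_events(queue: Iterable[QueueItem]) -> Iterator[QueueItem]:
    yield from queue  # both kinds, in insertion order


def iterate_commands(queue: Iterable[QueueItem]) -> Iterator[RecordedCommand]:
    # Filter-only wrapper for callers that don't care about lifecycle.
    for item in iterate_events(queue):
        if isinstance(item, RecordedCommand):
            yield item
```

Keeping `iterate_commands` as a thin filter means older callers see exactly the stream they saw before lifecycle events existed.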
`v2_recorder_task` emits at four well-defined boundaries:
- `browser_starting` — immediately before `pw.chromium.launch(...)`.
- `browser_ready` — after `context.new_page()` + the optional
initial `goto`, i.e., the point the user can click and see
events arrive.
- `browser_crashed` (in-loop) — from `_on_disconnect` when the
browser disconnects without a user-initiated stop. The crash
message names the disconnect channel ("browser disconnected
unexpectedly"); for outer wrapper crashes we surface the
exception string instead so a $DISPLAY problem reaches the user.
- `browser_restarting` — emitted by the HTTP endpoint just before
it signals the current task down. The fresh task then emits its
own `browser_starting` → `browser_ready`.
The wrapper `run_v2_recorder_session` moved its
`_mark_status(COMPLETED)` call OUT of the inner `_recorder_loop`
and into its own `finally` block. That lets it discriminate three
exit paths:
1. clean stop → mark COMPLETED + finalize_session + tear_down.
2. crash (exception bubbled out) → push `browser_crashed`
lifecycle, mark FAILED, finalize_session + tear_down.
3. stop-for-restart (`_restart_pending` set) → SKIP all three —
the new task reuses the same queue + DB row.
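The three exit paths can be sketched as a `finally` block that inspects how the inner loop ended. This is a hedged sketch. `run_recorder_session`, the `restart_pending` callable and the string `actions` log are hypothetical stand-ins for `run_v2_recorder_session`, `_restart_pending`, `_mark_status`, `finalize_session` and tear-down.

```python
from __future__ import annotations

from typing import Awaitable, Callable, List, Optional


async def run_recorder_session(
    inner_loop: Callable[[], Awaitable[None]],
    restart_pending: Callable[[], bool],
    actions: List[str],
) -> None:
    crash: Optional[Exception] = None
    try:
        await inner_loop()
    except Exception as exc:
        crash = exc  # path 2: an exception bubbled out of the inner loop
    finally:
        if restart_pending():
            # Path 3: stop-for-restart. The replacement task reuses the same
            # queue and DB row, so status, finalize and teardown are all skipped.
            actions.append("skip")
        elif crash is not None:
            # Surface the exception text so e.g. a $DISPLAY problem reaches the user.
            actions += [f"lifecycle:browser_crashed:{crash}", "status:FAILED", "finalize", "teardown"]
        else:
            # Path 1: clean stop.
            actions += ["status:COMPLETED", "finalize", "teardown"]
```

Checking the restart flag first matters. A stop-for-restart must not finalize the row, even though the inner loop exits cleanly.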
The new endpoint `POST /recordings/sessions/{id}/restart-browser`
ties it together. It enforces 404 (missing session), 403 (owner
check failed), 409 (session status is not RECORDING) and 501
(recorder disabled via env). On the happy path it pushes a
`browser_restarting` lifecycle event onto the SSE channel so the
pill flips immediately, signals `_restart_pending` + stop, polls
up to 5s for the wrapper to vacate `_stop_signals`, then
dispatches a fresh `run_v2_recorder_session` with the same
target_url stashed on the session row. Two recovery branches:
when no task is in `_stop_signals` (process restart leftover) we
dispatch a fresh task directly; when `signal_restart_v2` races to
False between checks we fall through to the same recovery path.
SSE generator updated to multiplex: `event: command` for
`RecordedCommand`, `event: lifecycle` with `{ phase, ts, message }`
JSON for `LifecycleEvent`, the existing `event: end` sentinel
unchanged.
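The multiplexed SSE framing can be sketched as follows. Each message is an `event:` line, a `data:` line, and a blank-line terminator, which is standard SSE. The dataclasses, the command payload fields and the `{}` body of the end sentinel are assumptions for illustration only.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class RecordedCommand:
    keyword: str  # hypothetical minimal payload
    selector: str


@dataclass
class LifecycleEvent:
    phase: str
    ts: float
    message: Optional[str] = None


def sse_frame(event: str, data: str) -> str:
    # One SSE message: named event, single data line, blank-line terminator.
    return f"event: {event}\ndata: {data}\n\n"


def encode(item: Union[RecordedCommand, LifecycleEvent]) -> str:
    if isinstance(item, LifecycleEvent):
        payload = {"phase": item.phase, "ts": item.ts, "message": item.message}
        return sse_frame("lifecycle", json.dumps(payload))
    return sse_frame("command", json.dumps({"keyword": item.keyword, "selector": item.selector}))


# Assumed sentinel body; the text only says the `event: end` sentinel is unchanged.
END_FRAME = sse_frame("end", "{}")
```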
Frontend
`RecordingLiveView.vue` replaces the 4-state `streamState`
(`connecting | live | done | error`) with a richer `phase` enum
driven by the backend lifecycle events: `connecting →
browser_starting → browser_ready → (browser_restarting → ...) →
done | error | browser_crashed`. A new EventSource listener for
`lifecycle` events routes payloads through the same
`_transitionTo()` state machine the SSE transport events use.
Two new template areas:
- A *phase card* next to the heading carrying the pill, a live
`mm:ss` uptime label that ticks each second once
`browser_ready` fires (reset on `browser_restarting`), and a
"Restart browser" button enabled in `browser_ready` /
`browser_crashed` and disabled during the transient phases
(where the backend would 409 anyway).
- A red crash banner with the backend's error message under
`browser_crashed`.
A `command`-first fallback flips `connecting / browser_starting`
to `browser_ready` if a command somehow arrives before a
lifecycle event (late attach, restart-mid-stream). It deliberately
does NOT promote `browser_crashed → browser_ready` — a stray
late command from a buffered binding is recorded but doesn't lie
about the phase.
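The transition rules above can be mirrored in a few pure functions. The real state machine is TypeScript inside the Vue SFC, so this Python version is an illustration only. `LiveState`, `on_lifecycle`, `on_command` and `restart_enabled` are hypothetical names, and starting the uptime clock on the command-first fallback is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LiveState:
    phase: str = "connecting"
    ready_at: Optional[float] = None       # uptime origin, set once browser_ready fires
    crash_message: Optional[str] = None


def on_lifecycle(state: LiveState, phase: str, ts: float, message: Optional[str] = None) -> LiveState:
    state.phase = phase
    if phase == "browser_ready":
        state.ready_at = ts
        state.crash_message = None          # a fresh ready clears the crash banner
    elif phase == "browser_restarting":
        state.ready_at = None               # uptime resets on restart
    elif phase == "browser_crashed":
        state.crash_message = message
    return state


def on_command(state: LiveState, now: float) -> LiveState:
    # Command-first fallback: only promote from the pre-ready phases, never from
    # browser_crashed, so a stray buffered command can't misreport the phase.
    if state.phase in ("connecting", "browser_starting"):
        state.phase = "browser_ready"
        state.ready_at = now                # assumption: uptime starts here too
    return state


def restart_enabled(state: LiveState) -> bool:
    # Transient phases would get a 409 from the backend anyway.
    return state.phase in ("browser_ready", "browser_crashed")
```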
API client `restartV2Browser(sessionId)` added in
`recording-v2.api.ts`. i18n complete in EN/DE/FR/ES under
`recorder.live.lifecycle.*` + `restartBrowser` + `crashTitle`.
Tests
Backend (14 new in `test_v2_recorder_vis.py`):
- Queue: `enqueue_lifecycle` returns False without `register`,
heterogeneous `iterate_events` preserves insertion order across
commands + lifecycle, `iterate_commands` filter-only backcompat,
`LifecycleEvent.ts` defaults to wall-clock at construction.
- Wrapper: a raise inside the inner loop pushes `browser_crashed`
onto the queue and tears down (queue gone from the registry).
- `signal_restart_v2` returns False without an active task,
returns True + sets the event + marks `_restart_pending` when
a task is registered.
- Restart endpoint: 404 / 403 / 409 / 501 paths, two happy-path
branches (dispatch when no task active, signal-and-dispatch
when active). Both happy paths assert `dispatch_task` was
invoked with `(run_v2_recorder_session, session_id, target_url)`.
- SSE multiplex: a producer thread interleaves
lifecycle/command/lifecycle/command events and finalises; the
stream drains through the streaming response, and the response
body contains exactly 2× `event: lifecycle\n` and
2× `event: command\n` followed by the `event: end` sentinel.
Existing `TestHappyPathDoesNotMarkFailed` updated for the
wrapper's new ownership of `_mark_status(COMPLETED)` — the
test's intent (verify FAILED branch doesn't fire on a clean exit)
is preserved by asserting NO FAILED entries among the observed
calls instead of asserting zero calls overall.
Frontend (17 new in `RecordingLiveView.lifecycle.spec.ts`):
mirrors the state machine inline (SFC setup script isn't
importable) and exercises every transition entry point: each of
the four lifecycle phases, full restart cycle resets uptime
timestamp, crash clears the message slot when ready arrives
again, the command-first fallback for `connecting` and
`browser_starting`, the no-regression assertion for already-ready
/ crashed / restarting, plus the `mm:ss` formatter edge cases
(null → null, pad zeros, clamp negative deltas).
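The `mm:ss` formatter edge cases listed above (null in, null out; zero padding; negative deltas clamped) fit in a few lines. This Python sketch assumes millisecond timestamps and a hypothetical `format_uptime` name; the real formatter is TypeScript.

```python
from typing import Optional


def format_uptime(ready_at_ms: Optional[float], now_ms: float) -> Optional[str]:
    if ready_at_ms is None:
        return None  # not ready yet: no label
    # Clamp negative deltas (clock skew, ts slightly in the future) to zero.
    seconds = max(0, int((now_ms - ready_at_ms) // 1000))
    return f"{seconds // 60:02d}:{seconds % 60:02d}"
```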
Frontend totals: 625/625 (was 608, +17). Backend `tests/recording/`:
all four files green. `vue-tsc --noEmit` clean,
`npm run build` clean.
Out of scope (deferred to follow-ups):
- Showing the Chromium PID — Playwright Python's stable API does
not expose it; uptime + phase carry the user-visible signal.
- Auto-restart on crash — restart stays user-initiated so flaky-
spawn root causes aren't masked.
- A repo-wide stuck-recordings sweep — the existing launcher
reset button covers that.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ontent
The HEAL-2 suite-level toggle mutates the reactive form and relies
on the deep watchers on `form.settings/testCases/keywords` to emit
`update:content` for the parent. Those watchers are guarded by
`inFormEditingTab` (true on `visual` / `flow`, FALSE on `code`) so
free typing in CodeMirror isn't echoed back during a code edit.
That guard turned the toolbar toggle into a no-op on the Code tab:
form was rewritten, but CodeMirror still showed the old text, the
parent never saw the unsaved-content event, and the next switch to
visual/flow would call `parseRobotToForm(internalCode)` which
overwrites the rewritten form with the stale code — the toggle's
effect was lost.
Fix: in `onHealSuiteToggle`, when `activeTab === 'code'` we
explicitly:
- serialize the form back to source (`serializeFormToRobot()`),
- update `internalCode.value` so a later tab switch sees the new
code,
- dispatch a CodeMirror change so the user actually sees the
keyword rename happen in the visible buffer,
- emit `update:content` so the parent marks the file dirty.
Visual + Flow paths are unchanged. They already work correctly: the
deep watchers fire, emit `update:content`, the parent feeds the new
content back via `props.content`, and the watcher on `props.content`
calls `parseRobotToForm(newContent)` so the form is reloaded from
the freshly serialized source. The Visual `v-for` and Flow's
`robotFormToFlow` computed re-render against the new form references
right away — keyword inputs already show `Heal Click` after the
toggle, no extra fix needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A step keyword named `Click` in a `.robot` file does NOT necessarily
mean Browser-library's Click — it could just as easily be a custom
user keyword the file defines under `*** Keywords ***` with the same
name. Until now, both the HEAL-1 per-step checkbox and the HEAL-2
suite toolbar button surfaced any time those heal-able names appeared
in a step. Toggling on such a file would rename the user's own
keyword to `Heal Click`, leaving it unresolvable at runtime —
breaking the test rather than healing it.
Add a real library-import gate on top of the heal-able-keywords check:
the toggle is now visible only when the file actually imports
`Library Browser` (or one of the pip-name variants:
`robotframework-browser`, `robotframework_browser`,
`robotframework-browser-batteries`, `robotframework_browser_batteries`)
OR has already opted in by importing `Library RoboScopeHeal`. The
matcher is case-insensitive and tolerates the `Library Browser
auto_closing_level=KEEP` form (args don't affect detection).
New helpers in `frontend/src/utils/healToggle.ts`:
- `hasBrowserLibraryImport(form)` — true when any settings row
matches one of the canonical Browser names (regex matches the
five spellings above; `key === 'Library'` enforced so a
`Documentation` row that mentions "Browser" doesn't trip it).
- `hasRoboScopeHealImport(form)` — true when any settings row
imports `RoboScopeHeal` (bare or with config args). Once the
user has explicitly opted into the heal contract the toggle
stays available even without an explicit Browser library row.
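The import gate can be sketched as a case-insensitive full match on the first token of each `Library` row. The real helpers are TypeScript in `healToggle.ts`. Representing settings rows as `(key, value)` tuples and the `heal_toggle_visible` name are assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator, Tuple

# The five Browser spellings named above, matched case-insensitively.
_BROWSER_LIBRARY = re.compile(
    r"browser|robotframework-browser(?:-batteries)?|robotframework_browser(?:_batteries)?",
    re.IGNORECASE,
)
_HEAL_LIBRARY = re.compile(r"roboscopeheal", re.IGNORECASE)

SettingsRow = Tuple[str, str]  # hypothetical (key, value) shape of a settings row


def _library_names(rows: Iterable[SettingsRow]) -> Iterator[str]:
    for key, value in rows:
        if key != "Library":  # a Documentation row mentioning "Browser" must not count
            continue
        tokens = value.split()
        if tokens:
            yield tokens[0]  # library name; args like auto_closing_level=KEEP are ignored


def has_browser_library_import(rows: Iterable[SettingsRow]) -> bool:
    return any(_BROWSER_LIBRARY.fullmatch(name) for name in _library_names(rows))


def has_roboscope_heal_import(rows: Iterable[SettingsRow]) -> bool:
    return any(_HEAL_LIBRARY.fullmatch(name) for name in _library_names(rows))


def heal_toggle_visible(rows: Iterable[SettingsRow]) -> bool:
    rows = list(rows)  # iterate twice safely
    return has_browser_library_import(rows) or has_roboscope_heal_import(rows)
```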
Plumbed into both:
- `RobotEditor.vue::healSuiteState` returns `'hidden'` when
neither import is present.
- `FlowEditor.vue::selectedStepHealMode` returns `'hidden'`
under the same condition, so a clicked `Click` step in a
file that doesn't import Browser will not even reveal the
checkbox.
Tests: 11 new unit tests in `healToggle.spec.ts` covering the
canonical name, the four pip-name variants, case-insensitivity,
the args-aware match, the no-Library-row negative path, the
documentation-mentions-Browser negative path, and the empty form,
plus the RoboScopeHeal counterpart with the bare + configured-args
paths. Frontend totals: 636/636 (was 625, +11).
Stories: both HEAL-1 and HEAL-2 specs updated with the new
edge-case row.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… dict, not an object
User-reported: a recording against heise.de's cookie banner saved
`Click iframe[src*="cmp.heise.de"] >>> text="Zustimmen"` as the
first selector — the active candidate at index 0 — even though
`text="Zustimmen"` matched THREE elements in the iframe (two buttons
plus a paragraph). Run 72 reproduced the strict-mode failure at replay:
"strict mode violation: locator(...) resolved to 3 elements".
The Story S.3 selector verifier is supposed to catch exactly this:
during the live recording it runs each candidate through Playwright's
`.locator(value).evaluate_all(...)` and flips `verified_unique=True`
on the ones that resolve to a single visible+actionable element.
Single-actionable candidates sort to the front; multi-match ones are
either disambiguated to `>> nth=0` or kept with a heavy penalty.
The sidecar told the story: every recorded candidate had
`verified_unique: false` and no MatchInfo. The verifier wasn't
running.
Root cause
Playwright's `BrowserContext.expose_binding` invokes its callback
with `source = dict(context=ctx, page=page, frame=frame)` — a plain
`dict`, see `playwright/_impl/_page.py:1539`. Our `on_capture`
extracted the frame via attribute access:
frame = getattr(source, "frame", None) or getattr(source, "page", None)
`getattr` on a `dict` always returns the default — dicts have keys,
not attributes — so `frame` was ALWAYS None. The downstream
`_verify_command_candidates` early-returns on `frame_or_page is
None`, leaving the unsorted, unverified candidate list intact.
The existing unit tests for the verifier wire-up pass `_FakePage`
directly as the second argument, bypassing the source-extraction
path entirely. So this regression has been live since the wire-up
landed (`d339584 fix(recorder): wire verify_candidates into the v2
capture handler`) without a single test catching it.
Fix
New helper `_resolve_frame_target(source)` does the right thing:
- `isinstance(source, dict)` → use `source.get("frame")` then
fall back to `source.get("page")`. Covers the Playwright path.
- `source is None` → return `None`. The recorder's own
`_on_new_page` emission calls `on_capture(None, payload)`; this
keeps the synthetic switch-page event harmless.
- Otherwise → keep the `getattr(...)` fallback so older test
stubs that expose `.frame` / `.page` as attributes still work
without churn.
`on_capture` now calls `_resolve_frame_target(source)` in place of
the buggy getattr chain.
Tests
Six new tests in `TestResolveFrameTarget`:
- dict with frame → returns frame
- dict without frame (top page) → falls back to page
- dict without either → None
- source=None → None (synthetic Switch Page event)
- object with .frame attribute → returns frame (legacy stubs)
- object with only .page attribute → returns page
All 13 wire-up tests green (was 7). Existing _FakePage-style tests
unchanged.
Manual verification path — `_recorder_loop` will now pass a real
`Frame` into the verifier on iframe captures; the next recording
against heise.de will land with `verified_unique=True` on the
unique candidates and `text="Zustimmen" >> nth=0` (the
disambiguated form) at slot 0 instead of the multi-match raw
`text="Zustimmen"`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…al-browser E2E coverage
User reported: after the previous frame-resolution fix
(`a06f296`), recording on heise.de's cookie banner produced
`# RBSCOPE: dropped Click — no selector captured` — every
candidate dropped. Worse than before the fix, because at least
the unverified candidates used to leak through.
The right diagnosis was: `a06f296` made `_resolve_frame_target`
return the actual iframe Frame (not None) so the verifier
finally got to RUN. But during a click that dismissed the iframe
(or navigated the page), the verifier ran AGAINST a frame that
detached mid-flight. Every `loc.evaluate_all(...)` call raised,
the `except Exception` arm coerced the result to `MatchInfo(0, 0,
0)`, and `verify_candidates` dropped every 0-match candidate.
Net effect: the recorder turned every click-that-affects-the-DOM
into a no-selector capture.
Plus the user pointedly asked: *do you have real-browser E2E
tests for this scenario?* No. The unit tests for the verifier
wire-up use `_FakePage` stubs that can't reproduce a navigation
race. Until this commit, the regression class had zero E2E
coverage.
Verifier contract change
`verify_candidates` (selector_verification.py) and the helper
`_resolve` (v2_recorder_task.py) now distinguish three outcomes:
1. **`MatchInfo(t>0, ...)`** — verification ran cleanly, the
selector matched something live. Classify gold / visible-only
/ hidden / multi-match and rank as before.
2. **`MatchInfo(total=0, ...)`** — verification ran, selector
resolved to nothing. Drop (existing behavior — the selector
is truly stale).
3. **`None` returned, OR locator_factory raised** — verification
COULD NOT RUN (frame detached after a navigation-triggering
click, page closed mid-flight, transient browser-side error).
Preserve the candidate at the TAIL of the result list with
`verified_unique=False` intact. Synthesis produced it for a
reason and the user pointed at SOMETHING when the click was
captured. Sorted within the tail by quality_score desc so the
best static-heuristic candidate is the first unverified one.
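The three-outcome contract can be sketched with a `locate` callable standing in for the locator factory. This is a simplified illustration. Classification is reduced to "exactly one actionable match means unique", and the gold / visible-only / hidden ranking and the `>> nth=0` disambiguation are omitted. The dataclass shapes are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class MatchInfo:
    total: int
    visible: int
    actionable: int


@dataclass(frozen=True)
class Candidate:
    value: str
    quality_score: int
    verified_unique: bool = False


def verify_candidates(
    candidates: List[Candidate], locate: Callable[[str], Optional[MatchInfo]]
) -> List[Candidate]:
    verified: List[Candidate] = []
    unverifiable: List[Candidate] = []
    for cand in candidates:
        try:
            info = locate(cand.value)
        except Exception:
            info = None  # outcome 3: verification could not run
        if info is None:
            unverifiable.append(cand)  # outcome 3: keep it, still unverified
        elif info.total == 0:
            continue  # outcome 2: selector is truly stale, drop it
        else:
            # outcome 1 (simplified): unique when exactly one actionable match
            verified.append(replace(cand, verified_unique=info.actionable == 1))
    verified.sort(key=lambda c: (not c.verified_unique, -c.quality_score))
    unverifiable.sort(key=lambda c: -c.quality_score)  # best heuristic first in the tail
    return verified + unverifiable
```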
Concrete bound on the round-trip: `_resolve` now wraps
`loc.evaluate_all(...)` in `asyncio.wait_for(..., timeout=1.0)`.
Without that, a click on a page that's mid-navigation can leave
the JS round-trip hanging until Playwright's default timeout —
unacceptable for an interactive recorder where the user might
click ten things in a second, and a hung verify pegs the entire
loop. `TimeoutError` is also an `Exception`, so it routes through
the same preserve-as-unverified branch as any other failure.
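The timeout bound can be sketched with `asyncio.wait_for`. In the recorder the awaitable would be the `loc.evaluate_all(...)` round-trip; here it is any zero-argument coroutine factory so the sketch runs without a browser. `resolve_with_timeout` is a hypothetical name. Returning `None` maps onto outcome 3 above.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def resolve_with_timeout(
    evaluate: Callable[[], Awaitable[int]], timeout: float = 1.0
) -> Optional[int]:
    try:
        return await asyncio.wait_for(evaluate(), timeout=timeout)
    except Exception:
        # Timeout, detached frame, closed page: all mean "could not verify".
        # CancelledError is a BaseException (3.8+), so task cancellation still propagates.
        return None
```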
Tests
Unit tests in `test_selector_verification.py`:
- `test_factory_exception_preserves_candidate_as_unverified`
replaces `..._is_dropped_not_kept_unverified` — same setup
(factory raises), new assertion (candidate preserved at tail,
unverified).
- `test_factory_none_return_preserves_candidate_as_unverified`
pins the explicit-None contract.
- `test_factory_unverifiable_tail_sorted_after_verified`
drives a mixed list (two clean matches + two raise) and
asserts the verified ones lead, the unverifiable tail
follows, both halves sorted by quality_score desc.
Unit tests in `test_v2_recorder_verify_wire.py`:
- `test_locator_factory_raise_preserves_candidate_as_unverified`
replaces the old "invalid syntax dropped" assertion — clean
matches still lead, the boom candidate ends up at the tail
unverified.
- `test_locator_factory_raise_preserves_candidate_when_no_other_match`
pins the heise.de case directly: all three candidates raise
(iframe detached), all three preserved, sorted by qs desc.
Real-browser E2E in `test_v2_recorder_e2e.py` (the part the
user pointedly asked about):
- `test_click_that_navigates_preserves_selector_candidates`
drives the recorder against `recorder_multipage_a.html`,
clicks the `[data-testid="goto-page-b"]` link (full-page
navigation), waits for Page B's heading, then asserts the
captured Click has AT LEAST ONE selector candidate.
- `test_click_inside_iframe_that_removes_itself_preserves_selectors`
uses the new `recorder_iframe_banner.html` (parent) +
`recorder_iframe_inner.html` (loaded via `src=`, not srcdoc
— srcdoc iframes don't expose a stable `frame_url` for
RECORDER-FRAMES tagging). The inner button posts a message
that the parent uses to `.remove()` the iframe — exact
Sourcepoint shape. Asserts the captured Click has
`frame_url` set AND at least one selector candidate.
Both E2E tests reproduce the exact failure mode the user reported
and pass after the fix (3/3 in series, 8.72 s). Non-integration
recording suite: 59/59 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidecar limitation the user noticed: the legacy `frame_url: str | None`
field stored the iframe's URL but nothing else. The emitter rebuilt
the cross-frame locator at serialise time with ONE hardcoded
strategy — `iframe[src*="<host>"]` — which:
- broke whenever the host was not unique on the page (multiple
CMP iframes from the same vendor — exactly the Sourcepoint
multi-banner case);
- gave the picker no alternative iframe selector to switch to
(the user could only re-pick the INNER selector);
- had no support for nested iframes.
Schema (`backend/src/recording/selector_schema.py`)
New `FrameDescriptor { url, selector_candidates }` model. New
`frame_chain: list[FrameDescriptor]` field on `RecordedCommand`,
default empty for top-frame events + backward compatible with
pre-FRAMES-2 sidecars. Order: index 0 is outermost iframe, last
entry is the iframe whose document the event came from.
Recorder (`backend/src/recording/v2_recorder_task.py`)
- `_capture_frame_chain(frame_or_page)` walks parent ancestry via
Playwright's `frame.parent_frame` + `frame.frame_element()`
(CDP-level call, works cross-origin), bounded by
`asyncio.wait_for(..., timeout=1.0)` so a detaching iframe
fails fast rather than hanging the recorder.
- `_synthesise_iframe_candidates(element_handle, parent_frame,
frame_url)` builds ranked candidates per rung using the iframe
element's attributes:
qs 95 — `iframe[data-testid="..."]`
qs 90 — `iframe#<id>`
qs 85 — `iframe[name="..."]`
qs 75 — `iframe[src="<exact>"]`
qs 65 — `iframe[src*="<host>"]` (legacy fallback strategy)
qs 40 — `iframe.<first-class>` (last resort)
Each candidate is verified by counting matches against the
PARENT frame (where the iframe element lives, not where its
content does). 0-match candidates dropped; 1-match candidates
flagged `verified_unique=True`; multi-match preserved with the
flag False. Output sorted (verified DESC, qs DESC).
- Wire-up in `_verify_command_candidates`: alongside the inner-
selector verification, capture the chain in the same async
pass so the iframe still exists when we ask (best chance —
the banner-removes-itself flow only takes effect after the
user's original click handler completes).
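The candidate ladder and the parent-frame verification can be sketched as follows. It takes the iframe element's attributes as a plain dict and a `count_in_parent` callable instead of an element handle and parent frame. That signature, `FrameCandidate`, and the lack of CSS escaping for ids are all simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlparse


@dataclass
class FrameCandidate:
    value: str
    quality_score: int
    verified_unique: bool = False


def synthesise_iframe_candidates(
    attrs: Dict[str, str], count_in_parent: Callable[[str], int]
) -> List[FrameCandidate]:
    ladder: List[Tuple[int, str]] = []
    if attrs.get("data-testid"):
        ladder.append((95, f'iframe[data-testid="{attrs["data-testid"]}"]'))
    if attrs.get("id"):
        ladder.append((90, f'iframe#{attrs["id"]}'))  # assumes a CSS-safe id
    if attrs.get("name"):
        ladder.append((85, f'iframe[name="{attrs["name"]}"]'))
    src = attrs.get("src")
    if src:
        ladder.append((75, f'iframe[src="{src}"]'))
        host = urlparse(src).netloc  # includes the port, e.g. 127.0.0.1:58004
        if host:
            ladder.append((65, f'iframe[src*="{host}"]'))  # legacy fallback strategy
    classes = attrs.get("class", "").split()
    if classes:
        ladder.append((40, f"iframe.{classes[0]}"))  # last resort

    out: List[FrameCandidate] = []
    for qs, value in ladder:
        n = count_in_parent(value)  # counted in the PARENT frame
        if n == 0:
            continue
        out.append(FrameCandidate(value, qs, verified_unique=(n == 1)))
    out.sort(key=lambda c: (not c.verified_unique, -c.quality_score))
    return out
```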
Emitter (`backend/src/recording/robot_emit.py`)
- `_iframe_chain_locator(cmd)` composes `outer >>> inner` for
cross-frame replay, picking each rung's
`selector_candidates[0]` (pre-sorted, so the testid/id/name
strategy wins when available). Rung with empty candidates
falls back to the legacy `iframe[src*="<host>"]` derived from
that rung's url — partial chains still produce a valid
composite locator.
- `_emit_command` prefers the chain when present, falls back to
the legacy URL-only path otherwise. Old sidecars keep working
without a re-record.
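The chain composition with per-rung URL fallback can be sketched like this. Candidates are simplified to plain selector strings (the real rungs hold candidate objects), and an empty chain just returns the inner selector here rather than taking the legacy `frame_url` path. `legacy_iframe_selector` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlparse


@dataclass
class FrameDescriptor:
    url: str
    selector_candidates: List[str] = field(default_factory=list)  # pre-sorted, best first


def legacy_iframe_selector(url: str) -> str:
    return f'iframe[src*="{urlparse(url).netloc}"]'


def iframe_chain_locator(chain: List[FrameDescriptor], inner: str) -> str:
    parts: List[str] = []
    for rung in chain:  # index 0 is the outermost iframe
        if rung.selector_candidates:
            parts.append(rung.selector_candidates[0])
        else:
            parts.append(legacy_iframe_selector(rung.url))  # partial chains still compose
    return " >>> ".join(parts + [inner])
```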
E2E coverage (the user explicitly asked us to pin this in real-browser
tests, including verifying the SIDECAR FILE contents)
`backend/tests/fixtures/recorder_iframe_stable.html` — new
fixture: parent page with a single iframe that has id, testid,
name AND src; the iframe content button does NOT remove the
iframe on click (vs. the Sourcepoint flow). Lets the recorder's
chain synthesis run cleanly.
`test_iframe_click_records_frame_chain_with_id_candidate_in_sidecar`:
- Records a click in the stable iframe.
- Asserts the captured `RecordedCommand` carries a populated
`frame_chain` (the structural fix).
- Asserts the first rung's best candidate is testid/id/name
strategy (NOT the qs-65 src-host fallback).
- Asserts the FIRST candidate's `verified_unique=True`.
- Asserts the emitted .robot line uses the high-quality iframe
locator (`iframe[data-testid="..."] >>>`) and NOT
`iframe[src*="127.0.0.1"]`.
- Observed emit: `Click iframe[data-testid="consent-banner"]
>>> [data-testid="agree-btn"]`
`test_iframe_click_when_iframe_detaches_falls_back_to_url_strategy`:
- Records a click in the self-removing iframe (Sourcepoint
shape — `recorder_iframe_banner.html`).
- Asserts `frame_url` is preserved (the a06f296 regression chain).
- Asserts inner `selector_candidates` is non-empty (the 9db5c3b
preserve-on-exception chain).
- Asserts the emitted line still has an `iframe ... >>> inner`
wrapper — i.e., the legacy URL-host fallback fires when the
chain capture didn't make it in time.
- Observed emit: `Click iframe[src*="127.0.0.1:58004"] >>>
[data-testid="agree-btn"]`
Plus four unit tests in `test_robot_emit.py::TestFrameChainEmit`:
chain-wins-over-legacy, empty-chain-fallback, rung-without-
candidates-uses-url-fallback, nested-iframe-composition.
Totals: 62/62 non-integration + 5/5 E2E green.
Backward compat
Pre-FRAMES-2 sidecars (with `frame_url` only) keep working: the
emitter checks `frame_chain` first, falls back to the URL-derived
single-strategy path. No re-record needed for existing recordings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two unrelated wins bundled because they share user feedback from the same heise.de + debug session.

(1) Recorder emit-time defensive disambiguation

User report: re-recording on heise.de still produced an unrunnable `.robot` for the cookie-banner "Zustimmen" click. Investigation showed the chain was right and the inner candidates were preserved (the 9db5c3b + 8311b32 fixes did their jobs), but the candidate landing in slot 0 was `text="Zustimmen"` with `verified_unique=False`. The verifier couldn't run because the iframe detached mid-flight, so its `_with_nth_match` disambiguation never fired. Browser library's strict mode rejects `text="Zustimmen"` at replay because three elements match (the button plus two paragraphs).

Fix in `_render_selector`: when the active candidate is `verified_unique=False` AND uses a multi-match-prone strategy (text, role, aria, generic css without an id), wrap the value with `>> nth=0` at emit time. The wrap is suppressed when the selector already carries `nth=`, `>>>`, or `>>` chains, so we never double-wrap a verifier-disambiguated candidate or interfere with hand-edited chains.

Six unit tests in `test_robot_emit.py::TestDefensiveDisambiguation` pin the contract:
- unverified text → wrapped
- verified text → bare
- css with `#` → no wrap (id selectors are unique enough)
- xpath → never wrapped (synthesis writes explicit-enough xpath)
- already-disambiguated → not double-wrapped
- unverified pure-class CSS → wrapped

Net effect on the existing heise.de sidecar (cmd[1], the Zustimmen click, frame_chain emptied by the iframe detach):

Before: `Click iframe[src*="cmp.heise.de"] >>> text="Zustimmen"`
After: `Click iframe[src*="cmp.heise.de"] >>> text="Zustimmen" >> nth=0`

The `nth=0` is cosmetic noise, but it makes the recording RUNNABLE under Browser library strict mode without forcing the user to edit the file by hand.
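The emit-time wrap rule can be sketched as a single function. The strategy identifiers in `RISKY_UNVERIFIED` and the `Candidate` shape are assumptions, and strategy-prefix rendering (`xpath=`, `text=`) is left out. Note that the `">>"` substring check also covers `>>>` chains.

```python
from dataclasses import dataclass

# Hypothetical strategy identifiers for the multi-match-prone set.
RISKY_UNVERIFIED = {"text", "role", "aria", "css"}


@dataclass
class Candidate:
    strategy: str
    value: str
    verified_unique: bool


def render_selector(c: Candidate) -> str:
    value = c.value
    if c.verified_unique or c.strategy not in RISKY_UNVERIFIED:
        return value  # verified, or a strategy like xpath that is never wrapped
    if c.strategy == "css" and "#" in value:
        return value  # id selectors are treated as unique enough
    if "nth=" in value or ">>" in value:
        return value  # already disambiguated or a hand-edited chain (covers >>> too)
    return f"{value} >> nth=0"
```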
The 5 E2E recording tests stay green because their fixtures all produce testid-strategy candidates, which are not in the risky set. Existing test suite: 76/76 unit recording tests green (the defensive wrap doesn't fire on verified candidates, which is what the tests assert against), 5/5 E2E green.

(2) DebugPanel layout polish

User report:
- "the stack-trace file paths are too long and bleed into the other panel"
- "make the current line number much, much more prominent"
- "show the textual content of the current line somewhere"

Fixes in `frontend/src/components/debug/DebugPanel.vue`:
- Grid layout: `grid-template-columns: 220px minmax(0, 1fr)` plus `min-width: 0` on both children. Without the explicit `minmax(0, ...)` on the second column, CSS grid lets content in the first column overflow rather than truncating. This is the classic "long file path pushes the call-stack column off-screen and overlaps Variables on the right" symptom.
- Stack-file: `overflow: hidden`, `text-overflow: ellipsis`, `white-space: nowrap`. The template now renders `basename(frame.file)` (filename only, no path) with the FULL path on the wrapping span's `title` attr, so hovering surfaces the rest. The same `title` + truncation is applied to the stack-name span.
- New prominent paused-line callout below the header: it shows a `LINE` label + the line number at 32px / 700 weight in the primary brand colour, with the filename basename + keyword name as secondary metadata. It is rendered only when paused + line is set, and it turns the question "where am I in the test?" from "scan the header pill carefully" into "the giant number tells you immediately." The filename truncates on overflow, same as the stack-file lines, so a long path never breaks layout.

i18n: new `debug.panel.lineLabel` key in EN/DE/FR/ES.

The "textual content of the current line" piece is deferred. It would need either a backend pause-event extension to push the line text alongside the line number, or a frontend `/explorer/.../file` fetch sliced at the line.
Both are real follow-ups, scope-managed out of this commit.

Frontend totals: 636/636 vitest green, vue-tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User report: re-recorded the heise.de "Zustimmen" cookie banner, replay still doesn't work. The selector `iframe#sp_message_iframe_…` they saw in an earlier sidecar doesn't land in the new one. They asked flat out: do I have a successful run?

Honest answer: no. My E2E used synthetic fixtures with a stable iframe (`recorder_iframe_stable.html`). For the self-removing iframe shape (`recorder_iframe_banner.html`, modelled on Sourcepoint), the test asserted only that inner selectors were preserved and that the emitted line had SOME iframe wrapper. It never asserted that the chain itself was populated. So the regression class "iframe id-based locator falls back to URL host when iframe detaches before chain capture" was uncovered.

Root cause

`_capture_frame_chain` (8311b32) ran AFTER inner-selector verification inside `_verify_command_candidates`. By then the self-removing CMP banner had already detached its iframe. The follow-up `frame.frame_element()` call raised, the chain rung landed with 0 candidates, and the emitter fell back to the legacy `iframe[src*="<host>"]` URL derivation. That is fragile when the host is not unique on the page, and impossible to override from the picker because there's only one candidate.

Fix architecture: beat the race by capturing the iframe identity BEFORE the click happens

JS (top-frame, `capture_script.py`)

The top frame's capture script, and only the top frame, now enumerates `document.querySelectorAll("iframe")` on DOMContentLoaded and posts an `iframe_register` event per iframe. Each event carries the iframe's id / name / data-testid / src / classes PLUS a per-candidate uniqueness count computed synchronously via `document.querySelectorAll(candidate).length` in the same tick. Synthesis and verification both happen in JS, in one synchronous slice, before any user click can detach the iframe.
(An earlier draft used a MutationObserver to catch late-loaded iframes, but that broke iframe click capture. Apparently the high-frequency mutation callbacks flooded the binding queue fast enough that subsequent click events were silently dropped. A `load` listener for `<iframe>` elements is the lighter alternative if late-load coverage is needed; this commit ships the initial-scan-only variant since it suffices for static Sourcepoint-style banners.)

Backend (`v2_recorder_task.py`)
- Per-session `iframe_inventory: dict[str, dict]`, indexed by both `iframe_src` and `iframe_contentUrl` (when same-origin and JS can read it). It builds up as register events arrive.
- `on_capture` recognises `kind: "iframe_register"` and routes it to the registry. It returns early: no `RecordedCommand` is produced and the event takes no slot in the user-visible command stream.
- `_capture_frame_chain` now consults the inventory FIRST, falling back to the live `frame_element()` only when there is no registry hit (cross-origin iframes whose contentUrl JS couldn't read, ad iframes that registered too late, etc.).
- New `_candidates_from_inventory(inventory, frame_url)`: exact-match key lookup first, then a substring fallback against the stored `iframe_src` (this handles internal iframe navigations where `frame_url` no longer matches the initial `src` attribute). It maps the JS-side `count` field to `verified_unique = count == 1`, sorts by (verified DESC, qs DESC), and drops 0-match candidates.
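The inventory lookup described in the last bullet can be sketched as follows. The register-event payload shape (`candidates` entries with `value`, `qs` and `count`) and the substring direction (stored `iframe_src` contained in `frame_url`, which suits relative `src` attributes) are assumptions, not the committed format.

```python
from typing import Dict, List, Optional


def candidates_from_inventory(inventory: Dict[str, dict], frame_url: str) -> List[dict]:
    entry: Optional[dict] = inventory.get(frame_url)  # exact key match first
    if entry is None:
        # Substring fallback against the stored iframe_src (e.g. a relative src).
        for candidate_entry in inventory.values():
            src = candidate_entry.get("iframe_src")
            if src and src in frame_url:
                entry = candidate_entry
                break
    if entry is None:
        return []  # caller falls back to the live frame_element() path
    out = [
        {"value": c["value"], "qs": c["qs"], "verified_unique": c["count"] == 1}
        for c in entry.get("candidates", [])
        if c["count"] > 0  # drop 0-match candidates
    ]
    out.sort(key=lambda c: (not c["verified_unique"], -c["qs"]))
    return out
```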
E2E proof

The Sourcepoint-shape test now asserts the chain IS populated even though the iframe self-removes within milliseconds of the click:

test_iframe_click_when_iframe_detaches_falls_back_to_url_strategy
[frames-2-detach] frame_chain candidates (post-detach): ['iframe[data-testid="banner-frame"]', 'iframe#banner-frame', 'iframe[src="recorder_iframe_inner.html"]']
[frames-2-detach] emitted line: Click iframe[data-testid="banner-frame"] >>> [data-testid="agree-btn"] # rbs:5075fc5030b0

That's the same fixture that previously emitted `iframe[src*="127.0.0.1:58004"] >>> …` (host-only legacy fallback). The id/testid-based locator has the same shape the user's heise.de sidecar had one recording earlier, before the Zustimmen click detached the iframe, and it is now retained through the detach.

Test totals: 5/5 E2E + 68/68 non-integration recording suite green.

Caveats / known limitations carried forward
- Cross-origin iframes whose contentWindow.location.href JS can't read register only their initial `src` attribute as the lookup key. If the iframe navigates internally to a different URL before the click, the lookup may miss and we fall back to the live `frame_element()` (which may still work if the iframe is still attached at that point).
- No nested-iframe support beyond what Playwright's `frame.parent_frame` walk already covers. The JS inventory sees the IMMEDIATE iframes of the top frame. Iframes inside iframes get their own register events when the inner top runs the script (recursively, via `add_init_script`'s per-document install), but chain composition across multiple registers is not yet tested. Real-world consent banners are 1-deep, so this is acceptable for v1.
- MutationObserver path deliberately omitted (see comment in capture_script.py). Late-injected iframes that load AFTER DOMContentLoaded won't be inventoried until a workaround lands. The `load` event on iframe elements (capture phase) is the planned next step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… live view

Two user-reported bugs, both from the same heise.de Zustimmen recording session:

(1) "Still doesn't work": late-loaded CMP iframes missed

The Sourcepoint consent banner on heise.de injects its iframe via its async-loaded SDK ~600-1500ms AFTER `DOMContentLoaded`. The previous proactive-inventory commit (3004b27) scanned exactly once at `DOMContentLoaded`, which is BEFORE the iframe exists. The inventory was empty when the user clicked, the live-API fallback in `_capture_frame_chain` hit a detached frame (the click had already triggered the banner's removal), and `frame_chain[0]` landed with zero candidates, so the emitter fell back to the legacy `iframe[src*="<host>"]` shape.

Retry-scan added: `setTimeout(_registerIframesOnce, …)` at 100 / 300 / 700 / 1500 / 3000 / 5000 ms after init, plus a capture-phase `load` listener on the document so that each iframe's own `load` event re-triggers the scan. Dedupe via `seenIframeKeys` (NEW; it was missing in the minimal version) keys on `src|id|name|testid`, so re-scans are no-ops for already-registered iframes. That means no binding-queue flood and no risk of dropped click events (which a MutationObserver-based attempt caused earlier in the same session).

New E2E fixture `recorder_iframe_late_load.html` mirrors the Sourcepoint shape exactly: the parent's inline `<script>` does `setTimeout(() => parent.appendChild(<iframe ...>), 600)`.

New E2E `test_iframe_loaded_after_DOMContentLoaded_still_registered` records a click inside that iframe and asserts:
- `frame_chain[0].selector_candidates` is NON-EMPTY (the retry-scan caught the late-loaded iframe before the click);
- the emitted line uses `iframe#sp_message_iframe_1234567` or `iframe[data-testid="cmp-banner"]` or `iframe[name="consent"]`, NOT the legacy URL-host fallback.
Observed emit on the new fixture:
Click iframe[data-testid="cmp-banner"] >>> [data-testid="agree-btn"]
The same shape will work on heise.de: the Sourcepoint iframe id is `sp_message_iframe_<message_id>` and gets caught the same way.
(2) "the recorder screen doesn't show the right selector"
The Live recorder view's SelectorPicker shows only the active INNER candidate (e.g. `text="Zustimmen"`). The .robot the user saves has the composite form `iframe#sp_message_iframe_… >>> text="Zustimmen" >> nth=0`, which is what actually runs on replay. Two different mental models on the same screen made it impossible for the user to tell whether their recording was going to work.
New util `frontend/src/utils/effectiveSelector.ts` mirrors the selector-composition logic of the Python emitter's `_emit_command` exactly:
- `renderSelector(cand)`: handles strategy prefixes (`xpath=`, `text=`) and the defensive `>> nth=0` for unverified risky-strategy candidates (text / generic css / role / aria, the same `_RISKY_UNVERIFIED_STRATEGIES` set as the Python side).
- `iframeChainPrefix(cmd)`: composes `outer >>> inner` from `cmd.frame_chain`, with a per-rung fallback to the URL host when a rung has no candidates.
- `effectiveSelector(cmd)`: the full composite line.
Wired into RecordingLiveView as a new `.robot:` preview row under each step in the live list. The picker still shows candidate alternatives in its dropdown (selection lives there); the preview shows what the user is actually going to save.
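The composition rules above can be sketched in Python, the language of the emitter they mirror. This is an illustration under assumptions, not the code of `_emit_command` or `effectiveSelector.ts`: the dataclasses, the `verified` flag, the strategy strings and the `_compose` helper are hypothetical, and the first candidate of each iframe rung is assumed to be the one that gets used.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlparse

# Assumed strategy names; the commit lists the risky set as text / generic css / role / aria.
RISKY_UNVERIFIED_STRATEGIES = {"text", "css", "role", "aria"}


@dataclass
class Candidate:
    strategy: str           # e.g. "id", "testid", "css", "xpath", "text"
    value: str              # raw recorded value, without any prefix
    verified: bool = False  # hypothetical flag: uniqueness confirmed at record time


@dataclass
class FrameRung:
    selector_candidates: List[str] = field(default_factory=list)
    url: str = ""  # only used for the legacy URL-host fallback


@dataclass
class Command:
    candidates: List[Candidate]
    active_candidate_index: int = 0
    frame_chain: List[FrameRung] = field(default_factory=list)


def render_selector(cand: Candidate) -> str:
    """Inner selector: strategy prefix plus defensive nth for risky, unverified candidates."""
    if cand.strategy == "xpath":
        inner = f"xpath={cand.value}"
    elif cand.strategy == "text":
        inner = f'text="{cand.value}"'
    else:
        inner = cand.value
    if not cand.verified and cand.strategy in RISKY_UNVERIFIED_STRATEGIES:
        inner += " >> nth=0"
    return inner


def iframe_chain_prefix(cmd: Command) -> str:
    """Outer-to-inner iframe rungs joined with >>>, host fallback for empty rungs."""
    rungs = []
    for rung in cmd.frame_chain:
        if rung.selector_candidates:
            rungs.append(rung.selector_candidates[0])
        else:
            rungs.append(f'iframe[src*="{urlparse(rung.url).netloc}"]')
    return " >>> ".join(rungs)


def _compose(cmd: Command, cand: Candidate) -> str:
    prefix = iframe_chain_prefix(cmd)
    inner = render_selector(cand)
    return f"{prefix} >>> {inner}" if prefix else inner


def effective_selector(cmd: Command) -> str:
    """Composite line for the active candidate; empty when there are no candidates."""
    if not cmd.candidates:
        return ""
    return _compose(cmd, cmd.candidates[cmd.active_candidate_index])
```

Fed the heise.de shape (one id-based iframe rung plus an unverified `text` candidate `Zustimmen`), the sketch yields `iframe#sp_message_iframe_1454968 >>> text="Zustimmen" >> nth=0`, the composite the unit tests pin.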
19 unit tests pin parity with the Python emitter, including:
- all four strategy-prefix branches
- the six defensive-disambiguation branches (mirrors Python's `TestDefensiveDisambiguation` 1:1)
- the iframe-chain composition branches (mirrors `TestFrameChainEmit`)
- the heise.de integration: chain + defensive disambiguation + multi-strategy candidate list → expected `iframe#sp_message_iframe_1454968 >>> text="Zustimmen" >> nth=0`
- edge cases: no candidates → empty; active_candidate_index not always slot 0
i18n: new `recorder.live.effectiveTitle` tooltip key in EN/DE/FR/ES.
Test totals
- Backend recording E2E: 6 tests (added the late-load case to the existing 5), all green.
- Frontend vitest: 655/655 (added 19 in effectiveSelector.spec.ts on top of the 636 baseline). 1 pre-existing unhandled-rejection error from main is the DebugPanel.spec.ts noise fixed on release-0.10.0.
- `vue-tsc --noEmit` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-reported screenshot: the Flow Editor node label correctly shows the composite `iframe#sp_message_iframe_14549 >>> text="Zustimmen" >> nth=0`, but the detail-panel SelectorPicker on the right still displayed only the raw inner `text="Zustimmen"`, and its dropdown alternatives were likewise inner-only. The user therefore couldn't tell which alternative would actually run cleanly under Browser library's strict mode once the iframe wrapper and defensive disambiguation are applied.
Two-line fix in SelectorPicker.vue: import the helper added in the previous commit and bind both the active-value `<code>` and the per-row `<code>` in the dropdown to `effectiveSelectorForCandidate(cmd, c)` instead of `c.value`. The raw value lives on the `title=` attribute for hover, and the inline-edit input still operates on the raw value: editing the verbatim value is the right contract, while viewing the composite is the right default display.
New util export: `effectiveSelectorForCandidate(cmd, cand)`. `effectiveSelector(cmd)` is now a thin wrapper that picks the active candidate from `cmd` and delegates: same behaviour for existing callers (RecordingLiveView), new behaviour for callers that want "what would happen if I picked THIS row".
Test totals: 19 unit tests on effectiveSelector unchanged (parity with Python emitter pinned via the same scenarios). Full vitest: 655/655 green, vue-tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…scopeheal
The heal-library lived twice: in-tree under
`backend/src/recording/heal/` (~1300 LOC across 4 modules) AND
as the already-extracted sibling repo `roboscope-rfheal/`. The two
had drifted ~340 lines since extraction (April 28); the in-tree
copy carried the more recent RECORDER-FRAMES iframe-wrap guard +
type-narrowing fixes that the extracted v0.1.0 didn't have. This
commit establishes a single source of truth by moving everyone onto
the sibling repo under its new PyPI distribution name
`robotframework-roboscopeheal` (Robot Framework community
convention — matches `robotframework-browser`,
`robotframework-seleniumlibrary`, …). Python import unchanged
(`from RoboScopeHeal import …`); only the `pip install` name
moves.
Sibling repo (`/Users/rat/git/mateo2/roboscope-rfheal`) work —
not in this commit but coordinated with it:
- pyproject `name` renamed to `robotframework-roboscopeheal`
- version bumped to 0.2.0
- URLs repointed to `viadee/robotframework-roboscopeheal`
- 4 source files + 7 test files re-synced from in-tree,
imports rewritten `src.recording.heal.X` → `RoboScopeHeal.X`
- `defusedxml>=0.7.1` added to runtime deps (was implicitly
pulled in via the monorepo backend, now made explicit)
- CHANGELOG with the RECORDER-FRAMES + rename entries
- 100/100 tests green in the sibling venv
Backend changes (this commit):
- `backend/pyproject.toml`: declare
`robotframework-roboscopeheal>=0.2` as a runtime dep.
Until v0.2 lands on PyPI, devs install editable via
`uv pip install -e ../roboscope-rfheal`
(documented inline next to the dep line). The dep spec
is forward-compatible — when PyPI ships, the same
constraint resolves to the published distribution.
- `backend/src/execution/router.py` + `backend/src/stats/
service.py`: import `parse_heal_audit` from
`RoboScopeHeal.heal_report` instead of
`src.recording.heal.heal_report`. 3 imports total.
- `backend/tests/recording/test_iframe_locator_contract.py`:
same rewrite (2 imports).
- Delete `backend/src/recording/heal/` entirely — 4 source
modules + __init__.py, ~1290 LOC removed.
- Delete `backend/tests/recording/heal/` entirely — 7 test
files + __init__.py, the same tests now live in the
sibling repo where they're closer to the code under test
AND get exercised by the rfheal CI rather than only when
someone runs the RoboScope monorepo's pytest suite.
Migration verification (164 tests green across the boundary):
- `tests/recording/test_robot_emit.py`: 36 ✓
- `tests/recording/test_v2_recorder_verify_wire.py`: 14 ✓
- `tests/recording/test_iframe_locator_contract.py`: 9 ✓ (new
import path)
- `tests/execution/test_heal_report_endpoint.py`: 5 ✓ (the
one consumer of `parse_heal_audit` outside the heal layer)
- `roboscope-rfheal/tests/`: 100 ✓
(entire heal test surface, in the sibling repo's venv)
Caveats while PyPI is not yet published:
- Fresh clones need `uv pip install -e ../roboscope-rfheal`
before `make dev` works. README + CLAUDE.md updates
deferred until the PyPI flip; the inline comment in
pyproject.toml is the immediate signpost.
- The 4 RoboScopeHeal modules previously lived in TWO places in
the user's working tree (`roboscope/backend/` and
`roboscope-rfheal/`). The in-tree copy is now GONE; the
sibling is the single source. Future hotfixes against the
heal layer go to the sibling repo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ut PyPI
Story HEAL-VENDORED. The previous extraction
(`refactor(heal): move heal library out` — `f15f9d1`) made the
heal library a runtime dependency on
`robotframework-roboscopeheal>=0.2`, which is fine for the
author's local dev loop (sibling-repo + uv.sources) but fails for
EVERYONE else:
- Fresh clone of RoboScope without sibling rfheal repo →
`uv sync` fails with "path not found".
- Standalone offline ZIP (`roboscope_offline_<platform>.zip`)
pre-downloads wheels via `pip download`; the heal dep isn't
on PyPI yet so it gets silently skipped and the install
bundle ships without the heal library.
- Docker build hits the same wall.
This commit ships the heal library WITH every RoboScope release,
zero external dependencies. PyPI publication is now an optional
optimization, not a blocker for shipping.
Vendor directory
`backend/vendor/robotframework-roboscopeheal/` carries the full
source tree from the upstream `roboscope-rfheal` repo at v0.2.1:
- 4 source modules (candidate_finder, fingerprint, heal_report,
library) + __init__.py
- pyproject.toml declaring name=`robotframework-roboscopeheal`,
version=`0.2.1`, with `robotframework-browser` now an
OPTIONAL `[browser]` extra (the previous hard dep forced
pip to resolve Playwright + node + Chromium at install time,
bloating the offline bundle by ~80 MB for an import-time
check that never happens — the heal lib delegates to Browser
via `BuiltIn().run_keyword()` at TEST-RUN time, not at
module-import time, so install + import succeed without
Browser present).
- LICENSE + NOTICE (Apache-2.0 provenance preserved per
license requirements).
- CHANGELOG with the 0.2.0 + 0.2.1 history.
Tests deliberately NOT vendored. They live in the upstream
rfheal repo where they're tied to that repo's CI; vendoring
would double the test surface in the RoboScope monorepo for
zero new signal — RoboScope's own e2e Recorder tests + the
heal-report-endpoint test already exercise the integration
boundary.
backend/pyproject.toml
- `[tool.uv.sources]` re-pointed from `../roboscope-rfheal`
(sibling, dev-only) to `vendor/robotframework-roboscopeheal`
(committed, travels with every clone).
- Inline comments rewritten to reflect the vendored model
instead of the manual-sibling-install model. PyPI flip is now
purely a decision about which install source to prefer.
scripts/sync-roboscopeheal.sh
- New helper. Copies sibling `../roboscope-rfheal/` into the
vendor tree after showing the user a brief recursive diff +
asking for confirmation. Excludes the sibling-only directories
(tests/, uv.lock, .gitignore, dev caches). `ROBOSCOPE_SYNC_ASSUME_YES=1`
bypasses the prompt for scripted release-prep use.
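The drift check the script shows before copying can be approximated in Python with `filecmp`. This is a sketch, not `sync-roboscopeheal.sh` itself: the `EXCLUDES` list is an assumption based on the sibling-only entries named above, and `drift`/`demo` are hypothetical helpers. Files are compared byte-for-byte because `dircmp`'s own `diff_files` relies on a shallow stat comparison.

```python
import filecmp
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumed exclusion list, based on the sibling-only entries the script skips.
EXCLUDES = ["tests", "uv.lock", ".gitignore", "__pycache__", ".venv", ".pytest_cache"]


def drift(upstream: Path, vendor: Path) -> List[Tuple[str, str]]:
    """List (kind, relative path) differences between the sibling checkout and the vendor tree."""
    changes: List[Tuple[str, str]] = []

    def walk(cmp: filecmp.dircmp, rel: Path) -> None:
        for name in cmp.left_only:
            changes.append(("added", (rel / name).as_posix()))
        for name in cmp.right_only:
            changes.append(("removed", (rel / name).as_posix()))
        for name in cmp.common_files:
            if not filecmp.cmp(Path(cmp.left) / name, Path(cmp.right) / name, shallow=False):
                changes.append(("modified", (rel / name).as_posix()))
        for name, sub in cmp.subdirs.items():  # subdirs inherit the ignore list
            walk(sub, rel / name)

    walk(filecmp.dircmp(upstream, vendor, ignore=EXCLUDES), Path("."))
    return sorted(changes)


def demo() -> Tuple[list, list]:
    """Clean state first, then one edited and one new module upstream."""
    with tempfile.TemporaryDirectory() as up_dir, tempfile.TemporaryDirectory() as vendor_dir:
        up, vendor = Path(up_dir), Path(vendor_dir)
        for root in (up, vendor):
            (root / "RoboScopeHeal").mkdir()
            (root / "RoboScopeHeal" / "library.py").write_text("VERSION = '0.2.1'\n")
        (up / "tests").mkdir()  # sibling-only directory, excluded from the comparison
        (up / "tests" / "test_x.py").write_text("pass\n")
        clean = drift(up, vendor)
        (up / "RoboScopeHeal" / "library.py").write_text("VERSION = '0.2.2-dev'\n")
        (up / "RoboScopeHeal" / "fingerprint.py").write_text("pass\n")
        return clean, drift(up, vendor)
```

An empty result corresponds to the "no-ops on clean state" case; a non-empty one is what the prompt would show before copying.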
scripts/build-mac-and-linux.sh
- New step in the wheel-collection block: builds the
`robotframework-roboscopeheal` wheel from `backend/vendor/...`
via `python -m build --wheel`, drops it next to the
pip-downloaded wheels in `$DIST/wheels`. `install.sh`'s
`pip install --no-index --find-links wheels/` matches it by
version automatically.
- Transient `pip install --upgrade build` to bootstrap the
build tool on hosts that don't have it (typical CI runners
don't).
- Falls through with a warning if the vendor directory is
missing — better than failing the entire release script,
since a hand-edited bundle could plausibly skip the heal
library.
backend/tests/test_vendored_rfheal_present.py — 11 tests
- Asserts the vendor directory exists.
- Asserts each of the 8 canonical files (4 modules + __init__
+ pyproject + LICENSE + NOTICE) is present.
- Asserts the vendored pyproject declares the canonical
distribution name (catches an upstream rename without a
matching vendor sync).
- Asserts `RoboScopeHeal.__version__` (installed) matches the
vendored `pyproject.toml` version (catches the "stale wheel
in venv, fresh source on disk" scenario).
Verification matrix
- 11/11 vendor-presence tests green.
- 70/70 heal-tangential tests green (test_robot_emit,
test_v2_recorder_verify_wire, test_iframe_locator_contract,
test_vendored_rfheal_present).
- Sync script tested both ways: detects drift, no-ops on clean
state.
- Wheel build from vendor: `uv build --wheel` produces
`robotframework_roboscopeheal-0.2.1-py3-none-any.whl`
(32 KB).
- Clean-venv install of the wheel pulls only `defusedxml` +
`robotframework` + the package itself — `robotframework-browser`
correctly stays out of the install closure.
- Import works cleanly:
`>>> from RoboScopeHeal import RoboScopeHeal, parse_heal_audit`
PyPI flip path (when access lands)
1. Push the upstream `roboscope-rfheal` repo to
`github.com/viadee/robotframework-roboscopeheal`.
2. `cd ../roboscope-rfheal && uv build && twine upload …`.
3. In THIS repo: remove `[tool.uv.sources]` from
`backend/pyproject.toml` — the dep then resolves from PyPI.
4. The vendor directory CAN stay for offline-install fallback
OR be removed; orthogonal decision. The build-script wheel-
build step short-circuits with a warning if vendor is missing.
Sibling repo (`roboscope-rfheal/`) sees a coordinated 0.2.1 commit
(`8a6806c`) with the same `robotframework-browser` → extra move.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t venv
Story HEAL-VENDORED phase 2. The first phase (`436ea54`) made the
heal library available to the RoboScope SERVER (dashboards, the
heal-report endpoint) by vendoring the source. That solves
perspective 1 — RoboScope's own code can `import RoboScopeHeal`.
But the more important consumer is perspective 2: the user's
test cases. A user writes
    *** Settings ***
    Library    Browser
    Library    RoboScopeHeal
    *** Test Cases ***
    Login
        Heal Click    [data-testid="submit"]
…and expects "Save & Run" to just work. Today it doesn't: the
heal library lives in the BACKEND venv (vendor-editable-install),
but the user's .robot test runs in a PROJECT venv that RoboScope
created via `environments/tasks.py::create_venv` — and that venv
only had `robotframework` pre-installed. `pip install
robotframework-roboscopeheal` from the package-management UI
would have failed (the name doesn't resolve on PyPI yet).
Fix
`environments/tasks.py::create_venv` now also installs the heal
library from the vendored source tree as a sibling step to the
default `robotframework` install. The user gets it for free on
every fresh venv — no UI roundtrip, no PyPI dependency.
- New `_vendored_heal_path()` helper: resolves
`backend/vendor/robotframework-roboscopeheal/` relative to
`tasks.py`'s location, returns an absolute Path. `uv pip
install` happily accepts a local source-tree path as a
positional package spec.
- New `_install_vendored_heal_into_venv(venv_path, env_id)`
helper: runs `pip_install_cmd(venv_path, vendor_path)` via
subprocess, NON-FATAL on failure (heal is opt-in test
ergonomics, not a hard requirement — a build error here
shouldn't tank the entire venv creation). Three failure
branches:
* vendor dir missing → log WARNING, return. Watchdog test
in `test_vendored_rfheal_present.py` catches this in CI.
* pip exit non-zero → log WARNING with rc + stderr tail.
User can install manually from the package-management UI
later.
* subprocess raises (uv missing on PATH, OS error) →
log WARNING + exc_info, return.
- Wired into `create_venv` immediately after the
`robotframework` install. Heal is now part of "the canonical
starter set" for every project.
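The three non-fatal branches can be sketched as follows. The function and parameter names are hypothetical, and `uv_install_cmd` stands in for the real `pip_install_cmd`, whose exact argv is not shown in the commit; it assumes a POSIX venv layout (`bin/python`) and uses `uv pip install --python <interpreter> <path>`.

```python
import logging
import subprocess
from pathlib import Path
from typing import Callable, List

log = logging.getLogger("heal_install_sketch")


def install_vendored_heal(venv_path: Path, vendor_path: Path,
                          build_cmd: Callable[[Path, Path], List[str]],
                          run: Callable = subprocess.run) -> bool:
    """Best-effort install of the vendored heal library; logs and returns False instead of raising."""
    if not vendor_path.is_dir():
        log.warning("vendored heal library not found at %s; skipping", vendor_path)
        return False
    argv = build_cmd(venv_path, vendor_path)
    try:
        result = run(argv, capture_output=True, text=True, check=False)
    except (OSError, subprocess.SubprocessError):
        # e.g. FileNotFoundError when uv is not on PATH
        log.warning("heal install raised; venv creation continues", exc_info=True)
        return False
    if result.returncode != 0:
        tail = (result.stderr or "")[-500:]
        log.warning("heal install failed rc=%s stderr=%s", result.returncode, tail)
        return False
    log.info("installed vendored heal library from %s", vendor_path)
    return True


def uv_install_cmd(venv_path: Path, package: Path) -> List[str]:
    # Hypothetical stand-in for pip_install_cmd (POSIX venv layout assumed).
    return ["uv", "pip", "install", "--python", str(venv_path / "bin" / "python"), str(package)]
```

Injecting `run` keeps the sketch testable without building a real venv, which is also how the five unit tests described below can exercise each branch.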
Why install only at create-time (not refresh): users who
explicitly remove heal from their venv via the package-management
UI shouldn't have it silently re-added on every backend restart.
The create-time-only contract makes the auto-install
user-overridable: install heal once, then either keep it or
uninstall it for good.
Verification
5 unit tests in `test_vendored_heal_auto_install.py`:
- `_vendored_heal_path()` resolves to an absolute path whose
layout matches the vendor tree (`name == "robotframework-
roboscopeheal"`, parent == `vendor`) AND the resolved
pyproject.toml actually exists on disk.
- missing-vendor-dir → no subprocess fired, WARNING logged.
- happy path → vendor path string appears in pip argv, INFO
log emitted.
- pip exit non-zero → WARNING with rc=1 logged, no raise.
- subprocess raises (FileNotFoundError("uv: not found")) →
WARNING with "raised" logged, no exception escapes.
Integration smoke (executed manually, not in CI to avoid the
~20s venv-build cost):
$ python -c "<inline smoke harness>"
venv: /var/folders/.../rs-heal-smoke-llzid0eg/.venv
venv create rc=0
rf install rc=0
import test: v 0.2.1 stderr=
End-to-end: fresh tempdir venv → robotframework installed → heal
auto-installed from vendor → `python -c "import RoboScopeHeal;
print(RoboScopeHeal.__version__)"` outputs "0.2.1" cleanly.
This closes perspective 2: anyone who downloads RoboScope
(source clone, offline ZIP, Docker) can use
`Library RoboScopeHeal` + `Heal *` keywords in their .robot
tests on day one, without PyPI access and without manually
fishing the vendored wheel out of the bundle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tection
Three intertwined fixes that came out of a heise.de Sourcepoint
banner debugging session — chasing why a recorded `Click iframe#x
>>> text="Zustimmen" >> nth=0` was producing wrong results after
swapping selectors in the FlowEditor's right detail panel.
1. Swap writes the composite, not the raw inner
`applySelectorSwap` previously copied only `candidate.value`
into `step.args[0]`. The SelectorPicker UI advertises
`effectiveSelectorForCandidate(cmd, c)` — iframe-chain prefix +
`renderSelector` inner + defensive `>> nth=0` — as "what gets
saved", but the swap path dropped everything except the raw
inner. Result: a swap on an iframe-recorded command silently
stripped the iframe wrap, the .robot fired a top-frame click,
and the user saw "the selector you picked from the menu isn't
what landed in the file".
Now `applySelectorSwap` (and the new sibling
`composeEffectiveSelector` used by Edit / Add) compose via
`effectiveSelectorForCandidate` — same function the picker
display uses. What-you-see is now what-gets-saved by
construction, not by parallel implementations that can drift.
2. isCustomSelectorValue: symmetric composite-match, not regex
strip
The "eigener Wert, nicht aus der Aufzeichnung" ("own value, not from the recording") badge AND the
`window.confirm` swap-overwrite dialog used a shape-specific
strip regex `^iframe\[[^\]]+\] >>>` to peel decorations off
`step.args[0]` before comparing against `candidate.value`.
That regex only matched ATTRIBUTE-CSS iframe candidates
(`iframe[src*="…"]`); the heise.de sidecar uses an id-based
candidate (`iframe#sp_message_iframe_1454968`) which slipped
through. Every legitimate swap fired the confirm prompt and
badged itself as custom.
Worse: after fixing the iframe regex to `lastIndexOf(' >>> ')`,
the SAME asymmetry hit `xpath=` / `text=` prefixes that
`renderSelector` adds on the write side. A swap to an xpath
candidate landed `iframe#x >>> xpath=//button[…]` in args[0]
but `candidate.value` is `//button[…]` (no `xpath=`) → strip
still didn't match → false-custom again.
The new approach is a 3-step hybrid:
1. Raw exact match against `c.value` (legacy / bare values).
2. STRICT composite match: `effectiveSelectorForCandidate(
cmd, c) === current` for every candidate. Symmetric with
the write path — anything any swap COULD have written is
recognised as non-custom by construction. Picks up the
`xpath=` / `text=` prefixes, the iframe-chain (any
strategy / nesting depth), and the defensive nth=0
uniformly because they ALL live behind that one function.
3. Loose fallback: strip `lastIndexOf(' >>> ')` + nth=N
from current, compare against `c.value` and
`renderSelector(c)`. Only catches legacy sidecars whose
`frame_chain` was lost but the .robot still carries an
iframe wrap — strict step 2 would miss those.
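The three steps can be sketched in Python. This is illustrative only: `is_custom_selector_value`, `toy_render` and `toy_compose` are hypothetical, the two toy helpers stand in for `renderSelector` and `effectiveSelectorForCandidate`, and stripping `nth=N` from the rendered candidate in step 3 is a choice made here, not something the commit states.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    strategy: str
    value: str


NTH_SUFFIX = re.compile(r"\s*>>\s*nth=\d+$")  # trailing ">> nth=N"
CHAIN_SEP = " >>> "


def is_custom_selector_value(current: str, candidates: Sequence[Candidate],
                             compose: Callable[[Candidate], str],
                             render: Callable[[Candidate], str]) -> bool:
    # 1. Raw exact match (legacy / bare values).
    if any(current == c.value for c in candidates):
        return False
    # 2. Strict composite match: anything a swap could have written is non-custom.
    if any(current == compose(c) for c in candidates):
        return False
    # 3. Loose fallback: drop everything up to the last " >>> " and any trailing nth=N.
    if CHAIN_SEP in current:
        current = current[current.rfind(CHAIN_SEP) + len(CHAIN_SEP):]
    inner = NTH_SUFFIX.sub("", current)
    return not any(inner in (c.value, NTH_SUFFIX.sub("", render(c))) for c in candidates)


def toy_render(c: Candidate) -> str:
    """Toy stand-in for renderSelector: strategy prefixes only."""
    if c.strategy == "xpath":
        return f"xpath={c.value}"
    if c.strategy == "text":
        return f'text="{c.value}"'
    return c.value


def toy_compose(c: Candidate) -> str:
    """Toy stand-in for effectiveSelectorForCandidate with a fixed one-rung chain."""
    return f"iframe#x{CHAIN_SEP}{toy_render(c)}"
```

A swap to the xpath candidate (`iframe#x >>> xpath=//button[1]`) is recognised in step 2, while a legacy line wrapped in a different iframe locator only matches in step 3.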
3. `effective_override` field — user-supplied verbatim emit form
The composite is auto-built from synthesised iframe candidates
+ risky-strategy defensive nth. On heise.de the
synthesised iframe rung was `iframe#sp_message_iframe_1454968`
(session-specific message_id) — replay-stable only if Sourcepoint
hands back the same id, which it doesn't always. The user had
no way to substitute a hand-tuned chain locator
(`iframe[src*="cmp.heise.de"]`, host-substring) without
ditching the structured candidate.
New field on `SelectorCandidate`: `effective_override: str | None`.
When set non-empty, every layer — the FlowEditor composer, the
Python emitter (web `_emit_command` + desktop
`_emit_desktop_command`), the SelectorPicker's display, the
custom-detection — short-circuits to the verbatim string.
Strategy + value stay tied to quality classification (the
coloured quality dot still reflects the locator's stability),
they're just decoupled from the emit form.
UX in the SelectorPicker's ✏ Edit form and the "+ Add custom"
row: a third "Effektiv" ("effective") input below strategy + value, prefilled
with the auto-composed form, live-synced with value/strategy
until the user types in it (then it decouples and becomes the
override). Orange-tinted border + "Override aktiv" ("override active") badge + ↺
reset button when the typed form differs from auto. On commit,
if `effective === auto` the override is CLEARED (back to
recompose); else stored verbatim.
Pydantic schema gets the new field with default None; legacy
JSON sidecars round-trip cleanly (test pinned). The four locale
files (en/de/fr/es) get six new i18n keys for the input label,
placeholder, tooltip, reset title, override-badge title and badge
text.
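The override contract (verbatim short-circuit on emit, cleared on commit when it equals the auto form) can be sketched as follows. A plain dataclass stands in for the Pydantic `SelectorCandidate`; `emit_selector`, `commit_effective` and the `auto_compose` callback are hypothetical names. Treating a blank typed value as "clear" follows the null / empty / whitespace fallbacks the tests mention.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    strategy: str
    value: str
    effective_override: Optional[str] = None  # absent (None) in legacy sidecars


def emit_selector(cand: Candidate, auto_compose: Callable[[Candidate], str]) -> str:
    """A non-blank override wins verbatim; otherwise use the auto-composed form."""
    if cand.effective_override and cand.effective_override.strip():
        return cand.effective_override
    return auto_compose(cand)


def commit_effective(cand: Candidate, typed: str,
                     auto_compose: Callable[[Candidate], str]) -> Candidate:
    """Store the typed form as an override unless it is blank or equals the auto form."""
    auto = auto_compose(replace(cand, effective_override=None))
    override = None if (not typed.strip() or typed == auto) else typed
    return replace(cand, effective_override=override)
```

Strategy and value are left untouched by `commit_effective`, so a quality classification based on them keeps working while the emit form is decoupled.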
Verification
Frontend: 70 / 70 unit tests green incl.
- 3 new effectiveSelector override tests (verbatim short-circuit,
null / empty / whitespace fallbacks)
- 1 new applySelectorSwap override test (composite via override)
- 2 new SelectorPicker override tests (Edit + Add emit verbatim
override on `effective` payload field)
- 3 new isCustomSelectorValue regressions (id-based iframe shape,
xpath-after-swap, multi-level chain)
- vue-tsc clean
Backend: 56 / 56 recording tests green incl.
- 4 new TestEffectiveOverride cases (skip wrap+nth, empty
fallback, JSON round-trip, legacy-without-field load)
- Desktop emitter mirrors the same override contract
Pinned by test_robot_emit.py + test_selector_schema.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…able
Two unrelated stats-view bugs caught in the same session:
1. Success-rate chart x-axis labels were spread across the
FULL container width via `justify-content: space-between`,
but the bars themselves stop at `.chart-bar { max-width: 20px }`
and don't fill the full width. So the bars sat left-stacked
with a chunk of whitespace on the right while the labels
ran 0 % → 100 % of the parent — first label aligned with bar 0
but the last label drifted right of the last bar by however
many pixels of whitespace the bars left unfilled.
Fix: render one x-axis slot per bar (same `display: flex; gap;
min-width: 4px; max-width: 20px` as `.chart-bar`), only
populate the slots whose bar index matches a chosen label
position. `chartXLabels` now carries `{ idx, text }` instead of
just `text`; `labelForBar(i)` looks up the text for slot `i`.
Each visible label sits directly under its bar by construction.
2. The Pass/Fail Trend table showed dates ascending (oldest at
the top) — opposite of every other "newest first" pattern in
the app and what users expect from an at-a-glance view.
Fix: `trendsDesc = computed(() => [...stats.trends].reverse())`
and v-for iterates that. Backend still returns ascending, which
keeps the SUCCESS-RATE chart's left-to-right chronological read
correct — only the table render flips.
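Both fixes can be sketched in Python. The one-slot-per-bar idea and the `{ idx, text }` label shape come from the commit; the evenly spaced choice of labelled indices, `max_labels`, and the Python names are assumptions made for illustration.

```python
from typing import Dict, List, Union

Label = Dict[str, Union[int, str]]


def chart_x_labels(dates: List[str], max_labels: int = 5) -> List[Label]:
    """Choose evenly spaced bar indices to label (hypothetical spacing rule)."""
    n = len(dates)
    if n == 0:
        return []
    max_labels = max(max_labels, 2)  # need at least first + last to compute a step
    if n <= max_labels:
        indices = list(range(n))
    else:
        step = (n - 1) / (max_labels - 1)
        indices = sorted({round(k * step) for k in range(max_labels)})
    return [{"idx": i, "text": dates[i]} for i in indices]


def label_for_bar(labels: List[Label], i: int) -> str:
    """Text for x-axis slot i, or "" when bar i is unlabelled."""
    for label in labels:
        if label["idx"] == i:
            return str(label["text"])
    return ""


def axis_slots(dates: List[str], max_labels: int = 5) -> List[str]:
    # One slot per bar, so every visible label sits directly under its bar.
    labels = chart_x_labels(dates, max_labels)
    return [label_for_bar(labels, i) for i in range(len(dates))]


def trends_desc(trends: list) -> list:
    # Newest-first copy for the table only; the ascending list still drives the chart.
    return list(reversed(trends))
```

Because the slots share the bars' sizing, any whitespace to the right of the last bar no longer shifts the labels.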
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…kage UI
Implements the "Vendor-default + PyPI-upgrade" distribution model
chosen for the heal library. Heal is now first-class in the
Package Management UI: a "ships with RoboScope" badge marks the
entry, a one-click install resolves to the bundled vendor copy
today, and an explicit version pin will hit PyPI (once published)
as the upgrade path past whatever version RoboScope shipped with.
Backend
- `POPULAR_RF_LIBRARIES` (router.py) gains a heal entry with a
new `shipped_with_roboscope: True` flag. The frontend's existing
popular-package loop picks it up automatically; the flag is the
only data attribute that distinguishes shipped libraries from
ordinary popular ones.
- `_SHIPPED_VENDOR_PACKAGES` registry in `tasks.py` — central map
from PyPI distribution name to vendor directory under
`backend/vendor/`. Currently has one entry
(`robotframework-roboscopeheal`). Adding a library to this
registry is all that's needed for the install-resolution logic
to redirect "no-version install" requests to the on-disk source.
- `_shipped_vendor_path(name)` — case-insensitive lookup helper.
Returns the absolute vendor path if the name is registered AND
the directory exists on disk, else None. The "exists" check
protects against a stripped-tree release accidentally trying
to `uv pip install /path/that/does/not/exist` — caller falls
back to PyPI cleanly with a single WARN log.
- `install_package` (tasks.py) consults the registry before
building the pip argv. The rule:
request shape                      resolves to
---------------------------------  ---------------------------------
install("heal", version=None)      vendor source path
install("heal", version="0.4")     "heal==0.4" → PyPI
install("other", *)                "other" / "other==X" → PyPI
Logs at INFO when the shipped-path resolution kicks in so the
reason for an unexpected source is traceable.
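The resolution rule in the table above can be sketched in Python. The names mirror the commit (`_SHIPPED_VENDOR_PACKAGES`, `_shipped_vendor_path`), but the bodies, the `vendor_root` parameter and the injectable `registry` argument are assumptions for illustration, not the code in `tasks.py`.

```python
import logging
from pathlib import Path
from typing import Dict, Optional

log = logging.getLogger("install_resolution_sketch")

# PyPI distribution name (lower-case) -> directory under backend/vendor/
SHIPPED_VENDOR_PACKAGES: Dict[str, str] = {
    "robotframework-roboscopeheal": "robotframework-roboscopeheal",
}


def shipped_vendor_path(name: str, vendor_root: Path,
                        registry: Dict[str, str] = SHIPPED_VENDOR_PACKAGES) -> Optional[Path]:
    """Absolute vendor path for a registered package whose directory exists, else None."""
    subdir = registry.get(name.lower())  # case-insensitive lookup
    if subdir is None:
        return None  # ordinary PyPI package: no redirect
    path = (vendor_root / subdir).resolve()
    if not path.is_dir():
        log.warning("%s is registered as shipped but %s is missing; using PyPI", name, path)
        return None
    return path


def install_spec(name: str, version: Optional[str], vendor_root: Path,
                 registry: Dict[str, str] = SHIPPED_VENDOR_PACKAGES) -> str:
    """Package spec handed to pip: vendor path, name==version, or bare name."""
    if version:
        return f"{name}=={version}"  # an explicit pin always goes to the index
    vendor = shipped_vendor_path(name, vendor_root, registry)
    if vendor is not None:
        log.info("resolving %s to shipped source at %s", name, vendor)
        return str(vendor)
    return name
```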
Frontend
- `EnvironmentsView.vue` template adds a `variant-badge shipped`
span next to the package name when `pkg.shipped_with_roboscope`
is truthy. Blue tint (`#dbeafe / #1e40af`) to distinguish from
the existing green "Recommended" and amber "Requires Node.js"
badges. The `popularPackages` ref type was extended with the
new optional field so vue-tsc accepts the prop.
- Hover tooltip on the badge explains the install behavior:
"clicking Install uses the bundled copy; pin a version to
fetch from PyPI". Translated EN/DE/FR/ES (two new keys each:
`environments.shippedWithRoboscope` + `shippedWithRoboscopeTitle`).
Tests
- `test_tasks.py::TestInstallPackage` — two new cases:
* `test_shipped_no_version_installs_from_vendor_path` — pins
that the pip argv carries the vendor path (containing
"vendor/robotframework-roboscopeheal") and NOT the bare
package name (which would trigger PyPI).
* `test_shipped_with_version_goes_to_pypi` — explicit version
bypasses the vendor; argv has "robotframework-roboscopeheal==0.4.0"
and zero vendor-path leakage.
- `test_vendored_heal_auto_install.py` — four new
`_shipped_vendor_path` cases: real package resolves, case-
insensitive lookup matches mixed/upper case, unknown packages
return None (no spurious vendor redirect for ordinary PyPI
installs), missing vendor dir → None + WARN.
- `test_tasks.py::TestCreateVenv::test_creates_venv_with_uv` —
updated the expected subprocess count 2 → 3 (heal auto-install
is the third call) and added an assertion that the third argv
carries the vendor path. This also fixes, in passing, a
pre-existing test failure left over from commit 0ee1e23.
29 / 29 backend tests in the touched modules pass.
When PyPI happens
The flip is minimal: drop the entry from
`_SHIPPED_VENDOR_PACKAGES` (heal stops being treated as
shipped), drop the heal seed from `create_venv` (project venvs
no longer get the auto-install), and remove `[tool.uv.sources]`
from `backend/pyproject.toml` (backend resolves heal from PyPI).
The "ships with RoboScope" badge disappears naturally because
the flag is gone from the popular-libraries response.
CLAUDE.md and the CHANGELOG document the model so the flip is a
single small PR rather than an archaeology exercise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ offer rfbrowser init
When a subprocess-runner test fails with the classic Browser-
library error "browserType.launch: Executable doesn't exist at
.../chromium-NNNN/...", surface a yellow banner at the top of
the ReportDetailView with a one-click "Run rfbrowser init"
button. The button POSTs to the existing
`/api/v1/environments/{env_id}/rfbrowser-init` endpoint and
flips to a "started" state on success. No more "open terminal,
activate venv, run rfbrowser init, deal with conflicts" workflow
for the most common Browser library install pitfall.
Story HEAL-DIAG-1.
Architecture
- `backend/src/execution/diagnostics.py` (new) — pure detection
layer. `detect_report_diagnostic(run, results)` walks the
run's error_message plus every test_result.error_message,
pattern-matches against a registry of detectors, and returns
the first match's payload or None. Today only one detector
is registered (`_detect_playwright_browser_missing`); adding
more is a 3-line change (function + entry in `_DETECTORS` +
locale section + banner-renderer hook).
Detection regex matches BOTH the literal `browserType.launch:
Executable doesn't exist` line AND the trailing "Looks like
Playwright was just installed" ASCII box — Browser library
versions emit one, the other, or both depending on which
Playwright minor version is bundled, and the detector
shouldn't be silently brittle to phrasing tweaks.
Gating: only fires for `runner_type == SUBPROCESS` AND
non-null `environment_id`. The docker runner has its own
browser provisioning baked into the image (rfbrowser init
on the host wouldn't help the container); a subprocess run
without an environment has no venv to init against.
- `ReportDetailResponse.diagnostic: dict | None` — backend
serialises the detector result alongside the existing report
+ test_results. Discriminated union by `code`; today's only
value is `playwright_browser_missing`.
- `reports.router.get_report_detail` calls the detector at
response time. No new DB columns, no schema migration —
diagnostic is derived from already-stored data.
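The detector registry and gating can be sketched as follows. `detect_report_diagnostic`, `_DETECTORS` and `_detect_playwright_browser_missing` are named in the commit; the `Run`/`ResultRow` shapes, the plain-string runner types, and the exact payload keys are assumptions for illustration. The regex carries both OR branches and is case-insensitive, as described.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

PLAYWRIGHT_BROWSER_MISSING = re.compile(
    r"browserType\.launch:\s*Executable doesn't exist|Looks like Playwright was just installed",
    re.IGNORECASE,
)


@dataclass
class Run:
    runner_type: str               # assumed values: "subprocess" or "docker"
    environment_id: Optional[int]
    error_message: Optional[str] = None


@dataclass
class ResultRow:
    error_message: Optional[str] = None


def _detect_playwright_browser_missing(run: Run, texts: List[str]) -> Optional[dict]:
    if not any(PLAYWRIGHT_BROWSER_MISSING.search(t) for t in texts):
        return None
    return {
        "code": "playwright_browser_missing",
        "action": {  # hypothetical payload layout
            "method": "POST",
            "endpoint": f"/api/v1/environments/{run.environment_id}/rfbrowser-init",
            "env_id": run.environment_id,
        },
    }


_DETECTORS = [_detect_playwright_browser_missing]


def detect_report_diagnostic(run: Optional[Run], results: List[ResultRow]) -> Optional[dict]:
    """First matching detector payload, or None; only subprocess runs with a venv qualify."""
    if run is None or run.runner_type != "subprocess" or run.environment_id is None:
        return None
    texts = [m for m in [run.error_message, *(r.error_message for r in results)] if m]
    for detector in _DETECTORS:
        payload = detector(run, texts)
        if payload is not None:
            return payload
    return None
```

A new detector in this sketch is one function plus one list entry; the locale section and banner wiring sit outside it.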
Frontend
- `RunDiagnosticBanner.vue` (new) — a small banner component
that takes a `RunDiagnostic` prop, renders the title /
description / action label from i18n keys keyed on the
diagnostic code (so a new code = locale section + done; no
component change required), and on click POSTs to the EXACT
endpoint the backend advertised. The frontend doesn't hard-
code `/environments/N/rfbrowser-init` — keeps the door open
for a future "out-of-disk-space" diagnostic that would POST
to a totally different endpoint.
Phase machine: idle → triggering → started OR failed. The
"started" state shows a ✓ badge and hides the button (no
auto-polling — the Environments view owns the install-
progress UI; the banner just kicks the job off). The
"failed" state surfaces the backend error detail so the user
can self-diagnose without devtools, and keeps the button
visible for a retry.
Re-entry guard: rapid double-clicks during the triggering
phase short-circuit to a single POST. Without it, two
overlapping init runs would fight over the same venv.
- `ReportDetailView.vue` — mounts the banner above the tabs
so it's visible regardless of which view (Summary / HTML
Report) the user is on.
- Type: `RunDiagnostic` in `domain.types.ts` mirrors the
backend payload. `ReportDetail.diagnostic?: RunDiagnostic | null`.
- i18n: 6 new keys per locale (EN/DE/FR/ES) under
`reports.diagnostic.*`: title + description + action label
for `playwright_browser_missing`, plus shared
startedBadge / startedMessage / failedMessage strings.
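The banner's phase machine and re-entry guard can be modelled with `asyncio`. This is a Python analogue of the Vue logic, not the component code: `DiagnosticAction`, the injected `post` coroutine and `demo_rapid_clicks` are hypothetical. Because the guard checks the phase before the first `await`, concurrent clicks that arrive while a POST is in flight return immediately.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple


class DiagnosticAction:
    """idle -> triggering -> started | failed; failed allows a retry."""

    def __init__(self, endpoint: str, post: Callable[[str], Awaitable[None]]):
        self.endpoint = endpoint  # taken from the backend payload, not hard-coded
        self.post = post
        self.phase = "idle"
        self.error: Optional[str] = None

    async def trigger(self) -> None:
        if self.phase in ("triggering", "started"):
            return  # re-entry guard: ignore clicks while a POST is in flight or done
        self.phase = "triggering"
        self.error = None
        try:
            await self.post(self.endpoint)
        except Exception as exc:  # surface the backend detail to the user
            self.phase = "failed"
            self.error = str(exc)
        else:
            self.phase = "started"


async def demo_rapid_clicks() -> Tuple[str, List[str]]:
    posted: List[str] = []

    async def fake_post(url: str) -> None:
        posted.append(url)
        await asyncio.sleep(0)  # yield so the other "clicks" run mid-request

    action = DiagnosticAction("/api/v1/environments/7/rfbrowser-init", fake_post)
    await asyncio.gather(action.trigger(), action.trigger(), action.trigger())
    return action.phase, posted
```

Three overlapping clicks produce exactly one POST and end in `started`, matching the frontend test's "3 rapid clicks → exactly 1 POST" expectation.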
Tests
- Backend: `tests/execution/test_diagnostics.py` — 11 cases
covering the real heise.de error blob (pinned verbatim from
a live run so Playwright wording drift trips CI), run-level
vs test-level error placement, action payload shape (env_id
+ endpoint + method match what the banner expects), gating
conditions (docker runner → None, missing env_id → None,
None run → None), and regex robustness (case-insensitive,
each OR branch matches alone).
- Frontend: `tests/components/RunDiagnosticBanner.spec.ts` —
6 cases covering i18n title/description rendering in EN +
DE, endpoint-from-payload POST contract, started-phase
badge swap, failed-phase error detail surface, re-entry
guard (3 rapid clicks → exactly 1 POST).
- E2E: `e2e/tests/run-diagnostic-banner.spec.ts` — 2 cases.
Both mock `GET /reports/{id}` (avoids a 30 s real subprocess
run + a few hundred MB Playwright download) and intercept
the action endpoint to assert the EXACT URL was posted.
Positive case verifies banner visibility, label
localisation, button click → started badge, action endpoint
hit. Negative case verifies the banner stays absent when
the report has no diagnostic on it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t tests
47 new tests covering HEAL-1 per-step toggle mode classification and keyword rewrite roundtrip, and HEAL-2 suite-level state machine plus enable/disable/library-import wiring. Story files updated to `review`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
26 new Playwright tests covering:
- recorder-lifecycle.spec.ts: SSE auth guards (401/403/404), phase
pill transitions (browser_starting → browser_ready → browser_crashed),
restart-browser click, extension-transport non-stuck edge case.
- heal-toggle.spec.ts: HEAL-VENDORED out-of-box test (fresh venv auto-
seeds robotframework-roboscopeheal without PyPI, importable at RF
runtime); HEAL-1 per-step checkbox (hidden on Log, visible on Click,
keyword rewrite); HEAL-2 suite toggle (enable/disable all, revert).
- debug-session.spec.ts: API guards (422/404), debug button visibility
by run status, 424 prereq dialog cancel path, install+retry path,
409 dedup, no-output.xml fallback path.
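The phase-pill transitions covered by `recorder-lifecycle.spec.ts` suggest a small transition table. The sketch below is a hypothetical model, not the recorder's implementation. The three phase names come from the notes. The allowed edges are assumptions, including the restart-browser click moving `browser_crashed` back to `browser_starting`. So is the `nextPhase` helper, which ignores out-of-order SSE events.

```typescript
type RecorderPhase = "browser_starting" | "browser_ready" | "browser_crashed";

// Assumed transition table; the real recorder may permit other edges.
const ALLOWED: Readonly<Record<RecorderPhase, readonly RecorderPhase[]>> = {
  browser_starting: ["browser_ready", "browser_crashed"],
  browser_ready: ["browser_crashed"],
  browser_crashed: ["browser_starting"], // triggered by the restart-browser click
};

// Apply an incoming phase event; stale or duplicate events leave the pill unchanged.
function nextPhase(current: RecorderPhase, incoming: RecorderPhase): RecorderPhase {
  return ALLOWED[current].indexOf(incoming) !== -1 ? incoming : current;
}
```

Rejecting unexpected edges instead of blindly applying every event is one way to keep the pill from regressing when SSE messages arrive late.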
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- Heal *variant; hidden when no Browser/RoboScopeHeal import is present; rewrites the step keyword through the normal form path (unsaved-changes badge fires, no runtime mutation).
- Vendors `robotframework-roboscopeheal` into `backend/vendor/` so `make dev` and offline release ZIPs work without a sibling repo or a PyPI publication. Offline build scripts updated; sync script added.
- `frontend/src/utils/healToggle.ts` (pure functions, fully tested): `HEAL_VARIANTS` map, `getHealVariant`, `getBaseKeyword`, classifiers, library-import add/remove, `applyHealToForm`.
Test plan
- `cd frontend && npx vitest run`: 717 tests, 0 failures
- `cd backend && .venv/bin/pytest tests/test_vendored_rfheal_present.py tests/environments/test_vendored_heal_auto_install.py -v`: 20 tests, 0 failures
- Open a `.robot` file with `Library Browser` in the Flow Editor → select a `Click` step → confirm the Self-Healing checkbox is visible and toggles the keyword name
- `cd backend && uv sync` on a fresh clone (no sibling rfheal repo) → `python -c "import RoboScopeHeal"` succeeds

🤖 Generated with Claude Code
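The summary describes the `healToggle.ts` helpers as pure functions. The following TypeScript sketch shows what such helpers could look like. The names `HEAL_VARIANTS`, `getHealVariant`, `getBaseKeyword` and `applyHealToForm` come from the summary. Their signatures, the map contents, the `<Keyword> Heal` naming of variants, the `StepForm` shape, and the `isHealAvailable` / `addLibraryImport` / `removeLibraryImport` helpers are all assumptions, not the file's real code.

```typescript
// Hypothetical map contents: base keyword -> self-healing variant.
const HEAL_VARIANTS: Readonly<Record<string, string>> = {
  Click: "Click Heal",
  "Fill Text": "Fill Text Heal",
};

// Hypothetical step form; the real form carries more fields.
interface StepForm {
  keyword: string;
  args: readonly string[];
}

function getHealVariant(keyword: string): string | null {
  // hasOwnProperty keeps prototype names such as "toString" from matching.
  return Object.prototype.hasOwnProperty.call(HEAL_VARIANTS, keyword) ? HEAL_VARIANTS[keyword] : null;
}

function getBaseKeyword(keyword: string): string {
  for (const base of Object.keys(HEAL_VARIANTS)) {
    if (HEAL_VARIANTS[base] === keyword) return base;
  }
  return keyword; // already a base keyword, or unknown
}

// Assumption: the checkbox is offered only when the file imports Browser or RoboScopeHeal.
function isHealAvailable(imports: readonly string[]): boolean {
  return imports.indexOf("Browser") !== -1 || imports.indexOf("RoboScopeHeal") !== -1;
}

// Pure rewrite: returns a new form object so the regular form path (and its
// unsaved-changes tracking) sees the change; the input form is never mutated.
function applyHealToForm(form: StepForm, enabled: boolean): StepForm {
  const base = getBaseKeyword(form.keyword);
  const variant = getHealVariant(base);
  if (variant === null) return form; // e.g. Log: no variant, nothing to toggle
  return { ...form, keyword: enabled ? variant : base };
}

function addLibraryImport(imports: readonly string[], lib: string): string[] {
  return imports.indexOf(lib) !== -1 ? [...imports] : [...imports, lib];
}

function removeLibraryImport(imports: readonly string[], lib: string): string[] {
  return imports.filter((name) => name !== lib);
}
```

Resolving the base keyword first makes `applyHealToForm` idempotent: enabling an already-healed step, or disabling a plain one, leaves the keyword unchanged.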