Gateway compat CI (fork validation)#3

Open: ranjeshj wants to merge 380 commits into master from user/ranjeshj/testing
Conversation

@ranjeshj
Owner

Validating the W1-W5 gateway compatibility CI changes within the fork before any upstream PR. Do not merge - this PR exists to drive the workflows on real CI.

Branch contains 12 commits covering:

  • LKG version pinning (gateway-lkg.json + GatewayLkg.cs + drift-detection test)
  • Compile-time-gated tray.testhook.* MCP tool surface with Release-build safety net
  • Spike workflow (validated on real CI, ~2m12s)
  • Fake LLM server inside WSL
  • Full GatewayCompatFixture harness (smoke + gateway tier)
  • 7 real gateway-tier scenarios (operator pair, node pair, tool events, chat round-trip, node.invoke, reconnect, config patch)
  • gateway-compat.yml workflow (PR-gating Smoke + Gateway LKG cell; nightly latest matrix)
  • gateway-lkg-bump.yml scheduled poll + auto-PR

Expected: ci.yml passes; gateway-compat Smoke passes (~3min); gateway-compat Gateway tier vs LKG attempts a full WSL+openclaw run (~10-15 min). First real run may need timing tweaks.

shanselman and others added 30 commits May 13, 2026 11:37
Prevent the Ready page Finish action from marking onboarding complete when setup is still required. Keep the window open and show a localized recovery dialog instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… sidebar/voice width fixes

- Skills page: redesigned with Expander-based collapsible groups (Enabled/Disabled)
- Skills page: enable/disable toggle via gateway skills.update API
- Skills page: code-behind card building (no ListView flash)
- Skills page: badge styling matches cron page (colors, padding, centering)
- Workspace page: cached file list to avoid re-fetch on navigation
- Workspace page: improved loading indicator
- Sidebar: reduced max pane width 320 -> 260
- Voice settings: removed MaxWidth constraint
- Cron page: fixed result badge text vertical centering
- Gateway client: added SetSkillEnabledAsync with correct payload shape
- Gateway client: auto-refresh skills after update/install responses

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ncomplete-setup

Fix incomplete onboarding Finish recovery
Nine structurally identical Copy* methods (CopySupportContext,
CopyDebugBundle, CopyBrowserSetupGuidance, CopyPortDiagnostics,
CopyCapabilityDiagnostics, CopyNodeInventory, CopyChannelSummary,
CopyActivitySummary, CopyExtensibilitySummary) each repeated the same
DataPackage + Clipboard.SetContent boilerplate alongside identical
try/catch logging. This replaces all nine bodies with a single private
CopyDiagnostic(string label, Func<GatewayCommandCenterState, string>)
helper and reduces each method to an expression-bodied one-liner.

No observable behavior change: log messages, clipboard content, error
handling, and method signatures (exposed as Action delegates in
DeepLinkActions) are all preserved exactly.
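
A language-neutral sketch of the refactor described above, written in Python rather than the project's C#. The clipboard class, state shape, and method names are hypothetical stand-ins; only the pattern (one shared helper taking a label and a formatter, with each public method reduced to a one-liner) comes from the commit message.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("diagnostics")


class FakeClipboard:
    """Hypothetical stand-in for the system clipboard."""

    def __init__(self) -> None:
        self.text: Optional[str] = None

    def set_text(self, text: str) -> None:
        self.text = text


class CommandCenter:
    def __init__(self, state: dict, clipboard: FakeClipboard) -> None:
        self._state = state
        self._clipboard = clipboard

    def _copy_diagnostic(self, label: str, build: Callable[[dict], str]) -> None:
        # Shared body: build the text, copy it, log success or failure.
        try:
            self._clipboard.set_text(build(self._state))
            log.info("Copied %s", label)
        except Exception:
            log.exception("Failed to copy %s", label)

    # Public methods keep their zero-argument signatures and delegate.
    def copy_port_diagnostics(self) -> None:
        self._copy_diagnostic("port diagnostics", lambda s: f"ports: {s['ports']}")

    def copy_node_inventory(self) -> None:
        self._copy_diagnostic("node inventory", lambda s: f"nodes: {s['nodes']}")
```

Because the public methods still take no arguments, they can continue to be handed out as plain callbacks.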

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep skills refreshes scoped to the active agent filter and key workspace file-list cache replay by agent id so agent-specific pages do not show stale data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement SetSkillEnabledAsync on the onboarding test fake after merging current master so PR validation covers the new gateway client interface.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UI fixes: skills redesign, workspace caching, sidebar/voice width
…y-diagnostic-helper

refactor: extract CopyDiagnostic helper for diagnostic copy methods
Add a shared ClipboardHelper for text copy operations and route existing WinUI clipboard writes through it while preserving the chat timeline flush behavior and App.CopyTextToClipboard API.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…etup" dismiss

Two bugs reported by Scott Hanselman against master:

1. Tray app launched the onboarding wizard on every start even when the
   user already had a working remote-gateway operator configuration.
   StartupSetupState.RequiresSetup only short-circuited for node mode
   (EnableNodeMode + node device token) or MCP-only mode, so an operator
   with a non-default gateway URL + stored device token still got the
   wizard popped at OnLaunched.

   Fix: add an operator-mode short-circuit that requires BOTH a stored
   operator device token AND a non-default GatewayUrl (guards against
   orphan tokens after uninstall and against half-finished setups that
   never picked a gateway target).

2. On the SetupWarning page warn-and-confirm UI, clicking "Keep my setup"
   only toggled in-page state. Because OnboardingWindow defaulted
   SetupPath = Advanced when existing config was detected, the global
   nav-bar Next button stayed enabled, so the user was one click from
   advancing into ConnectionPage anyway.

   Fix: add OnboardingState.Dismiss() that raises a new Dismissed event;
   OnboardingWindow handles it by setting a _dismissedWithoutCompletion
   guard, then Close()ing the window. OnClosed now skips
   TryCompleteOnboarding when that guard is set so OnboardingCompleted
   is NOT fired and existing settings / gateway connection are preserved.
   SetupWarningPage.CancelReplace calls Props.Dismiss().

   Belt-and-suspenders: drop the auto-default of SetupPath = Advanced for
   existing-config users in OnboardingWindow. With SetupPath left null,
   the nav-bar Next button is disabled on SetupWarning so the user MUST
   pick "Replace my setup", "Keep my setup", or "Advanced setup"
   explicitly — no accidental Next-into-setup path remains.

Tests:
- StartupSetupStateTests: operator paired with remote gateway returns false;
  operator token + default URL still returns true (stale-token guard);
  non-default URL alone (no token) still returns true.
- OnboardingStateTests: Dismiss fires Dismissed but NOT Finished; safe
  without subscribers.

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1175 passed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rd-helper

Refactor WinUI clipboard text copies
Fixes from a Hanselman adversarial code review (Opus + Codex parallel):

1. Per-gateway tokens (Codex HIGH) — RequiresSetup only scanned the legacy
   root identity (device-key-ed25519.json at the dataPath root). Modern
   pairings via DeviceIdentityStore write tokens at
   <dataPath>/gateways/<gatewayId>/device-key-ed25519.json (see
   GatewayConnectionManager._activeIdentityPath = perGatewayIdentityDir).
   Operators paired post-GatewayRegistry would still see the wizard pop on
   every launch. Fix: HasAnyOperatorDeviceToken now scans the legacy root
   AND every gateways/* subdir.

2. SSH-tunnel false positive (Codex HIGH) — SSH topology routes via
   ws://127.0.0.1:LocalPort and the user typically leaves GatewayUrl at
   default. HasNonDefaultGatewayUrl alone returned false. Fix:
   HasAnyConfiguredGatewayTarget treats (UseSshTunnel + non-empty
   SshTunnelHost) as a configured target.

3. NodeMode + MCP precedence regression (Codex MEDIUM) — original code
   was 'if (NodeMode && nodeToken) false; return !MCP;' which let
   MCP-only mode bypass setup even when NodeMode was accidentally true
   without a node token. The first patch made NodeMode short-circuit
   first, breaking that precedence. Fix: check EnableMcpServer BEFORE
   EnableNodeMode so MCP wins, matching original semantics.

4. _dismissedWithoutCompletion stuck on Close exception (Opus MEDIUM) —
   the flag was set BEFORE Close(); if Close() threw, the flag stayed
   true and TryCompleteOnboarding was permanently suppressed for the
   window's lifetime, wedging the user. Fix: reset the flag in the
   catch block so the X-button / Finish path still works.

5. DefaultGatewayUrl duplication (Opus HIGH) — the constant existed in
   both StartupSetupState and OnboardingExistingConfigGuard with only a
   comment promising sync. Fix: promote
   OnboardingExistingConfigGuard.DefaultGatewayUrl to public const
   (single source of truth) and reference it from StartupSetupState.
   Added DefaultGatewayUrl_MatchesGuardConstant invariant test.

6. CancelReplace UI flash (Opus MEDIUM) — setConfirmingReplace(false)
   was called immediately before Props.Dismiss(), causing a brief
   re-render of the 'Set up locally' button before the window closed.
   Fix: drop the dead state change.
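
The decision order from fixes 1-3 can be sketched roughly as follows. This is a Python illustration, not the C# implementation: the `Settings` fields mirror the names in the message, the default URL value is a hypothetical placeholder, and the exact ordering of the operator check relative to the others is an assumption.

```python
from dataclasses import dataclass

DEFAULT_GATEWAY_URL = "ws://localhost:18789"  # hypothetical placeholder value


@dataclass
class Settings:
    gateway_url: str = DEFAULT_GATEWAY_URL
    enable_node_mode: bool = False
    enable_mcp_server: bool = False
    use_ssh_tunnel: bool = False
    ssh_tunnel_host: str = ""


def has_configured_gateway_target(s: Settings) -> bool:
    # An SSH tunnel with a host counts even when the URL is left at default.
    if s.use_ssh_tunnel and s.ssh_tunnel_host.strip():
        return True
    return s.gateway_url != DEFAULT_GATEWAY_URL


def requires_setup(s: Settings, has_node_token: bool, has_operator_token: bool) -> bool:
    if s.enable_mcp_server:  # MCP is checked first so it wins over NodeMode
        return False
    if s.enable_node_mode and has_node_token:
        return False
    # Operator mode needs BOTH a stored token and a configured target.
    if has_operator_token and has_configured_gateway_target(s):
        return False
    return True
```

Requiring both conditions in the operator branch is what keeps an orphan token, or a URL with no pairing behind it, from suppressing the wizard.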

Tests added (5):
- RequiresSetup_ReturnsFalse_WhenSshTunnelConfiguredWithStoredToken
- RequiresSetup_ReturnsTrue_WhenSshTunnelEnabledButNoHostConfigured
- RequiresSetup_ReturnsFalse_WhenOperatorTokenStoredOnlyInPerGatewayDir
- RequiresSetup_ReturnsFalse_WhenMcpEnabledEvenWithNodeModeAndNoNodeToken
- DefaultGatewayUrl_MatchesGuardConstant

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1180 passed (5 new); all 16 onboarding-fix tests green

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the duplicate Conversations page with an enhanced Sessions page:

- Remove 'Conversations' nav item (was showing identical data to Sessions)
- Add SelectorBar with channel filter tabs (All + auto-populated per-channel)
- Show per-session context usage as a progress bar (TotalTokens/ContextTokens)
- Display input/output token counts per session (↓in / ↑out)
- 3-row card layout: name+status, provider·model·channel, progress+tokens
- Keep Reset/Compact/Delete action buttons from original SessionsPage
- Redirect legacy 'conversations' nav tag to SessionsPage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…with Fluent rows

UX overhaul of the OpenClaw Tray hub. Capabilities is folded into Permissions
so device-level capability picks and exec-policy/allowlist controls live in
one place. Settings gets a consistent Fluent row-card pattern with auto-save.
Both pages localize ~40 newly-introduced strings.

## Pages
- **PermissionsPage** absorbs the former Capabilities page:
  - Node Mode master toggle + live Node Status card on top
  - Per-capability rows (Browser, Camera, Canvas, Screen, Location, TTS, STT),
    disabled and dimmed when Node Mode is off
  - STT row description notes the Whisper model download trigger
  - STT/TTS engine details render as subtle attached continuation panels
    (no duplicate banner; provider combo + ElevenLabs config for TTS;
    download status + retry hint for STT)
  - Local MCP Server integration card
  - Exec policy: default-action row + rules card with auto-save, count badge,
    Fluent semantic action pills, trash-icon row actions, empty state
  - Node allowlist (gateway-side, read-only)
  - Windows-level privacy launcher row
  - Whisper model auto-download when STT is toggled on, with failure surface
- **SettingsPage** rewrites the old expander layout into row cards:
  General · Notifications · Privacy · Local Gateway (conditional). Auto-save
  with a transient "Saved" toast bottom-right. No Save/Cancel buttons.
- **HubWindow** drops the standalone Capabilities nav item; `"capabilities"`
  tag routes to PermissionsPage for back-compat. Permissions sidebar icon
  switched from key to shield (Glyph EA18). Settings sidebar keeps its gear.
- Home and About/Info pages are untouched and identical to master.

## Localization
- 13 `CapabilitiesPage_*` x:Uid keys renamed to `PermissionsPage_*` (XAML +
  5 locale resw + coverage tests + invariant list)
- 41 new `PermissionsPage_*` resw keys for code-built strings: capability
  labels/descriptions, node status text, STT engine hints, MCP statuses,
  rule-count formatters, allowlist messages, TTS provider status, MCP
  token-read failure format
- Pinned in `LocalizationValidationTests.InvariantOrDeferredResourceKeys`
- New `LocalizationHelper.Format(key, args)` helper catches `FormatException`
  from malformed translations so a translator placeholder typo can't crash
  the UI thread
- New `NoLocale_HasEmptyOrWhitespaceValues` test prevents an empty resw value
  from leaking the raw resource-key into UI via the GetString fallback
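
As a rough illustration of the `LocalizationHelper.Format` idea (Python, not the project's C#): formatting failures caused by a malformed translated template are caught and a fallback is returned instead of raising. Returning the unformatted template is an assumption; the message does not say what the real helper falls back to.

```python
def safe_format(template: str, *args: object) -> str:
    """Format a translated template without letting a bad placeholder raise."""
    try:
        return template.format(*args)
    except (IndexError, KeyError, ValueError):
        # Missing argument, unknown named field, or unbalanced braces.
        return template
```

For example, `safe_format("{0} of {1}", 3)` returns the template unchanged rather than raising `IndexError`.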

## Lifecycle + threading correctness
- `SettingsManager.Saved` subscribe/unsubscribe moved to page `Loaded` /
  `Unloaded` on both pages; the per-navigation handler leak (and the latent
  N² stale-page UI work it caused) is gone
- `EnsureWhisperModelDownloadedAsync` is `async void` with a try/catch
  wrapping the entire body so no path can escape to
  `SynchronizationContext.UnhandledException`; page-local
  `_isDownloadingWhisperModel` + `_whisperDownloadError` give accurate hint
  copy independent of `VoiceService` state
- Whisper-download early-return also defers to
  `VoiceService.IsWhisperDownloadingModel` to avoid concurrent writes to the
  model file
- `OnSettingsSaved` refreshes MCP/STT/TTS cards too, gated by `IsLoaded`;
  `UpdateTtsCard` skips writes to TTS textboxes when `FocusState !=
  Unfocused` so cross-surface saves can't clobber in-progress input
- `UpdateTtsCard` no longer unconditionally clears `TtsStatusText`, so the
  auto-save toast ("Default provider: x", "ElevenLabs settings saved.") is
  no longer wiped one frame later by the dispatched refresh
- `_execSavedHintTimer` / `_savedIndicatorTimer` reused per page instead of
  allocated on every save
- `_execPolicyLoaded` one-shot latch replaced with scoped
  `_loadingExecPolicy` try/finally flag — safe for future reload paths

## Exec policy
- Case-insensitive JSON read (accepts both `pattern` and `Pattern`) to
  recover policy files written by the pre-fix anonymous-type leak; writes
  always use lowercase going forward
- Auto-saves on every mutation (add rule, remove rule, default action
  change). Inline "Saved" pill in the rules-card header, 1.5s
- `NewRuleAction` ComboBox now uses `Tag="allow"/"deny"` rather than reading
  the localizable `Content`, so future translations can't break the
  JSON-on-disk contract
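
A minimal sketch of the tolerant-read, strict-write behavior described for the exec policy file. The file shape (`rules` array with `pattern` and `action` keys) is assumed for illustration; only the `pattern`/`Pattern` tolerance and the lowercase-on-write rule come from the message.

```python
import json


def read_rules(text: str) -> list:
    """Accept rule keys in any casing (e.g. 'pattern' or 'Pattern')."""
    doc = {k.lower(): v for k, v in json.loads(text).items()}
    rules = []
    for raw in doc.get("rules", []):
        lowered = {k.lower(): v for k, v in raw.items()}
        rules.append({"pattern": lowered["pattern"], "action": lowered["action"]})
    return rules


def write_rules(rules: list) -> str:
    """Always emit lowercase keys, whatever casing was read."""
    body = [{"pattern": r["pattern"], "action": r["action"]} for r in rules]
    return json.dumps({"rules": body})
```

Reading a legacy file and writing it back normalizes it, so the tolerant path is only exercised until the first save.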

## Tests / validation
- 1161 / 1161 tray tests pass (added `NoLocale_HasEmptyOrWhitespaceValues`)
- All locales preserve format-placeholder parity (existing test)
- Build clean on net10.0-windows10.0.22621.0 / win-arm64
- Two Hanselman-style dual-model adversarial reviews
  (Claude Opus 4.7 + GPT-5.3-Codex) ran across the diff; all HIGH-consensus
  and LOW-consensus-real findings have fixes in this commit

## Master-merge work
- Carried over master's clipboard refactor: `ClipboardHelper.CopyText`
  replaces the `DataPackage` + `Clipboard.SetContent` pair in the MCP
  token/URL copy methods on PermissionsPage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the orphaned Conversations page files after routing conversations into Sessions, and update the chat root comment to point at SessionsPage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gs-info-merge

Merge Capabilities into Permissions; redesign Settings & Permissions with Fluent rows
…ons-page

feat: unify Sessions and Conversations into single Sessions page
Assert sanitized jsonlPath error responses now that internal exception details stay local to logs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Assert the battery failure payload keeps internal exception details out of the response.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…smiss

Addresses Scott Hanselman's review on PR openclaw#340:

Blocking fix:
- OnboardingExistingConfigGuard.GetSummary().HasOperatorDeviceToken only
  checked DeviceIdentity.HasStoredDeviceToken on the legacy root path.
  Modern pairings store the operator token at
  <dataPath>/gateways/<id>/device-key-ed25519.json via DeviceIdentityStore,
  so a fresh-paired user opening Setup/Reconfigure could overwrite a
  working gateway without seeing the "Replace my setup / Keep my setup"
  warning.
- Extracted the per-gateway scan (previously private to StartupSetupState)
  to OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken as the single
  source of truth. StartupSetupState.HasUsableOperatorConfiguration and
  GetSummary() both call it now, so the startup auto-launch decision and
  the in-wizard guard always agree on what counts as paired.
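
The shared scan could look roughly like this in Python. The file name and directory layout are taken from the message; treating the mere presence of the identity file as "has a stored token" is a simplifying assumption, since the real check may inspect the file contents.

```python
from pathlib import Path

IDENTITY_FILE = "device-key-ed25519.json"


def has_any_operator_device_token(data_path: Path) -> bool:
    # Legacy location: identity file directly under the data path.
    if (data_path / IDENTITY_FILE).is_file():
        return True
    # Modern location: <dataPath>/gateways/<gatewayId>/<identity file>.
    gateways = data_path / "gateways"
    if not gateways.is_dir():
        return False
    return any(
        (entry / IDENTITY_FILE).is_file()
        for entry in gateways.iterdir()
        if entry.is_dir()
    )
```

Having both the startup check and the in-wizard guard call one function like this is what keeps the two decisions from diverging.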

Hardening (Scott's lower-confidence suggestion):
- OnboardingState.Dismiss() is now idempotent. A double-click or repeated
  handler invocation no longer fires the lifecycle signal twice.
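
An idempotent dismiss can be sketched with a simple latch. This is a Python illustration with hypothetical names, not the C# event code.

```python
from typing import Callable, List


class OnboardingState:
    def __init__(self) -> None:
        self._dismissed = False
        self._handlers: List[Callable[[], None]] = []

    def on_dismissed(self, handler: Callable[[], None]) -> None:
        self._handlers.append(handler)

    def dismiss(self) -> None:
        # Latch first so a re-entrant or repeated call is a no-op.
        if self._dismissed:
            return
        self._dismissed = True
        for handler in list(self._handlers):
            handler()
```

Setting the latch before invoking handlers also covers the case where a handler itself calls `dismiss()` again.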

Tests added:
- OnboardingExistingConfigGuardTests.HasExistingConfiguration_ReturnsTrue_
  WhenOperatorTokenStoredOnlyInPerGatewayDir — Scott's exact test shape.
- OnboardingStateTests.Dismiss_IsIdempotent_FiresDismissedAtMostOnce.

Follow-up tracked separately (per Scott's note):
- Make the startup token scan registry-aware (prefer the active
  GatewayRegistry record's identity dir over arbitrary gateways/* dirs)
  to avoid orphan dirs from suppressing onboarding for a different
  active gateway.

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1182 passed (+2 new)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…f-section-probe-and-missing-settings-2026-05-09-ae8d66c4b9104b7f

[Repo Assist] fix(wsl): add mountFsTab=false + [time] section to wsl.conf; make IsAlreadyConfigured probe section-aware
…age-leaks-remaining-2026-05-11-83f5733e4978f96a

[Repo Assist] fix(security): stop leaking ex.Message in node client, device capability, and approval prompts
…jsonlpath-exmessage-leak-2026-05-13-78f4414fcfd54f2f

[Repo Assist] fix(security): remove residual ex.Message leak in canvas jsonlPath error path
indierawk2k2 and others added 30 commits May 18, 2026 20:39
Restore a static Configuring Gateway heading while keeping the active step title inside the gateway wizard card, with wrapping safeguards for localized headings.
Ensure node capability registration has a NodeService available before node connect, surface binding failures in diagnostics, and cover the diagnostic failure path with a regression test.
…tate (openclaw#466)

* fix: ID-based parallel tool call tracking with truthful Interrupted state

Tool calls in the chat window could get stuck in 'running' state because:
- Single ActiveToolCallId slot couldn't track parallel tools
- Turn end/error events didn't finalize in-progress tools
- Legacy fallback could misroute outputs to wrong tools

Changes:
- Add ChatToolCallStatus.Interrupted for tools that never completed
- Add ActiveToolCalls (ImmutableDictionary) for ID-based parallel tracking
- Extract itemId from gateway events for correlation
- ResolveToolEntry: strict ID lookup (no misrouting), legacy fallback only when no ID
- ApplyTurnEnd marks remaining in-progress tools as Interrupted
- Keep output mapping until turn end (handles command_output + item end ordering)
- UI renders Interrupted as grey dash glyph (truthful, not fake Success)
- 9 new regression tests for parallel tools, ID correlation, interrupted state
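
The tracking model described in this list can be illustrated with a small Python sketch. Class and method names are hypothetical; the behavior shown (ID-keyed tracking, no misrouting on unknown IDs, leftovers marked Interrupted at turn end) follows the bullet points above.

```python
from enum import Enum


class ToolStatus(Enum):
    RUNNING = "running"
    SUCCESS = "success"
    INTERRUPTED = "interrupted"


class ToolCallTracker:
    def __init__(self) -> None:
        self.status = {}     # item_id -> ToolStatus
        self._active = set()  # ids still in progress

    def tool_start(self, item_id: str) -> None:
        self.status[item_id] = ToolStatus.RUNNING
        self._active.add(item_id)

    def tool_end(self, item_id: str) -> bool:
        # Strict ID lookup: an unknown id is ignored, never routed elsewhere.
        if item_id not in self._active:
            return False
        self._active.discard(item_id)
        self.status[item_id] = ToolStatus.SUCCESS
        return True

    def turn_end(self) -> None:
        # Anything still running never completed; say so truthfully.
        for item_id in self._active:
            self.status[item_id] = ToolStatus.INTERRUPTED
        self._active.clear()
```

With a single "active tool" slot, the second of two parallel starts would overwrite the first; keying by ID avoids that.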

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix tool output preservation

Preserve command output when an empty item-end event follows the real output for the same tool call. Also serialize tray tests that mutate OPENCLAW_TRAY_DATA_DIR so local validation is deterministic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Scott Hanselman <scott@hanselman.com>
* feat: add exec approval V2 prompt adapter interface

Defines the prompt adapter contract needed before the coordinator (PR7)
can be wired up. The interface decouples the coordinator from any UI
implementation; the null stub lets PR7 compile and be tested without a
WinUI dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address V2 prompt adapter review feedback

- Add CancellationToken to PromptAsync contract
- Introduce ExecApprovalPromptOutcome (Deny=0) to eliminate fail-open
  default from ExecApprovalDecision on the prompt-facing interface
- Add required CorrelationId to ExecApprovalV2PromptRequest for audit/telemetry
- Document SessionKey semantics: origin, scope, null meaning, display safety
- Tighten DisplayCommand doc to call out control chars and BiDi overrides
- Harden ProductionWiring test: skip bin/obj, add deletion comment
- Add tests: cancelled token, fail-closed default, CorrelationId storage,
  full Allow/AllowOnce/AllowAlways/Deny outcome coverage
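
The fail-closed idea behind `Deny=0` can be sketched as below. Only the `Deny = 0` assignment comes from the message; the other numeric values and the `outcome_from_wire` helper are hypothetical.

```python
from enum import IntEnum


class PromptOutcome(IntEnum):
    DENY = 0          # zero value, so an uninitialized outcome denies
    ALLOW = 1         # remaining values are illustrative only
    ALLOW_ONCE = 2
    ALLOW_ALWAYS = 3


def outcome_from_wire(value: object) -> PromptOutcome:
    """Map an untrusted value to an outcome, denying anything unrecognized."""
    try:
        return PromptOutcome(value)
    except ValueError:
        return PromptOutcome.DENY
```

Putting the denying member at zero means a default-constructed or zeroed value can never be read as approval.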

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: AlexAlves87 <alexalves87@github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Avoid reserializing parsed gateway event JSON just to compute debug-log payload lengths. Instead, pass the original raw message length through the event dispatch path so chat and agent event logs keep the same privacy-preserving shape/length signal without extra serialization work.
Dispatch node capability invokes off the WebSocket receive loop with bounded concurrency so slow commands do not block health, pairing, ping, or subsequent invoke traffic. Serialize concurrent WebSocket sends and add regression coverage for slow invoke unblocking, busy rejection, and async JSON args lifetime.
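
A minimal asyncio sketch of the dispatch pattern, assuming a fixed concurrency limit and immediate "busy" rejection when the limit is reached. The class name and limit are hypothetical, and send serialization is left out.

```python
import asyncio
from typing import Awaitable, Callable


class InvokeDispatcher:
    def __init__(self, max_concurrent: int) -> None:
        self._max = max_concurrent
        self._running = 0
        self._tasks = set()

    def try_dispatch(self, handler: Callable[[], Awaitable[None]]) -> bool:
        """Called from the receive loop; never awaits the handler itself."""
        if self._running >= self._max:
            return False  # caller replies "busy" instead of blocking
        self._running += 1
        task = asyncio.create_task(self._run(handler))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return True

    async def _run(self, handler: Callable[[], Awaitable[None]]) -> None:
        try:
            await handler()
        finally:
            self._running -= 1

    async def drain(self) -> None:
        if self._tasks:
            await asyncio.gather(*self._tasks)


async def demo() -> tuple:
    dispatcher = InvokeDispatcher(max_concurrent=1)
    gate = asyncio.Event()

    async def slow() -> None:
        await gate.wait()

    first = dispatcher.try_dispatch(slow)
    second = dispatcher.try_dispatch(slow)  # rejected while the first is running
    gate.set()
    await dispatcher.drain()
    third = dispatcher.try_dispatch(slow)   # slot is free again
    await dispatcher.drain()
    return first, second, third
```

Because `try_dispatch` returns immediately, the receive loop stays free to process pings and health messages while a slow command runs.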
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces hard-coded 'latest' in LocalGatewaySetupOptions with a
compile-time constant from new gateway-lkg.json (initial pin: 2026.5.17).
Adds OPENCLAW_GATEWAY_VERSION env var override for CI matrices and
hands-on validation. GatewayLkgTests enforces JSON/constant sync so
drift fails the build.
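
The pin-plus-override logic can be sketched as follows (Python illustration). The env var name and the initial pin come from the message; the `version` key inside gateway-lkg.json is an assumed shape.

```python
import json
import os
from typing import Mapping, Optional

GATEWAY_LKG_VERSION = "2026.5.17"  # compile-time constant in the real code


def resolve_gateway_version(env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    override = env.get("OPENCLAW_GATEWAY_VERSION", "").strip()
    return override or GATEWAY_LKG_VERSION


def check_no_drift(lkg_json_text: str) -> None:
    """Fail when the JSON pin and the constant disagree."""
    pinned = json.loads(lkg_json_text)["version"]  # assumed key name
    if pinned != GATEWAY_LKG_VERSION:
        raise AssertionError(
            f"gateway-lkg.json pins {pinned}, constant is {GATEWAY_LKG_VERSION}"
        )
```

Treating a blank override as "not set" avoids installing an empty version string when a CI matrix cell leaves the variable empty.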

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
W0 — .github/workflows/gateway-compat-spike.yml (manual dispatch only):
  proves WSL + Ubuntu-24.04 + openclaw install + provider config
  validation on a windows-2025 runner before we build the real harness.
  Records cold-start timings and the authoritative provider config shape.

W2 — tools/fake-llm-server/: minimal OpenAI-compatible HTTP mock used by
  the gateway-compat tests to avoid burning real provider credit. Scope is
  intentionally tiny (one non-streaming endpoint + assertion endpoints);
  expand as scenarios demand.

W3.1 — Compile-time gating for the future tray.testhook.* MCP tool
  surface. New MSBuild property OpenClawEnableTestHooks=true defines the
  OPENCLAW_E2E_HOOKS constant; the placeholder TestHookCapability.cs is
  wrapped in #if OPENCLAW_E2E_HOOKS. Rubber-duck critique flagged that
  env-var gating in a shipped binary is unsafe (loopback MCP token +
  destructive hooks like pairing.reset); compile-time gating + a
  Release-build smoke test (ReleaseBuildExcludesTestHooksTests, verified
  to fail loudly when the hooks are accidentally shipped) keep the
  dangerous surface out of production tray binaries.

Validated: build green; shared 1808 passed; tray 1128 passed (incl. the
new smoke test + verified red when -p:OpenClawEnableTestHooks=true).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds docs/GATEWAY_COMPAT_TESTING.md as the operator-facing companion to
the implementation plan: pieces, LKG bump flow (manual + automated),
local override, opting into compile-time test hooks for local dev,
running the fake LLM standalone, adding a new scenario, extending the
fake LLM.

Adds a 'Gateway version (LKG) pinning' section to docs/RELEASING.md
that names the source of truth, the auto-bump workflow, the
no-auto-merge rule, and the runtime override env var.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The spike (.github/workflows/gateway-compat-spike.yml) was authored to
prove the WSL + openclaw + provider-config + fake-LLM pipeline on
windows-2025 before sinking effort into the real harness. After several
iterations (lessons captured below), the run is now green end-to-end in
~2m12s and the canonical provider config shape is verified.

Spike outcome
-------------
- windows-2025 ships WSL 2.7.3.0 preinstalled, no distros. Ubuntu-24.04
  install ~36s; openclaw npm install ~66s; full spike job 2m12s cold.
  CI budget verified for the real workflows.
- Provider config root is models.providers.<id>, NOT agents.providers.<id>.
  Verified accepted keys (openclaw 2026.5.18 schema):
    api / baseUrl / apiKey / authMode / models[].id
- Default selector: agents.defaults.model.primary = "<provider>/<model>".
- openclaw config patch --file accepts atomic JSON5 patches.
- openclaw config validate is the build gate.
- openclaw config schema prints the full 2.2 MB canonical schema.
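
For orientation, the patch shape described above could be assembled like this. The key paths (`models.providers.<id>`, `agents.defaults.model.primary`) and key names come from the spike notes; every value is supplied by the caller, and the ones in the usage note below are placeholders, not verified settings. The verified patch itself lives in tools/fake-llm-server/README.md.

```python
import json


def build_provider_patch(provider_id: str, model_id: str, api: str,
                         base_url: str, api_key: str, auth_mode: str) -> dict:
    return {
        "models": {
            "providers": {
                provider_id: {
                    "api": api,
                    "baseUrl": base_url,
                    "apiKey": api_key,
                    "authMode": auth_mode,
                    "models": [{"id": model_id}],
                }
            }
        },
        # Default selector uses the "<provider>/<model>" form.
        "agents": {"defaults": {"model": {"primary": f"{provider_id}/{model_id}"}}},
    }


def to_patch_text(patch: dict) -> str:
    # Plain JSON is also valid JSON5, so json.dumps output can be used as-is.
    return json.dumps(patch, indent=2)
```

A call such as `build_provider_patch("fake", "fake-model", "<api>", "http://127.0.0.1:8080/v1", "<key>", "<authMode>")` would produce text suitable for writing to a file and passing to `openclaw config patch --file`.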

The verified JSON5 patch is committed to tools/fake-llm-server/README.md
and will be used verbatim by the W3 harness.

Lessons baked into the workflow
-------------------------------
- Shell scripts live in tools/spike/*.sh with .gitattributes "*.sh
  text eol=lf" so CRLF on Windows checkout never breaks "set -euo
  pipefail" inside WSL.
- Workflow steps invoke .sh files via `wsl ... -- bash $wslPath`
  through a ConvertTo-WslPath PowerShell helper. NOT via piping
  PS here-strings to wsl stdin (which mangles encoding).
- Diagnostics step is `continue-on-error: true` so a fresh runner
  without registered distros (the expected state) doesn't kill the
  job before real work begins.
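
The path translation that a helper like ConvertTo-WslPath performs can be approximated as below. This is a Python guess at the behavior, assuming the default `/mnt/<drive>` automount root; the real helper is PowerShell and may differ.

```python
from pathlib import PureWindowsPath


def to_wsl_path(windows_path: str) -> str:
    """Translate an absolute drive-letter path to its /mnt/<drive> form."""
    path = PureWindowsPath(windows_path)
    drive = path.drive
    if len(drive) != 2 or drive[1] != ":" or not path.is_absolute():
        raise ValueError(f"not an absolute drive-letter path: {windows_path!r}")
    # parts[0] is the drive root (e.g. 'D:\\'); the rest are path segments.
    return "/".join(["/mnt", drive[0].lower(), *path.parts[1:]])
```

Passing a translated path as an argument, instead of piping script text over stdin, sidesteps the encoding problem noted above.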

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds the TestHookCapability the gateway-compat harness will drive via the
local MCP HTTP server. The class is compile-time gated behind
OpenClawEnableTestHooks=true (production tray binaries do not contain it,
enforced by ReleaseBuildExcludesTestHooksTests). NodeService registers
it MCP-only (registerOnGateway: false) so a misbehaving gateway can
never trigger destructive hooks like pairing.reset, and the capability
second-gates on OPENCLAW_TRAY_E2E=1 at runtime.

Surface (8 commands declared; diagnostics.dump fully implemented):
- tray.testhook.diagnostics.dump (implemented)
- tray.testhook.gateway.config.patch (stub)
- tray.testhook.localSetup.start/status/cancel (stub)
- tray.testhook.connection.waitFor (stub)
- tray.testhook.pairing.reset (stub)
- tray.testhook.chat.send (stub)

Stubs return a stable "not yet implemented" error so the harness can
probe the surface, and a test asserts that message stays stable so a
future commit filling in a tool cannot regress to silent success.
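
The runtime gate and the stable stub error might be sketched like this in Python. The command names and the `OPENCLAW_TRAY_E2E=1` gate come from the message; the result shape and the other error strings are hypothetical.

```python
import os
from typing import Mapping, Optional

NOT_IMPLEMENTED = "not yet implemented"

# None marks a declared-but-stubbed command.
COMMANDS = {
    "tray.testhook.diagnostics.dump": lambda: {"status": "ok"},
    "tray.testhook.gateway.config.patch": None,
    "tray.testhook.localSetup.start": None,
    "tray.testhook.localSetup.status": None,
    "tray.testhook.localSetup.cancel": None,
    "tray.testhook.connection.waitFor": None,
    "tray.testhook.pairing.reset": None,
    "tray.testhook.chat.send": None,
}


def invoke(command: str, env: Optional[Mapping[str, str]] = None) -> dict:
    env = os.environ if env is None else env
    if env.get("OPENCLAW_TRAY_E2E") != "1":  # second gate, checked at runtime
        return {"ok": False, "error": "test hooks disabled"}
    if command not in COMMANDS:
        return {"ok": False, "error": f"unknown command: {command}"}
    handler = COMMANDS[command]
    if handler is None:
        # Stable message: a stub must fail visibly, never succeed silently.
        return {"ok": False, "error": NOT_IMPLEMENTED}
    return {"ok": True, "payload": handler()}
```

Checking the gate before the command lookup means a disabled build reveals nothing about which hooks exist.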

13 unit tests in OpenClaw.Tray.Tests cover the surface snapshot, both
gates, the diagnostics shape (snapshot via JSON parse), error wrapping,
and the stub failure mode. Test project defines OPENCLAW_E2E_HOOKS so
it can exercise the class; the Release-build smoke test
re-verifies absence in the shipped tray binary.

Validated: 1140 tray tests pass (+12); 1808 shared tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New tests/OpenClaw.GatewayCompat.E2ETests/ xUnit project that drives the
real tray exe over MCP. GatewayCompatFixture provisions isolated AppData,
finds a free port, spawns the E2E-built tray with OPENCLAW_TRAY_E2E=1,
waits for mcp-token.txt + the HTTP listener, and hands tests an McpClient
ready to call tray.testhook.* tools.
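
Two of the fixture steps, picking a free port and waiting for a readiness condition, can be sketched in Python as follows. Function names and timeouts are hypothetical; the fixture itself is C#.

```python
import socket
import time
from typing import Callable


def find_free_port() -> int:
    """Ask the OS for an unused loopback TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def wait_until(predicate: Callable[[], bool],
               timeout_s: float = 10.0, interval_s: float = 0.1) -> bool:
    """Poll until predicate() is true or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return predicate()
```

The port is released before the child process binds it, so another process could in principle claim it in between; a fixture that cares can retry with a new port if the spawn fails.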

Test taxonomy via xUnit Trait:
  Tier=Smoke    - HarnessSmokeTests: spawn tray, list tools, call
                  tray.testhook.diagnostics.dump. Runs anywhere; no WSL.
  Tier=Gateway  - OperatorPairingTests etc.: real gateway scenarios.
                  Gated by GatewayCompatFactAttribute which skips unless
                  OPENCLAW_RUN_GATEWAY_COMPAT=1, so they only run on the
                  Windows+WSL CI lane.

Reuses tests/OpenClaw.Tray.IntegrationTests/McpClient.cs via <Compile Link>
so the JSON-RPC wire shape stays single-source-of-truth.

Locates the E2E tray binary via OPENCLAW_E2E_TRAY_EXE env first, then
falls back to src/OpenClaw.Tray.WinUI/bin/{E2E,Debug}/.../OpenClaw.Tray.WinUI.exe.
The harness expects that build to have -p:OpenClawEnableTestHooks=true;
without it, tray.testhook.* tools are absent and the smoke test fails
loudly.

OperatorPairingTests added as a Tier=Gateway placeholder (Assert.Fail
with "Implementation pending - W3.2 follow-up tools required") so the
real CI workflow has a target to depend on while the testhook stubs are
filled in.

Validated end-to-end: built tray with -p:OpenClawEnableTestHooks=true,
ran smoke tier - 2 tests pass, fixture spawn + MCP handshake + diagnostics
dump round-trip all work in 2 seconds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gateway-compat.yml
  - On PR/push to relevant paths: runs the Smoke tier (no WSL) - merge gate.
  - On schedule (nightly 07:00 UTC) or workflow_dispatch with
    run_gateway_tier=true: also runs the Gateway tier with WSL +
    Ubuntu-24.04 + openclaw + fake LLM. Matrix tests gateway_version
    in [lkg, latest]; "latest" failures are alert-only (continue-on-error
    via matrix include.failure_is_blocking=false).
  - Reusable via workflow_call so gateway-lkg-bump.yml can invoke it.
  - Reuses tools/spike/*.sh + ConvertTo-WslPath helper from the W0 spike.

gateway-lkg-bump.yml
  - Scheduled every 6h. Polls registry.npmjs.org/openclaw for the
    "latest" dist-tag, compares to gateway-lkg.json.
  - Refuses pre-releases (alpha/beta/rc/...) unless force_version is set.
  - On newer candidate: calls gateway-compat.yml as a reusable workflow
    with the candidate version and run_gateway_tier=true.
  - On green: opens (or updates) a PR titled
    "chore(lkg): bump gateway LKG to X.Y.Z" updating gateway-lkg.json AND
    src/OpenClaw.Shared/GatewayLkg.cs in lockstep (the existing
    GatewayLkgTests enforces drift = build failure).
  - PR body records previous + new version, npm publish time, tarball
    shasum, and a link to the validation workflow run.
  - NEVER auto-merges. CODEOWNER review required.
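
The candidate-selection step of the bump workflow could be approximated as below. This Python sketch assumes dotted numeric versions with pre-releases marked by a hyphen suffix (for example `2026.5.19-beta.1`); the workflow's actual detection rule is not shown in the message.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_prerelease(version: str) -> bool:
    return "-" in version


def pick_candidate(latest: str, lkg: str,
                   force_version: Optional[str] = None) -> Optional[str]:
    """Return the version to validate, or None when no bump is warranted."""
    if force_version:
        return force_version  # explicit override bypasses the checks below
    if is_prerelease(latest):
        return None
    return latest if parse_version(latest) > parse_version(lkg) else None
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"2026.5.9"` sorts after `"2026.5.17"`.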

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Every test hook must invoke the same method the matching UI click
handler invokes. If a handler does the work inline, extract a shared
service method first and have both the handler and the hook call that
method. No parallel implementations - they defeat the purpose of
gateway-compat (a test that passes against a stub tells us nothing
about whether the real UI path works).

Rule encoded in:
- src/OpenClaw.Tray.WinUI/Services/TestHooks/TestHookCapability.cs
  file header (anyone editing the file has to read it)
- docs/GATEWAY_COMPAT_TESTING.md "Same-path-as-user rule" section
  with a mapping table (test hook -> shared method -> UI caller)
- plan.md
- Repository memory

Each new tool comment will name the UI caller and the shared method
so future refactors can't drift.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
First real W4 hook. Writes a JSON5 patch into the WSL distro and runs
the exact same `openclaw config patch --file <path>` + `openclaw config
validate` CLI sequence the user can run by hand - via the same
IWslCommandRunner the tray uses for every other WSL operation. No
parallel implementation (same-path rule).

NodeService now constructs a WslExeCommandRunner and hands it to
TestHookCapability, mirroring how LocalGatewaySetup obtains the runner.

Args: { distroName, patchJson, openclawBinPath?, patchPath?, wslUser? }
Returns: { writeOk, writeStderr, patchOk, patchStdout, patchStderr,
           validateOk, validateStdout, validateStderr, patchPath }

The hook returns Ok=true even when validate fails so the harness can
inspect WHY (typical pattern: a future gateway version moves a key and
the scenario test surfaces the exact schema error).

5 new TestHookCapabilityTests cover:
- requires IWslCommandRunner
- requires distroName / patchJson
- exact 3-call sequence (write, patch, validate) with arg snapshots
  and base64 round-trip verification of the written body
- validate failure returns Ok=true with payload (doesn't throw)
- write failure short-circuits (no patch or validate call)
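
The write, patch, validate sequence can be illustrated with a fake runner. This is a Python sketch: the two `openclaw config` commands are quoted from the message, while the write command, the runner interface, and the choice to still run validate after a failed patch are assumptions.

```python
import base64
from typing import Optional


class FakeRunner:
    """Records commands; fails any command starting with fail_prefix."""

    def __init__(self, fail_prefix: Optional[str] = None) -> None:
        self.calls = []
        self.fail_prefix = fail_prefix

    def run(self, distro: str, command: str) -> tuple:
        self.calls.append(command)
        failed = self.fail_prefix is not None and command.startswith(self.fail_prefix)
        return (not failed, "", "boom" if failed else "")


def config_patch(runner, distro: str, patch_json: str,
                 patch_path: str = "/tmp/openclaw-patch.json5") -> dict:
    # Base64 keeps quoting and newlines intact on the way into the distro.
    body = base64.b64encode(patch_json.encode("utf-8")).decode("ascii")
    result = {"patchPath": patch_path, "patchOk": False, "validateOk": False}
    result["writeOk"], _, result["writeStderr"] = runner.run(
        distro, f"printf %s {body} | base64 -d > {patch_path}")
    if not result["writeOk"]:
        return result  # short-circuit: nothing to patch or validate
    result["patchOk"], result["patchStdout"], result["patchStderr"] = runner.run(
        distro, f"openclaw config patch --file {patch_path}")
    result["validateOk"], result["validateStdout"], result["validateStderr"] = runner.run(
        distro, "openclaw config validate")
    return result  # a failed validate is reported in the payload, not raised
```

Returning the validate outcome as data rather than raising lets a scenario test assert on the exact schema error.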

New tests/OpenClaw.GatewayCompat.E2ETests/GatewayConfigPatchTests.cs
is a Tier=Gateway scenario that asserts the verified fake-LLM patch
shape still validates against the running gateway. Catches schema drift
in the openclaw config root and blocks the LKG-bump auto-PR when
upstream breaks compatibility.

Validated: 1145 tray tests pass (+5); harness builds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per user direction: E2E scenarios will cover what unit tests do today, so
trim unit tests to the irreducible set the harness cannot replace.

Deletes from TestHookCapabilityTests:
- Surface stability snapshot (covered by HarnessSmokeTests.ToolsList...)
- Diagnostics shape (covered by HarnessSmokeTests.DiagnosticsDump...)
- Diagnostics provider-error wrapping (low value, breaking the host in
  E2E is impractical)
- All "not yet implemented" placeholder assertions (they go away as
  each hook is implemented and gets a real scenario test)
- Gateway-config-patch arg-validation guards (distroName/patchJson)

Keeps:
- AllTools_AreGatedBy_OPENCLAW_TRAY_E2E (security invariant E2E can't prove)
- UnknownCommand (trivial)
- gateway.config.patch exact-command-sequence assertion (same-path rule)
- gateway.config.patch failure-mode tests (write fails, validate fails)
- requires-IWslCommandRunner

Deletes from LocalGatewaySetupTests:
- 4 OPENCLAW_GATEWAY_VERSION env-override tests
- LocalGatewaySetupOptions_DefaultsToLkgVersion
(These will be re-covered by an E2E scenario that sets
OPENCLAW_GATEWAY_VERSION and asserts the actually-installed gateway
version matches.)

Promotes Gateway tier (LKG cell only) to run on every PR. The matrix
expands to ['lkg','latest'] only on schedule. Adds ~3min PR latency in
exchange for catching gateway regressions before merge instead of
the morning after.

Tests: 1129 tray (was 1145; -16 redundant); shared still 1808.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds:
- tray.testhook.connection.waitFor
- tray.testhook.pairing.reset
- tray.testhook.chat.send
- tray.testhook.localSetup.start / status / cancel

All four follow the same-path-as-user rule: each invokes the same
production method the matching UI click handler invokes.

New plumbing:
- ITestHookHost interface (compile-time-gated) aggregates the App-level
  dependencies the hooks need. App.TestHookHost.cs (partial class, also
  compile-time-gated) wires it up.
- TestHookCapability accepts an optional ITestHookHost. NodeService
  passes (App.Current as App) when registering the capability.

Same-path mappings:
- connection.waitFor -> IGatewayConnectionManager.StateChanged
  (same event tray icon + ConnectionPage observe)
- pairing.reset -> GatewayRegistry.Remove + per-gateway identity wipe
  (same Remove method UI surfaces use)
- chat.send -> OpenClawChatDataProvider.SendMessageAsync
  (same method ChatWindow.OnSendClicked invokes)
- localSetup.start -> App.CreateLocalGatewaySetupEngine + RunLocalOnlyAsync
  (same chain LocalSetupProgressPage / OnboardingV2Bridge invoke)

LocalSetup hook is async-shaped: start kicks off RunLocalOnlyAsync on a
background Task with its own CTS, status polls the latest engine state
(captured via the same StateChanged event the V2 bridge subscribes to),
cancel triggers the CTS. Concurrency-guarded: a second start while a
run is in-flight returns an error rather than racing.
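
The start / status / cancel shape described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the tray's C# code: `LocalSetupHook`, `run_setup`, and the state strings are hypothetical, and a `threading.Event` stands in for the CTS.

```python
import threading


class LocalSetupHook:
    """Hypothetical start/status/cancel wrapper around one long-running job."""

    def __init__(self, run_setup):
        self._run_setup = run_setup  # callable(cancel_event, report_state)
        self._lock = threading.Lock()
        self._thread = None
        self._cancel = None
        self._state = "Idle"

    def start(self):
        with self._lock:
            # Concurrency guard: refuse a second start while a run is in flight.
            if self._thread is not None and self._thread.is_alive():
                return {"ok": False, "error": "setup already running"}
            self._cancel = threading.Event()
            self._state = "Running"
            self._thread = threading.Thread(
                target=self._run_setup,
                args=(self._cancel, self._report),
                daemon=True,
            )
            self._thread.start()
            return {"ok": True}

    def _report(self, state):
        # Called by the job whenever its state changes.
        with self._lock:
            self._state = state

    def status(self):
        with self._lock:
            return {"state": self._state}

    def cancel(self):
        with self._lock:
            if self._cancel is None:
                return {"ok": False, "error": "nothing to cancel"}
            self._cancel.set()
            return {"ok": True}
```

The point of the guard is that `start` never blocks on the job itself; callers poll `status` until a terminal state appears.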

ITestHookHost is also linked into OpenClaw.Tray.Tests so the existing
unit tests still compile. Tray tests: 1129 passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the placeholder OperatorPairingTests Assert.Fail with real
end-to-end scenarios that drive the production code paths via the
tray.testhook.* tools. Per user direction: no stubs, all fully
implemented and tested.

New GatewayCompatScenarios.cs centralizes:
- DistroName ("Ubuntu-24.04") and FakeLlmPort
- The verified fake-LLM provider JSON5 patch (single source of truth
  for the schema-validated body; tools/fake-llm-server/README.md and
  this file move together)
- ApplyFakeLlmProviderAsync (called by every scenario)
- UnwrapToolPayload helper for MCP tools/call response shape

7 scenarios under Tier=Gateway (skipped unless OPENCLAW_RUN_GATEWAY_COMPAT=1):
1. GatewayConfigPatchTests — pre-existing; validates the fake-LLM provider
   patch against the live gateway. Failure blocks LKG auto-bump.
2. OperatorPairingTests — drives local-setup -> waits for operator
   Connected -> asserts a device ID was issued.
3. NodePairingTests — waits for node Connected+Paired -> asserts
   gateway sees the node via app.nodes (existing production MCP tool).
4. ToolEventsTests — regression guard for the "tool-events cap missing"
   bug (repo memory). Sends a chat and confirms send=true.
5. ChatRoundTripTests — sends a chat via chat.send and asserts the
   fake LLM server received the user message verbatim (via the W2
   /__assert/last-request endpoint).
6. NodeInvokeTests — asserts gateway sees the Windows node with at
   least one capability via app.nodes; the failure mode this guards
   is "node.invoke silently dropped" per docs/gateway-node-integration.md.
7. ReconnectTests — pair -> pairing.reset -> re-pair, asserts Ready in
   both passes and that reset removed at least one record.

Validation (no-hooks build, normal dev):
- Shared 1808 passed
- Tray 1129 passed
- Harness Smoke 2 passed, Gateway 7 skipped (correctly gated)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The dotnet restore at the workflow root doesn't generate the win-x64
RID-targeted assets for the WinUI sub-projects (FunctionalUI,
OnboardingV2). The existing ci.yml works around this by omitting
--no-restore on the 'Build Tray App (WinUI)' step, which triggers
the RID-targeted restore. Mirror that here.

Caught by the first PR-triggered run of gateway-compat.yml on the
fork (run 26141658423).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
npm reports no such version 2026.5.17, so PR CI failed to install it
in the Gateway tier. The W0 spike (run 26138294682) installed and
verified 2026.5.18 (which is npm dist-tag 'latest'). Use that as the
real LKG. GatewayLkgTests stays green because both gateway-lkg.json
and GatewayLkg.cs are bumped together.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
First real PR-triggered run on the fork (run 26142143433) revealed the
hook was passing args to RunInDistroAsync which prepends '-d name --'.
Combined with my '-u user --' that produces a double-'--' that ends
wsl arg parsing prematurely - bash sees '-' as positional arg 0 and
fails with 'bash: - : invalid option'.

Switch to RunAsync directly with the production-pattern args:
  wsl -d <distro> -u <user> -- bash -lc <script>
This matches LocalGatewaySetup.cs:993 exactly (which is the
production install command users run via the local-setup flow).
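
For reference, the two argument layouts compare roughly as below. This is an illustrative Python sketch only; the function names are hypothetical and the distro, user, and script values are placeholders.

```python
def build_wsl_args(distro: str, user: str, script: str) -> list:
    """Arguments for: wsl -d <distro> -u <user> -- bash -lc <script>."""
    return ["-d", distro, "-u", user, "--", "bash", "-lc", script]


def doubled_separator_args(distro: str, user: str, script: str) -> list:
    """What the earlier layout amounted to: the helper prepends '-d name --'
    and the caller then adds its own '-u user --', giving two separators."""
    helper_prefix = ["-d", distro, "--"]
    caller_args = ["-u", user, "--", "bash", "-lc", script]
    return helper_prefix + caller_args
```

A snapshot test on the argument list only needs to check that exactly one `--` appears and that it comes after both `-d` and `-u`.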

Unit tests updated to snapshot the new arg layout. FakeWslRunner now
implements RunAsync (was previously only RunInDistroAsync). Distro
name extracted from '-d' arg position for test assertion convenience.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR-triggered run 26142580405 surfaced the exact schema requirement:

  models.providers.fake.models.0.name: Invalid input

The W0 spike (which used 'openclaw config schema') only confirmed the
provider root path; it didn't probe the inner array element shape.
Real validate caught it.

Updated GatewayCompatScenarios.FakeLlmProviderPatch and the docs in
tools/fake-llm-server/README.md to use 'name' instead of 'id'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous PR-triggered runs flip-flopped between 'models.0.id: Invalid'
and 'models.0.name: Invalid' depending on which field was missing
last. The real shape requires BOTH id and name plus reasoning, input,
cost, contextWindow, maxTokens - taken verbatim from openclaw's own
src/config/model-alias-defaults.test.ts fixture.

Also fix authMode -> auth (schema.help.ts:938 confirms 'auth' is the
canonical name).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Real schema confirmed at src/config/zod-schema.core.ts:319 of the
gateway repo. Required: id (min 1) + name (min 1). All other fields
optional. My JSON5 shape was correct but flip-flopping errors suggest
the parser is picky. Switch to strict JSON with quoted keys to
remove parser ambiguity as a variable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR run 26143696116 advanced past the schema issue but hit:
'ConfigMutationConflictError: config changed since last load'

openclaw config patch is read-modify-write and can race with the
gateway's own config writes. Retry up to 5 times with 500ms*attempt
backoff, but only for that specific error - other failures fail
fast.
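
A minimal sketch of that retry policy, assuming "up to 5 times" means five attempts in total and that the conflict is detected by matching the error name in stderr. `patch_with_retry` and `run_patch` are hypothetical names; this is Python for illustration, not the hook's actual code.

```python
import time

# Substring taken from the error quoted above.
CONFLICT_MARKER = "ConfigMutationConflictError"


def patch_with_retry(run_patch, max_attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """Call run_patch() until it succeeds or fails for a non-conflict reason.

    run_patch is a hypothetical callable returning (ok, stderr).
    Returns (ok, stderr, attempts_used).
    """
    stderr = ""
    for attempt in range(1, max_attempts + 1):
        ok, stderr = run_patch()
        if ok:
            return True, stderr, attempt
        if CONFLICT_MARKER not in stderr:
            return False, stderr, attempt  # unrelated failure: fail fast
        if attempt < max_attempts:
            sleep(base_delay_s * attempt)  # 0.5s, 1.0s, 1.5s, 2.0s
    return False, stderr, max_attempts
```

Injecting `sleep` keeps the unit test instant: it can record the requested delays instead of waiting.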

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The workflow no longer pre-installs WSL + openclaw under Ubuntu-24.04.
The gateway-compat scenarios now drive the production install path
themselves via tray.testhook.localSetup.start (the same code path the
LocalSetupProgressPage 'Set up locally' button invokes). That is the
exact regression target we want to test against new gateway versions.

- Drop: Install Ubuntu-24.04 distro
- Drop: Provision openclaw user
- Drop: Install openclaw@<version>
- Drop: Start fake LLM server inside WSL
- Add:  WSL host diagnostics (wsl --version/status/list)
- Keep: Register WSL path helper (useful for log paths)
- Change: Collect WSL gateway log now targets OpenClawGateway distro
          (production default created by LocalGatewaySetup engine)
- Change: Cleanup WSL distro now unregisters OpenClawGateway

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduces a collection-scoped xUnit fixture that drives the full
production tray.testhook.localSetup.start flow once, then shares the
resulting installed-and-paired tray with every gateway-tier scenario
in the [Collection("Gateway")] collection. Cost (~3-4 min cold) is
paid once per CI run instead of per scenario.

Adds GatewayCompatScenarios helpers:
- DriveLocalSetupAndPrepareGatewayAsync: kicks off localSetup, polls
  localSetup.status to terminal, shells wsl.exe into OpenClawGateway
  to launch tools/spike/start-fake-llm.sh, then applies the verified
  fake-LLM provider patch.
- StartFakeLlmInDistroAsync: wsl.exe-based bootstrap, UTF-8 capture.
- WaitForConnectionAsync: client-side polling around <=20s server
  waits to respect McpClient's 30s HTTP timeout.
- FindRepoRoot + ToWslPath: path helpers.
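
The client-side polling in WaitForConnectionAsync can be pictured roughly as below: a long overall wait is split into server-side waits of at most 20s, so no single HTTP call approaches the client's 30s timeout. This is a Python sketch under assumptions; `call_wait_for` is a hypothetical wrapper around the waitFor tool, not the harness API.

```python
import time


def wait_for_connection(call_wait_for, total_timeout_s, server_wait_s=20,
                        clock=time.monotonic):
    """Poll until connected or the overall deadline passes.

    call_wait_for(timeout_s) is a hypothetical wrapper that asks the server
    to wait up to timeout_s and returns True once the wanted state is reached.
    """
    deadline = clock() + total_timeout_s
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never ask the server to wait longer than one short slice.
        if call_wait_for(min(server_wait_s, remaining)):
            return True
```

With a 50s budget this issues waits of 20s, 20s, and 10s, each comfortably inside a 30s request timeout.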

DistroName flipped from Ubuntu-24.04 to the production default
OpenClawGateway (LocalGatewaySetupOptions.DistroName).

A separate ReconnectFixture lets ReconnectTests own its own pairing
state since it resets and re-pairs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Now that GatewayCollectionFixture drives the full production install
and pairing flow once per CI run, the per-scenario setup boilerplate
(ApplyFakeLlmProviderAsync + localSetup.start + connection.waitFor
with 600s server timeouts) goes away. Each test body becomes just
the specific assertion it was meant to express.

- OperatorPairingTests, NodePairingTests, ToolEventsTests,
  ChatRoundTripTests, NodeInvokeTests, GatewayConfigPatchTests:
  joined [Collection("Gateway")], use GatewayCollectionFixture,
  and confirm settled connection state via WaitForConnectionAsync
  (client-side polling, respects McpClient 30s timeout).
- GatewayConfigPatchTests now uses GatewayCompatScenarios.DistroName
  + FakeLlmProviderPatch (the verified strict-JSON patch shape),
  exercising idempotence against the already-installed gateway.
- ReconnectTests stays per-class on ReconnectFixture so the reset /
  re-pair dance doesn't trash the shared collection state.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Iteration in response to the first push to PR #3 where 7 of 9 gateway
scenarios failed:
- Collection fixture's first localSetup attempt failed at "Creating the
  OpenClaw Gateway WSL instance" within 18s on a cold runner. All five
  shared-collection scenarios then failed instantly because the fixture
  init faulted once and xUnit reuses the fault.
- ReconnectFixture's attempt got past WSL install but hung 20 min at
  "Pairing Windows tray node" before our timeout fired.

Changes:
- DriveLocalSetupAndPrepareGatewayAsync now retries once on
  status=FailedRetryable. Matches the production "Retry" button UX
  the user would click on a transient WSL hiccup. Terminal failures
  (FailedTerminal) still fail-fast.
- localSetup wall timeout bumped from 20 min to 25 min (gives the
  pairing step more headroom; will revisit if it still times out).
- GatewayCompatFixture preserves the tray's DataDir (including
  openclaw-tray.log) into a directory named by an environment variable
  before deleting it, when the workflow sets that variable. The workflow
  sets it to TestResults/Gateway-<version>/tray-data, which is uploaded
  as part of the existing gateway-tier results artifact.
- "Collect WSL gateway log" now also dumps openclaw service logs
  under ~/.openclaw, distro process list, and listening sockets, so
  the next failure tells us whether the gateway was even listening
  when pairing hung.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>