test(e2e): rely on Kimi trajectory acceptance by cv · Pull Request #4153 · NVIDIA/NemoClaw

cv · 2026-05-24T07:33:55Z

Summary

The nightly flake sweep recorded one kimi-inference-compat-e2e failure in which the OpenClaw agent command exited 0 and the trajectory showed that every split Kimi exec call had completed cleanly, yet an earlier parser of the command's visible text output had already incremented FAIL. This PR narrows the command-output check (K4) to validating that the command completed, and leaves exact final-answer and tool-result correctness to the existing trajectory acceptance check (K5).
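The outcome accounting for the command-output check (K4) can be sketched in bash as follows. This is a minimal illustration under stated assumptions, not the code in test-kimi-inference-compat.sh: the helpers `pass`, `fail`, `info` and the function `check_agent_command_outcome` are hypothetical, and "canonical" output is assumed to mean an exact string match against an expected answer.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the K4 outcome accounting described in this PR.
# The helper names (pass, fail, info, check_agent_command_outcome) are
# illustrative and are not taken from test-kimi-inference-compat.sh.

PASS=0
FAIL=0

pass() { PASS=$((PASS + 1)); echo "PASS: $1"; }
fail() { FAIL=$((FAIL + 1)); echo "FAIL: $1"; }
info() { echo "INFO: $1"; }

# check_agent_command_outcome EXIT_STATUS VISIBLE_OUTPUT EXPECTED_OUTPUT
#   Non-zero exit: counts as a K4 failure and prints the output for context.
#   Zero exit: counts as a K4 pass; output that differs from the expected
#   text is only logged, because exact correctness is left to the
#   trajectory acceptance check (K5).
check_agent_command_outcome() {
  local rc="$1" output="$2" expected="$3"
  if [ "$rc" -ne 0 ]; then
    fail "K4: OpenClaw agent command exited with status ${rc}"
    printf '%s\n' "$output" | head -n 40
    return 1
  fi
  if [ "$output" != "$expected" ]; then
    info "K4: non-canonical visible output (diagnostic only): ${output}"
  fi
  pass "K4: OpenClaw agent command completed"
}
```

For example, `check_agent_command_outcome 0 "The answer is 42." "42"` logs an INFO line and increments PASS, while any non-zero status increments FAIL regardless of what the output says.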

Changes

Keep K4 failing when the OpenClaw agent command exits non-zero.
Log non-canonical visible command output as a diagnostic, rather than a failure, when the command exits 0.
Continue relying on K5 trajectory acceptance for exact final assistant text, tool order, split exec calls, and all tool-result completion assertions.
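The K5 assertions listed above (exact final text, tool order including split exec calls, and completion of every tool result) can be illustrated with a small bash sketch. The real trajectory format is not shown in this PR, so this assumes a hypothetical line-oriented record; the function name `accept_trajectory` and the event keywords are illustrative only, and the script needs bash 4+ for associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of K5-style trajectory acceptance (bash 4+).
# Assumed record format, one event per line (not the real format):
#   call <id> <tool>       a tool call issued by the assistant
#   result <id> <status>   the result for that call ("ok" means completed)
#   final <text>           the final assistant text

# accept_trajectory FILE EXPECTED_FINAL_TEXT EXPECTED_TOOL...
accept_trajectory() {
  local file="$1" expected_final="$2"
  shift 2
  local -a expected_order=("$@")
  local -a order=()
  local -A status=()
  local kind rest id tool st final=""

  while read -r kind rest; do
    case "$kind" in
      call)
        read -r id tool <<<"$rest"
        order+=("$tool")
        status["$id"]="pending"
        ;;
      result)
        read -r id st <<<"$rest"
        status["$id"]="$st"
        ;;
      final)
        final="$rest"
        ;;
    esac
  done <"$file"

  # Tool order, including each split exec call, must match exactly.
  if [ "${order[*]}" != "${expected_order[*]}" ]; then
    echo "K5: tool order '${order[*]}' != expected '${expected_order[*]}'"
    return 1
  fi
  # Every tool call must have a completed ("ok") result.
  for id in "${!status[@]}"; do
    if [ "${status[$id]}" != "ok" ]; then
      echo "K5: tool call ${id} did not complete (status: ${status[$id]})"
      return 1
    fi
  done
  # The final assistant text must match exactly.
  if [ "$final" != "$expected_final" ]; then
    echo "K5: final text '${final}' != expected '${expected_final}'"
    return 1
  fi
  echo "K5: trajectory accepted"
}
```

With that assumed format, a trajectory in which both split `exec` calls report `ok` and the last line is `final 42` passes `accept_trajectory "$file" "42" exec exec`, while one in which either call lacks an `ok` result is rejected even if the final text matches. Keeping these assertions in one place is what lets the K4 sketch treat output wording as diagnostic only.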

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Tests
- Improved agent inference compatibility test error handling to immediately fail when commands exit with non-zero status and log parsed output with error context.
- Modified test validation to pass when agent commands succeed regardless of output format differences, delegating result verification to trajectory acceptance checks.

copy-pr-bot · 2026-05-24T07:33:59Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

github-actions · 2026-05-24T07:34:51Z

E2E Advisor Recommendation

Required E2E: kimi-inference-compat-e2e
Optional E2E: None

Dispatch hint: kimi-inference-compat-e2e

Auto-dispatched E2E: kimi-inference-compat-e2e via nightly-e2e.yaml at 0fd38635d460e97bae049988020c3417c9c391d8 — nightly run

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

kimi-inference-compat-e2e (medium): Run the exact E2E job whose script changed to verify the updated K4 acceptance logic still exercises the hermetic Kimi-compatible endpoint, inference.local route, OpenClaw Kimi plugin wiring, agent execution, and trajectory validation as intended.

Optional E2E

None.

New E2E recommendations

None.

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: kimi-inference-compat-e2e

github-actions · 2026-05-24T07:34:52Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

None.

Relevant changed files

None.

github-actions · 2026-05-24T07:35:10Z

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 0 new items found

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai · 2026-05-24T07:35:10Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e8ef41bf-12ea-48da-ac04-bc7ec351669e

📥 Commits

Reviewing files that changed from the base of the PR and between 51efc4f and 0fd3863.

📒 Files selected for processing (1)

test/e2e/test-kimi-inference-compat.sh

📝 Walkthrough

The PR updates post-run validation logic in the run_agent_prompt function of the e2e test script. When the OpenClaw agent command exits with a non-zero status, the test immediately fails and prints diagnostic output. When the agent succeeds, the test passes regardless of final text format, logging non-canonical text and deferring validation to subsequent checks.

Changes

Agent execution validation in e2e test

Layer / File(s)	Summary
Post-run outcome handling in run_agent_prompt `test/e2e/test-kimi-inference-compat.sh`	The `run_agent_prompt` function's post-run validation logic is updated to fail immediately on non-zero agent exit with diagnostic output, and to pass on successful exit regardless of final text format, logging non-canonical text for debugging and deferring correctness validation to trajectory checks.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

NVIDIA/NemoClaw#4120: Both PRs modify test/e2e/test-kimi-inference-compat.sh, specifically run_agent_prompt's post-run validation/completion matching logic.
NVIDIA/NemoClaw#4039: Both PRs update e2e OpenClaw agent execution/validation logic to fail differently on agent non-zero exit and surface agent diagnostic details.

Suggested labels

E2E, Integration: OpenClaw, fix

Suggested reviewers

jyaunches

Poem

🐰 A test runs swift, then checks the code,
When exit fails, we now explode—
But when success greets our agent's way,
We log and trust the test's next say,
Validation deferred, the path's more clear! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly relates to the main change: shifting validation responsibility from command-output parsing to trajectory acceptance checking for the Kimi e2e test.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

github-actions · 2026-05-24T15:59:07Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26365908238
Target ref: 0fd38635d460e97bae049988020c3417c9c391d8
Workflow ref: main
Requested jobs: kimi-inference-compat-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job	Result
kimi-inference-compat-e2e	⚠️ cancelled

github-actions · 2026-05-24T16:04:34Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26365944264
Target ref: 0fd38635d460e97bae049988020c3417c9c391d8
Workflow ref: main
Requested jobs: kimi-inference-compat-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
kimi-inference-compat-e2e	✅ success

cv · 2026-05-24T16:07:08Z

Prepared for review after #4152 merged:

Retargeted PR base to main.
Merged current origin/main; PR diff is now only test/e2e/test-kimi-inference-compat.sh.
Marked the PR ready for review.

Validation:

npx prek run --all-files passed.
npm test passed.
Optional selective E2E kimi-inference-compat-e2e passed: https://github.com/NVIDIA/NemoClaw/actions/runs/26365944264

All current PR checks are passing. GitHub still shows some cancelled check runs in the rollup from the base-retarget/draft-to-ready churn, but the latest checks reported by gh pr checks are green.

cv added 4 commits May 24, 2026 00:04

fix(docker): retry gosu release download

e450d09

fix(docker): harden gosu curl download

68e5126

test(e2e): retry inference switch verification

26fdb76

test(e2e): rely on Kimi trajectory acceptance

bf26f2f

cv self-assigned this May 24, 2026

cv added the v0.0.51 Release target label May 24, 2026

cv mentioned this pull request May 24, 2026

test(e2e): classify quick tunnel flakes as external #4154

Merged

12 tasks

cv added 4 commits May 24, 2026 00:46

Merge remote-tracking branch 'origin/main' into draft/inference-switch-retry-fallback

95f9304

test(e2e): share inference switch retry helper

c438367

Merge branch 'draft/inference-switch-retry-fallback' into draft/kimi-e2e-final-text-accounting

2c36e2a

Merge remote-tracking branch 'origin/main' into draft/kimi-e2e-final-text-accounting

0fd3863

cv changed the base branch from draft/inference-switch-retry-fallback to main May 24, 2026 15:56

cv marked this pull request as ready for review May 24, 2026 15:57

cv merged commit bbc80df into main May 24, 2026
40 of 50 checks passed

cv deleted the draft/kimi-e2e-final-text-accounting branch May 27, 2026 21:16

2 participants

cv commented May 24, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented May 24, 2026 •

edited

Loading

github-actions Bot commented May 24, 2026 •

edited

Loading

github-actions Bot commented May 24, 2026 •

edited

Loading

coderabbitai Bot commented May 24, 2026 •

edited

Loading