test(e2e): rely on Kimi trajectory acceptance#4153

Merged
cv merged 8 commits into main from draft/kimi-e2e-final-text-accounting
May 24, 2026

@cv
Collaborator

@cv cv commented May 24, 2026

Summary

The nightly flake sweep hit one kimi-inference-compat-e2e failure: the OpenClaw command exited 0, and the trajectory later showed that all split Kimi exec calls had completed cleanly, but an earlier command-output text parser had already incremented FAIL. This PR narrows the command-output check to validating command completion and leaves exact final-answer and tool-result correctness to the existing trajectory acceptance check.

Changes

  • Keep K4 failing when the OpenClaw agent command exits non-zero.
  • Treat non-canonical visible command output as diagnostic only (logged, not counted as a failure) when the command exits 0.
  • Continue relying on K5 trajectory acceptance for exact final assistant text, tool order, split exec calls, and all tool-result completion assertions.
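The K4/K5 split described above can be sketched as a small shell function. This is a minimal, hypothetical sketch: the PASS/FAIL counters, the pass, fail, and check_agent_result helpers, and the exact-string comparison against an expected answer are illustrative assumptions, not the actual code in test/e2e/test-kimi-inference-compat.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the K4 post-run handling described in this PR.
# Counter names, helper names, and the exact-match comparison are
# illustrative assumptions, not the real test script.
set -u

PASS=0
FAIL=0

pass() { PASS=$((PASS + 1)); echo "PASS: $1"; }
fail() { FAIL=$((FAIL + 1)); echo "FAIL: $1"; }

# check_agent_result EXIT_CODE OUTPUT EXPECTED
#   Non-zero exit: count a K4 failure and print the output for diagnosis.
#   Zero exit: count a K4 pass. If the visible text differs from the
#   expected canonical answer, log it as a diagnostic only; exact answer,
#   tool order, and tool-result checks are left to trajectory acceptance (K5).
check_agent_result() {
  local rc="$1" output="$2" expected="$3"
  if [ "$rc" -ne 0 ]; then
    fail "K4: agent command exited $rc"
    printf '%s\n' "$output" >&2
    return 1
  fi
  # Simplified stand-in for the original command-output text parser.
  if [ "$output" != "$expected" ]; then
    echo "DIAG: K4 non-canonical visible output (deferring to K5): $output" >&2
  fi
  pass "K4: agent command completed (exit 0)"
}
```

Called right after the agent command, a run whose visible text is "Result: DONE" instead of the hypothetical expected "DONE" would log a DIAG line and still count as a K4 pass, while any non-zero exit increments FAIL and returns 1.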

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Summary by CodeRabbit

  • Tests
    • Improved agent inference compatibility test error handling to immediately fail when commands exit with non-zero status and log parsed output with error context.
    • Modified test validation to pass when agent commands succeed regardless of output format differences, delegating result verification to trajectory acceptance checks.

@cv cv self-assigned this May 24, 2026
@copy-pr-bot

copy-pr-bot Bot commented May 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cv cv added the v0.0.51 Release target label May 24, 2026
@github-actions
Contributor

github-actions Bot commented May 24, 2026

E2E Advisor Recommendation

Required E2E: kimi-inference-compat-e2e
Optional E2E: None

Dispatch hint: kimi-inference-compat-e2e

Auto-dispatched E2E: kimi-inference-compat-e2e via nightly-e2e.yaml at 0fd38635d460e97bae049988020c3417c9c391d8 (nightly run)

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • kimi-inference-compat-e2e (medium): Run the exact E2E job whose script changed to verify the updated K4 acceptance logic still exercises the hermetic Kimi-compatible endpoint, inference.local route, OpenClaw Kimi plugin wiring, agent execution, and trajectory validation as intended.

Optional E2E

  • None.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: kimi-inference-compat-e2e

@github-actions
Contributor

github-actions Bot commented May 24, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@github-actions
Contributor

github-actions Bot commented May 24, 2026

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 0 new items found

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai
Contributor

coderabbitai Bot commented May 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e8ef41bf-12ea-48da-ac04-bc7ec351669e

📥 Commits

Reviewing files that changed from the base of the PR and between 51efc4f and 0fd3863.

📒 Files selected for processing (1)
  • test/e2e/test-kimi-inference-compat.sh

📝 Walkthrough

The PR updates post-run validation logic in the run_agent_prompt function of the e2e test script. When the OpenClaw agent command exits with a non-zero status, the test immediately fails and prints diagnostic output. When the agent succeeds, the test passes regardless of final text format, logging non-canonical text and deferring correctness validation to the subsequent trajectory checks.

Changes

Agent execution validation in e2e test

| Layer / File(s) | Summary |
| --- | --- |
| Post-run outcome handling in run_agent_prompt (test/e2e/test-kimi-inference-compat.sh) | The run_agent_prompt function's post-run validation logic is updated to fail immediately on non-zero agent exit with diagnostic output, and to pass on successful exit regardless of final text format, logging non-canonical text for debugging and deferring correctness validation to trajectory checks. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4120: Both PRs modify test/e2e/test-kimi-inference-compat.sh, specifically run_agent_prompt's post-run validation/completion matching logic.
  • NVIDIA/NemoClaw#4039: Both PRs update e2e OpenClaw agent execution/validation logic to fail differently on agent non-zero exit and surface agent diagnostic details.

Suggested labels

E2E, Integration: OpenClaw, fix

Suggested reviewers

  • jyaunches

Poem

🐰 A test runs swift, then checks the code,
When exit fails, we now explode—
But when success greets our agent's way,
We log and trust the test's next say,
Validation deferred, the path's more clear! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly relates to the main change: shifting validation responsibility from command-output parsing to trajectory acceptance checking for the Kimi e2e test. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@cv cv changed the base branch from draft/inference-switch-retry-fallback to main May 24, 2026 15:56
@cv cv marked this pull request as ready for review May 24, 2026 15:57
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26365908238
Target ref: 0fd38635d460e97bae049988020c3417c9c391d8
Workflow ref: main
Requested jobs: kimi-inference-compat-e2e
Summary: 0 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| kimi-inference-compat-e2e | ⚠️ cancelled |

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26365944264
Target ref: 0fd38635d460e97bae049988020c3417c9c391d8
Workflow ref: main
Requested jobs: kimi-inference-compat-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| kimi-inference-compat-e2e | ✅ success |

@cv
Collaborator Author

cv commented May 24, 2026

Prepared for review after #4152 merged:

  • Retargeted PR base to main.
  • Merged current origin/main; PR diff is now only test/e2e/test-kimi-inference-compat.sh.
  • Marked the PR ready for review.

Validation:

All current PR checks are passing. GitHub still shows some cancelled check runs in the rollup from the base-retarget/draft-to-ready churn, but the latest checks reported by gh pr checks are green.

@cv cv merged commit bbc80df into main May 24, 2026
40 of 50 checks passed
@cv cv deleted the draft/kimi-e2e-final-text-accounting branch May 27, 2026 21:16