test(e2e): classify OpenClaw live switch timeouts#4173

Merged
cv merged 1 commit into main from fix/openclaw-inference-switch-live-timeouts on May 25, 2026

@cv
Collaborator

@cv cv commented May 25, 2026

Summary

The latest nightly flake sweep shows openclaw-inference-switch-e2e repeatedly passing route/config/hash assertions, then failing during post-switch live requests when inference.local or the OpenClaw agent turn times out. This PR mirrors the Hermes stabilization by keeping route/config regressions blocking while classifying explicit post-switch live timeout/5xx probes as transient skips.

Changes

  • Capture HTTP status for the post-switch OpenClaw inference.local probe.
  • Track transient state structurally from curl exit 28 or HTTP 502/503/504.
  • Convert post-switch inference.local transient exhaustion to SKIP after route/config/session checks have passed.
  • Convert OpenClaw agent command timeout (exit 124) to SKIP after route/config/session checks have passed.
  • Preserve FAIL for wrong-content responses, unexpected HTTP statuses, and all route/config/hash/session regressions.
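
The classification rules in the list above can be sketched roughly as follows. This is an illustration only, not the PR's code: the function names `classify_live_probe` and `classify_agent_turn` and the `content_ok` flag are hypothetical, and the sketch assumes the agent command runs under `timeout`, which exits 124 on expiry.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; these function names do not come from the PR.

# Classify one post-switch inference.local probe.
#   $1 = curl exit code, $2 = HTTP status, $3 = 1 if the body had the expected content
classify_live_probe() {
  local curl_rc="$1" http_status="$2" content_ok="$3"
  if [ "$curl_rc" -eq 28 ]; then
    echo skip            # curl exit 28 means the operation timed out
    return 0
  fi
  case "$http_status" in
    502|503|504)
      echo skip          # transient upstream statuses
      return 0
      ;;
  esac
  if [ "$curl_rc" -eq 0 ] && [ "$http_status" = "200" ] && [ "$content_ok" = "1" ]; then
    echo pass
  else
    echo fail            # wrong content, unexpected status, or another curl error
  fi
}

# Classify the agent command by exit code (assumes 0 means the turn succeeded).
classify_agent_turn() {
  case "$1" in
    0)   echo pass ;;
    124) echo skip ;;    # timeout(1) reports expiry with exit 124
    *)   echo fail ;;
  esac
}
```

In this sketch, route/config/hash/session checks are assumed to have run earlier and to fail the test on their own, so only the live probe and agent turn are ever downgraded to SKIP.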

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Summary by CodeRabbit

  • Tests
    • Improved test resilience by implementing transient failure detection for HTTP status codes (502/503/504), distinguishing temporary from permanent errors
    • Enhanced timeout handling to properly classify timeout scenarios in endpoint tests
    • Added utilities for more robust HTTP response parsing and status classification

@cv cv self-assigned this May 25, 2026
@copy-pr-bot

copy-pr-bot Bot commented May 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented May 25, 2026

📝 Walkthrough

This PR enhances OpenClaw inference test reliability by distinguishing transient failures (timeouts, HTTP 502/503/504) from permanent failures. New HTTP response parsing helpers classify transient conditions, sandbox inference checks now skip on transient failures instead of failing, and agent turn checks skip on command timeout.

Changes

Transient HTTP Error Classification and Timeout Handling

| Layer / File(s) | Summary |
| --- | --- |
| HTTP response parsing and transient classification helpers / test/e2e/test-openclaw-inference-switch.sh | Three utility functions classify transient HTTP codes (502/503/504) and extract HTTP status and body from combined curl response strings. |
| Sandbox inference transient failure tracking / test/e2e/test-openclaw-inference-switch.sh | Extended check_sandbox_inference to run a remote curl wrapper that separates response body from HTTP status, track transient state across retries using the new helpers, and change outcomes to skip for transient failures or fail for non-transient failures. |
| Agent turn command timeout handling / test/e2e/test-openclaw-inference-switch.sh | Modified check_openclaw_agent_turn to record skip outcomes for SSH command timeouts (exit code 124) while preserving pass/fail behavior for other results. |
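
For readers without the diff at hand, the three helpers could look roughly like the sketch below. The function names are the ones cited in this review, but the bodies are assumptions: in particular, the sketch assumes the combined response is the body followed by a newline and the status code on the last line, as `curl -s -w '\n%{http_code}'` would produce. The actual script may use a different separator.

```shell
#!/usr/bin/env bash
# Assumed layout of the combined response: <body>\n<http_status>

# Succeeds (returns 0) only for the statuses treated as transient.
is_transient_live_http_code() {
  case "$1" in
    502|503|504) return 0 ;;
    *)           return 1 ;;
  esac
}

# Prints the last line of the combined response, i.e. the HTTP status.
http_status_from_response() {
  printf '%s\n' "$1" | tail -n 1
}

# Prints everything except the last line, i.e. the response body.
http_body_from_response() {
  printf '%s\n' "$1" | sed '$d'
}
```

Keeping the status on its own trailing line means a multi-line JSON body survives intact, and an empty body still yields a usable status.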

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4158: Introduces the same curl response parsing helpers and transient failure classification logic in a separate inference script, using identical HTTP status and timeout detection patterns to change outcomes from fail to skip.
  • NVIDIA/NemoClaw#4154: Adds transient HTTP status classification (502/503/504) and refactors error handling to skip on transient conditions instead of fail in e2e retry logic.

Suggested labels

v0.0.51

Poem

🐰 Through timeouts and status codes we hop,
Transient troubles now gracefully skip,
No more false failures when servers take rest,
Just a gentle "skip" for the load test.
Smart inference waits, the fleet's on its way! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: improving OpenClaw test resilience by classifying live switch timeouts as transient rather than failures. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: openclaw-inference-switch-e2e

Dispatch hint: openclaw-inference-switch-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None.

Optional E2E

  • openclaw-inference-switch-e2e (medium): Optional self-validation of the modified E2E script. This job is the direct workflow consumer of test/e2e/test-openclaw-inference-switch.sh and would catch shell, HTTP parsing, retry/skip, and live OpenClaw inference-switch assertion regressions introduced by the test change.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: openclaw-inference-switch-e2e

@github-actions
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@github-actions
Contributor

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 1 nice ideas
Top item: Mixed live-probe failures can be reported as SKIP

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Mixed live-probe failures can be reported as SKIP (test/e2e/test-openclaw-inference-switch.sh:218): `check_sandbox_inference` resets `transient=0` on every retry and only inspects the final attempt after the loop. If earlier attempts fail for a non-transient reason such as malformed JSON, wrong content, or an unexpected HTTP status, and the last attempt is a timeout or 502/503/504, the test reports SKIP. That can hide intermittent correctness regressions rather than only classifying an exhaustion of explicit transient failures as skipped.
    • Recommendation: Track whether all exhausted attempts were transient, or fail immediately/at summary if any non-transient response was observed. For example, maintain `saw_non_transient=1` for wrong-content, parse, and unexpected-status failures and only SKIP when no non-transient attempts occurred.
    • Evidence: The loop initializes `transient=0` per attempt, sets it for `rc == 28` or transient HTTP status, and after all attempts checks only `[ "$transient" -eq 1 ]` before calling `skip`; `last_fail` is also overwritten by the last attempt.
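
The recommendation above can be sketched as a small decision function. This is a hypothetical illustration, not the script's code: `decide_after_retries` and the outcome labels are invented here, and the sketch assumes that any successful attempt still ends the retry loop with PASS.

```shell
#!/usr/bin/env bash
# Hypothetical sketch. Each argument is the outcome of one attempt:
#   ok            - expected content with HTTP 200
#   transient     - curl exit 28 or HTTP 502/503/504
#   non_transient - wrong content, parse error, or unexpected status
decide_after_retries() {
  local saw_transient=0 saw_non_transient=0 outcome
  for outcome in "$@"; do
    case "$outcome" in
      ok)
        echo pass            # a success ends the retry loop (assumed behavior)
        return 0
        ;;
      transient)
        saw_transient=1
        ;;
      *)
        saw_non_transient=1  # remembered across all later attempts
        ;;
    esac
  done
  if [ "$saw_transient" -eq 1 ] && [ "$saw_non_transient" -eq 0 ]; then
    echo skip                # every exhausted attempt was transient
  else
    echo fail                # at least one non-transient failure, or no attempts
  fi
}
```

Because the flags are initialized once before the loop rather than per attempt, a final timeout can no longer mask an earlier wrong-content or unexpected-status response.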

🌱 Nice ideas

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
test/e2e/test-openclaw-inference-switch.sh (2)

47-60: ⚡ Quick win

Consider extracting shared HTTP response helpers to a library.

These three functions are identical to test-hermes-inference-switch.sh (lines 47-61). Extracting them to test/e2e/lib/http-response-helpers.sh would reduce duplication and ensure consistent transient classification across all inference-switch E2Es.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-openclaw-inference-switch.sh` around lines 47 - 60, Extract the
three helper functions is_transient_live_http_code, http_status_from_response,
and http_body_from_response into a new shared file
test/e2e/lib/http-response-helpers.sh; replace the duplicated definitions in
test/e2e/test-openclaw-inference-switch.sh and
test/e2e/test-hermes-inference-switch.sh by sourcing that new file (e.g., .
"$(dirname "$0")/lib/http-response-helpers.sh" or similar), and ensure the
functions' behavior and names remain unchanged so transient classification is
consistent across both E2E scripts.

248-281: 💤 Low value

Last-attempt-wins transient detection is by design.

The transient flag resets on each attempt (line 248), so only the final attempt determines whether the outcome is SKIP or FAIL. This means if the first two attempts fail with non-transient errors but the third times out, the test will skip. This behavior aligns with the PR objective of treating post-switch live timeouts as non-blocking, but it could theoretically mask degradation patterns where permanent failures evolve into timeouts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-openclaw-inference-switch.sh` around lines 248 - 281, The
transient flag is overwritten each loop so only the last attempt decides SKIP vs
FAIL; preserve non-transient failures across attempts by adding a persistent
marker (e.g., non_transient_seen) or make transient sticky. Update the loop
around where transient, last_fail and attempt are set (variables transient,
last_fail, attempt and function is_transient_live_http_code) so that: when a
non-transient error is observed (HTTP != 200 and not
is_transient_live_http_code, or a nonzero curl rc other than 28), set non_transient_seen=1 (or do
not clear transient once set); when a transient condition is observed set
transient=1 but do not overwrite a previously-recorded non_transient_seen; after
the loop decide SKIP only if transient==1 and non_transient_seen is unset,
otherwise FAIL using the earliest/most relevant last_fail recorded. This ensures
any non-transient failure across attempts forces FAIL while still allowing final
transient timeouts to be SKIP when no non-transient failure occurred.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 592c722f-364b-43dd-8f25-b4a4466e690a

📥 Commits

Reviewing files that changed from the base of the PR and between 50c208b and 82560f9.

📒 Files selected for processing (1)
  • test/e2e/test-openclaw-inference-switch.sh

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26388552457
Target ref: 82560f9a1a74ebdd1520c4bcf06afdebf971bd97
Workflow ref: main
Requested jobs: openclaw-inference-switch-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| openclaw-inference-switch-e2e | ✅ success |

@cv cv added the v0.0.51 Release target label May 25, 2026
@cv cv merged commit cab6f8c into main May 25, 2026
22 checks passed
@cv cv deleted the fix/openclaw-inference-switch-live-timeouts branch May 27, 2026 21:16
Labels

v0.0.51 Release target
