test(e2e): classify OpenClaw live switch timeouts#4173
Conversation
📝 WalkthroughWalkthroughThis PR enhances OpenClaw inference test reliability by distinguishing transient failures (timeouts, HTTP 502/503/504) from permanent failures. New HTTP response parsing helpers classify transient conditions, sandbox inference checks now skip on transient failures instead of failing, and agent turn checks skip on command timeout. ChangesTransient HTTP Error Classification and Timeout Handling
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: None Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: None Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 1 worth checking, 1 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
test/e2e/test-openclaw-inference-switch.sh (2)
47-60: ⚡ Quick winConsider extracting shared HTTP response helpers to a library.
These three functions are identical to
test-hermes-inference-switch.sh(lines 47-61). Extracting them totest/e2e/lib/http-response-helpers.shwould reduce duplication and ensure consistent transient classification across all inference-switch E2Es.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-openclaw-inference-switch.sh` around lines 47 - 60, Extract the three helper functions is_transient_live_http_code, http_status_from_response, and http_body_from_response into a new shared file test/e2e/lib/http-response-helpers.sh; replace the duplicated definitions in test/e2e/test-openclaw-inference-switch.sh and test/e2e/test-hermes-inference-switch.sh by sourcing that new file (e.g., . "$(dirname "$0")/lib/http-response-helpers.sh" or similar), and ensure the functions' behavior and names remain unchanged so transient classification is consistent across both E2E scripts.
248-281: 💤 Low valueLast-attempt-wins transient detection is by design.
The
transientflag resets on each attempt (line 248), so only the final attempt determines whether the outcome is SKIP or FAIL. This means if the first two attempts fail with non-transient errors but the third times out, the test will skip. This behavior aligns with the PR objective of treating post-switch live timeouts as non-blocking, but it could theoretically mask degradation patterns where permanent failures evolve into timeouts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-openclaw-inference-switch.sh` around lines 248 - 281, The transient flag is overwritten each loop so only the last attempt decides SKIP vs FAIL; preserve non-transient failures across attempts by adding a persistent marker (e.g., non_transient_seen) or make transient sticky. Update the loop around where transient, last_fail and attempt are set (variables transient, last_fail, attempt and function is_transient_live_http_code) so that: when a non-transient error is observed (HTTP != 200 and not is_transient_live_http_code, or curl rc != 28), set non_transient_seen=1 (or do not clear transient once set); when a transient condition is observed set transient=1 but do not overwrite a previously-recorded non_transient_seen; after the loop decide SKIP only if transient==1 and non_transient_seen is unset, otherwise FAIL using the earliest/most relevant last_fail recorded. This ensures any non-transient failure across attempts forces FAIL while still allowing final transient timeouts to be SKIP when no non-transient failure occurred.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/e2e/test-openclaw-inference-switch.sh`:
- Around line 47-60: Extract the three helper functions
is_transient_live_http_code, http_status_from_response, and
http_body_from_response into a new shared file
test/e2e/lib/http-response-helpers.sh; replace the duplicated definitions in
test/e2e/test-openclaw-inference-switch.sh and
test/e2e/test-hermes-inference-switch.sh by sourcing that new file (e.g., .
"$(dirname "$0")/lib/http-response-helpers.sh" or similar), and ensure the
functions' behavior and names remain unchanged so transient classification is
consistent across both E2E scripts.
- Around line 248-281: The transient flag is overwritten each loop so only the
last attempt decides SKIP vs FAIL; preserve non-transient failures across
attempts by adding a persistent marker (e.g., non_transient_seen) or make
transient sticky. Update the loop around where transient, last_fail and attempt
are set (variables transient, last_fail, attempt and function
is_transient_live_http_code) so that: when a non-transient error is observed
(HTTP != 200 and not is_transient_live_http_code, or curl rc != 28), set
non_transient_seen=1 (or do not clear transient once set); when a transient
condition is observed set transient=1 but do not overwrite a previously-recorded
non_transient_seen; after the loop decide SKIP only if transient==1 and
non_transient_seen is unset, otherwise FAIL using the earliest/most relevant
last_fail recorded. This ensures any non-transient failure across attempts
forces FAIL while still allowing final transient timeouts to be SKIP when no
non-transient failure occurred.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 592c722f-364b-43dd-8f25-b4a4466e690a
📒 Files selected for processing (1)
test/e2e/test-openclaw-inference-switch.sh
Selective E2E Results — ✅ All requested jobs passedRun: 26388552457
|
Summary
The latest nightly flake sweep shows
openclaw-inference-switch-e2erepeatedly passing route/config/hash assertions, then failing during post-switch live requests wheninference.localor the OpenClaw agent turn times out. This PR mirrors the Hermes stabilization by keeping route/config regressions blocking while classifying explicit post-switch live timeout/5xx probes as transient skips.Changes
inference.localprobe.inference.localtransient exhaustion to SKIP after route/config/session checks have passed.exit 124) to SKIP after route/config/session checks have passed.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Signed-off-by: Carlos Villela cvillela@nvidia.com
Summary by CodeRabbit