test(e2e): classify Hermes live switch timeouts#4158
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
📝 WalkthroughWalkthroughThis PR enhances two live-inference check functions in an end-to-end test script by adding helper functions to parse HTTP responses and classify transient failures, enabling both checks to capture HTTP status codes, distinguish between transient and permanent failures, and conditionally skip or fail. The Hermes API check additionally loads configuration and conditionally includes authentication headers. ChangesE2E Inference Check Refactor
Sequence DiagramsequenceDiagram
participant TestRunner
participant Sandbox
participant HermesAPI
TestRunner->>Sandbox: run curl in sandbox
Sandbox->>HermesAPI: HTTP request from curl
HermesAPI-->>Sandbox: HTTP response body
Sandbox-->>TestRunner: stdout with body and status marker
Sandbox-->>TestRunner: exit code is curl exit
TestRunner->>TestRunner: parse marker, classify transient, update last_fail, decide skip or fail
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: None Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: None Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 0 worth checking, 0 nice ideas This is an automated advisory review. A human maintainer must make the final merge decision. |
Selective E2E Results — ✅ All requested jobs passedRun: 26370573265
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
test/e2e/test-hermes-inference-switch.sh (5)
307-307: ⚖️ Poor tradeoffPotential temp file leak if curl is killed.
If the remote shell command is interrupted or killed between
mktempandrm -f, the temp file will leak. While this is unlikely in practice and the temp directory is typically cleaned up on reboot, consider using a trap to ensure cleanup.♻️ Optional: Add trap for temp file cleanup
- remote="tmp=\$(mktemp); code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 90 https://inference.local/v1/chat/completions -H 'Content-Type: application/json' -d $payload_arg); rc=\$?; cat \"\$tmp\"; rm -f \"\$tmp\"; printf '\n__NEMOCLAW_HTTP_STATUS__=%s\n' \"\${code:-000}\"; exit \"\$rc\"" + remote="tmp=\$(mktemp); trap 'rm -f \"\$tmp\"' EXIT; code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 90 https://inference.local/v1/chat/completions -H 'Content-Type: application/json' -d $payload_arg); rc=\$?; cat \"\$tmp\"; rm -f \"\$tmp\"; printf '\n__NEMOCLAW_HTTP_STATUS__=%s\n' \"\${code:-000}\"; exit \"\$rc\""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-hermes-inference-switch.sh` at line 307, The remote shell command creates a temp file via mktemp stored in tmp and removes it with rm -f at the end, but that can leak if the process is interrupted; add a shell trap in the remote command to ensure rm -f "$tmp" runs on EXIT (and optionally on INT/TERM) before executing the curl sequence, and ensure the trap is set immediately after tmp is created so any early termination still triggers cleanup of the tmp file used by the remote variable.
47-52: ⚡ Quick winPattern mismatch in transient failure detection.
The pattern
*"curl failed with exit 28"*only matches curl exit code 28, but the actual failure message format at lines 318 and 368 is"curl failed with exit ${rc}"whererccan be any curl exit code. Currently, only exit code 28 (timeout) should be classified as transient per the PR objectives and context snippet showing curl exit 28 as retryable.However, if the intent is to only classify exit code 28 as transient, the current implementation is correct. But it's brittle because it relies on exact string matching rather than parsing the exit code.
♻️ Consider parsing the exit code for more robust detection
is_transient_live_inference_failure() { - case "${1:-}" in - *"curl failed with exit 28"* | *"transient HTTP 502"* | *"transient HTTP 503"* | *"transient HTTP 504"*) return 0 ;; - *) return 1 ;; - esac + local msg="${1:-}" + # Check for curl timeout (exit 28) + if [[ "$msg" =~ curl\ failed\ with\ exit\ 28 ]]; then + return 0 + fi + # Check for transient HTTP errors + case "$msg" in + *"transient HTTP 502"* | *"transient HTTP 503"* | *"transient HTTP 504"*) return 0 ;; + *) return 1 ;; + esac }Alternatively, if you want to be more explicit about what's transient:
is_transient_live_inference_failure() { local msg="${1:-}" # Curl exit 28 = timeout [[ "$msg" =~ "curl failed with exit 28" ]] && return 0 # Gateway errors (502/503/504) [[ "$msg" =~ "transient HTTP 50"[234] ]] && return 0 return 1 }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-hermes-inference-switch.sh` around lines 47 - 52, The transient-detection in is_transient_live_inference_failure() is brittle because it string-matches "curl failed with exit 28" instead of parsing the exit code; change the function to extract the curl exit code from the input message (e.g., capture the number from "curl failed with exit ${rc}") and return 0 only when that extracted rc == 28, while preserving the existing transient HTTP 502/503/504 checks; reference is_transient_live_inference_failure and the incoming message variable when implementing the parse-and-compare logic.
379-379: 💤 Low valueConsider consistent failure messages between functions.
Line 379 includes both parsed content and raw response body in the failure message, while line 329 in
check_inference_localonly includes content. This asymmetry might confuse during debugging.♻️ Optional: Make failure messages consistent
Either add response body to
check_inference_local:- last_fail="expected PONG, got ${content:0:300}" + last_fail="expected PONG, got ${content:0:300}; response=${body:0:300}"Or remove it from
check_hermes_api_chat:- last_fail="expected PONG from Hermes API, got ${content:0:300}; response=${body:0:300}" + last_fail="expected PONG from Hermes API, got ${content:0:300}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-hermes-inference-switch.sh` at line 379, The failure messages are inconsistent between check_inference_local and check_hermes_api_chat: update the last_fail construction so both functions include the same fields. Either modify check_inference_local to append the raw response body (e.g., include response=${body:0:300} alongside content in its last_fail) or change check_hermes_api_chat to remove the raw body and only include content; ensure you edit the last_fail assignment that references content and body so both check_inference_local and check_hermes_api_chat produce the same formatted message.
357-357: ⚖️ Poor tradeoffSame temp file leak concern as
check_inference_local.The remote command has the same potential temp file leak issue if interrupted. See the earlier comment on line 307 for the optional trap-based mitigation.
♻️ Optional: Add trap for temp file cleanup
- remote="set -a; [ ! -f /sandbox/.hermes/.env ] || . /sandbox/.hermes/.env; set +a; tmp=\$(mktemp); if [ -n \"\${API_SERVER_KEY:-}\" ]; then code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 120 http://localhost:8642/v1/chat/completions -H 'Content-Type: application/json' -H \"Authorization: Bearer \${API_SERVER_KEY}\" -d $payload_arg); else code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 120 http://localhost:8642/v1/chat/completions -H 'Content-Type: application/json' -d $payload_arg); fi; rc=\$?; cat \"\$tmp\"; rm -f \"\$tmp\"; printf '\n__NEMOCLAW_HTTP_STATUS__=%s\n' \"\${code:-000}\"; exit \"\$rc\"" + remote="set -a; [ ! -f /sandbox/.hermes/.env ] || . /sandbox/.hermes/.env; set +a; tmp=\$(mktemp); trap 'rm -f \"\$tmp\"' EXIT; if [ -n \"\${API_SERVER_KEY:-}\" ]; then code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 120 http://localhost:8642/v1/chat/completions -H 'Content-Type: application/json' -H \"Authorization: Bearer \${API_SERVER_KEY}\" -d $payload_arg); else code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 120 http://localhost:8642/v1/chat/completions -H 'Content-Type: application/json' -d $payload_arg); fi; rc=\$?; cat \"\$tmp\"; rm -f \"\$tmp\"; printf '\n__NEMOCLAW_HTTP_STATUS__=%s\n' \"\${code:-000}\"; exit \"\$rc\""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-hermes-inference-switch.sh` at line 357, The remote command constructs a temporary file via tmp=$(mktemp) but lacks a robust cleanup if the command is interrupted; add a trap-based cleanup immediately after tmp creation to rm -f the temp file on EXIT (and/or INT/TERM) so the temp file is always removed, mirroring the earlier fix you applied to check_inference_local; place the trap right after the mktemp invocation in the remote string and keep the existing explicit rm -f "$tmp" at the end to be safe.
317-330: Align transient failure classification with the retry loop’slast_failformatting (and consider other curl exit codes)
- In
test/e2e/test-hermes-inference-switch.sh, the curl error path setslast_fail="curl failed with exit ${rc}; HTTP ${http_code}: ..."andis_transient_live_inference_failure()treats a failure as transient only whenlast_failcontainscurl failed with exit 28(ortransient HTTP 502|503|504), so the message-format matching is consistent.- Other network-y curl exit codes (e.g., 6, 7, 35, 52, 55, 56) are not currently classified as transient; consider whether the E2E should treat any of them as transient as well.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-hermes-inference-switch.sh` around lines 317 - 330, The test sets last_fail messages like "curl failed with exit ${rc}; HTTP ${http_code}: ..." but is_transient_live_inference_failure() only detects transient curl failures by matching a specific string ("curl failed with exit 28") and transient HTTP codes; update the logic so the two match: either (a) broaden is_transient_live_inference_failure() to treat other curl exit codes (e.g., 6,7,35,52,55,56 and 28) as transient by checking for "curl failed with exit <code>" patterns or (b) change the last_fail formatting emitted in the curl error branch to include a canonical token that is already detected as transient; modify the function is_transient_live_inference_failure() (and its callers) to use the chosen set of numeric curl exit codes and the existing "transient HTTP 502|503|504" check so detection and message formatting are consistent (refer to last_fail and is_transient_live_inference_failure()).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/test-hermes-inference-switch.sh`:
- Around line 367-380: The transient-detection pattern in
is_transient_live_inference_failure() doesn't match the Hermes test's failure
string because the test sets last_fail to "Hermes API curl failed with exit
${rc}..." — update the detection logic in is_transient_live_inference_failure()
to accept both formats (e.g., match "*curl failed with exit 28*" and "*Hermes
API curl failed with exit 28*" or use a regex that allows an optional prefix),
or alternately change the assignment of last_fail in
test-hermes-inference-switch.sh (the last_fail variable in the Hermes check) to
remove the "Hermes API " prefix so the existing pattern matches; reference
is_transient_live_inference_failure() and the last_fail assignment in
test-hermes-inference-switch.sh.
---
Nitpick comments:
In `@test/e2e/test-hermes-inference-switch.sh`:
- Line 307: The remote shell command creates a temp file via mktemp stored in
tmp and removes it with rm -f at the end, but that can leak if the process is
interrupted; add a shell trap in the remote command to ensure rm -f "$tmp" runs
on EXIT (and optionally on INT/TERM) before executing the curl sequence, and
ensure the trap is set immediately after tmp is created so any early termination
still triggers cleanup of the tmp file used by the remote variable.
- Around line 47-52: The transient-detection in
is_transient_live_inference_failure() is brittle because it string-matches "curl
failed with exit 28" instead of parsing the exit code; change the function to
extract the curl exit code from the input message (e.g., capture the number from
"curl failed with exit ${rc}") and return 0 only when that extracted rc == 28,
while preserving the existing transient HTTP 502/503/504 checks; reference
is_transient_live_inference_failure and the incoming message variable when
implementing the parse-and-compare logic.
- Line 379: The failure messages are inconsistent between check_inference_local
and check_hermes_api_chat: update the last_fail construction so both functions
include the same fields. Either modify check_inference_local to append the raw
response body (e.g., include response=${body:0:300} alongside content in its
last_fail) or change check_hermes_api_chat to remove the raw body and only
include content; ensure you edit the last_fail assignment that references
content and body so both check_inference_local and check_hermes_api_chat produce
the same formatted message.
- Line 357: The remote command constructs a temporary file via tmp=$(mktemp) but
lacks a robust cleanup if the command is interrupted; add a trap-based cleanup
immediately after tmp creation to rm -f the temp file on EXIT (and/or INT/TERM)
so the temp file is always removed, mirroring the earlier fix you applied to
check_inference_local; place the trap right after the mktemp invocation in the
remote string and keep the existing explicit rm -f "$tmp" at the end to be safe.
- Around line 317-330: The test sets last_fail messages like "curl failed with
exit ${rc}; HTTP ${http_code}: ..." but is_transient_live_inference_failure()
only detects transient curl failures by matching a specific string ("curl failed
with exit 28") and transient HTTP codes; update the logic so the two match:
either (a) broaden is_transient_live_inference_failure() to treat other curl
exit codes (e.g., 6,7,35,52,55,56 and 28) as transient by checking for "curl
failed with exit <code>" patterns or (b) change the last_fail formatting emitted
in the curl error branch to include a canonical token that is already detected
as transient; modify the function is_transient_live_inference_failure() (and its
callers) to use the chosen set of numeric curl exit codes and the existing
"transient HTTP 502|503|504" check so detection and message formatting are
consistent (refer to last_fail and is_transient_live_inference_failure()).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4c7afa5b-f719-41fa-9c25-ce0f594a682b
📒 Files selected for processing (1)
test/e2e/test-hermes-inference-switch.sh
Selective E2E Results — ✅ All requested jobs passedRun: 26370769878
|
Selective E2E Results — ✅ All requested jobs passedRun: 26370990003
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
test/e2e/test-hermes-inference-switch.sh (2)
360-360: ⚖️ Poor tradeoffConsider reducing curl command duplication.
The remote command duplicates the entire curl invocation in if/else branches, differing only in the Authorization header. While readable, this could be refactored to build header arguments conditionally before a single curl call.
♻️ Example refactor (conditional header construction)
remote="set -a; [ ! -f /sandbox/.hermes/.env ] || . /sandbox/.hermes/.env; set +a; tmp=\$(mktemp); auth_hdr=''; [ -z \"\${API_SERVER_KEY:-}\" ] || auth_hdr=\"Authorization: Bearer \${API_SERVER_KEY}\"; code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 120 \${auth_hdr:+-H} \${auth_hdr:+\"\$auth_hdr\"} -H 'Content-Type: application/json' http://localhost:8642/v1/chat/completions -d $payload_arg); rc=\$?; cat \"\$tmp\"; rm -f \"\$tmp\"; printf '\n__NEMOCLAW_HTTP_STATUS__=%s\n' \"\${code:-000}\"; exit \"\$rc\""Note: This increases complexity and may reduce readability. The current duplication is acceptable given the remote command constraints.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-hermes-inference-switch.sh` at line 360, The remote command duplicates the curl invocation in two branches only to vary the Authorization header; refactor by constructing the header string conditionally (based on API_SERVER_KEY) and then call curl once using that header variable (preserve tmp, code, rc, payload_arg, and the URL/http options). Locate the remote variable assignment (remote="...") and replace the if/else duplication with: build auth_hdr (empty or "Authorization: Bearer ${API_SERVER_KEY}"), then invoke curl once with -H 'Content-Type: application/json' plus the auth header when non-empty, keeping the same -sS -o "$tmp" -w '%{http_code}' --max-time 120 options and subsequent cat/rm/printf/exit logic.
58-60: 💤 Low valueEdge case: Response body containing the status marker.
If an API response body contains a line starting with
__NEMOCLAW_HTTP_STATUS__=, that line will be incorrectly removed along with the actual marker. While unlikely in practice, a more defensive approach would delete only the last occurrence or use a more unique delimiter.♻️ More robust alternative using head/tail
http_body_from_response() { - sed '/^__NEMOCLAW_HTTP_STATUS__=/d' <<<"$1" + # Remove only the last line if it's the status marker + local response="$1" + local last_line + last_line=$(tail -1 <<<"$response") + if [[ "$last_line" =~ ^__NEMOCLAW_HTTP_STATUS__= ]]; then + head -n -1 <<<"$response" + else + printf '%s\n' "$response" + fi }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-hermes-inference-switch.sh` around lines 58 - 60, The helper http_body_from_response currently removes every line that starts with the marker __NEMOCLAW_HTTP_STATUS__= which will corrupt responses that legitimately contain that line earlier; change the logic so only the final occurrence is removed. Update http_body_from_response to remove exactly the last line that matches the marker (for example by reversing the input, deleting the first match, and reversing back, or by using an awk one-pass that tracks and skips the last matching line) so the status marker is stripped but any earlier identical lines in the response body are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/e2e/test-hermes-inference-switch.sh`:
- Line 360: The remote command duplicates the curl invocation in two branches
only to vary the Authorization header; refactor by constructing the header
string conditionally (based on API_SERVER_KEY) and then call curl once using
that header variable (preserve tmp, code, rc, payload_arg, and the URL/http
options). Locate the remote variable assignment (remote="...") and replace the
if/else duplication with: build auth_hdr (empty or "Authorization: Bearer
${API_SERVER_KEY}"), then invoke curl once with -H 'Content-Type:
application/json' plus the auth header when non-empty, keeping the same -sS -o
"$tmp" -w '%{http_code}' --max-time 120 options and subsequent
cat/rm/printf/exit logic.
- Around line 58-60: The helper http_body_from_response currently removes every
line that starts with the marker __NEMOCLAW_HTTP_STATUS__= which will corrupt
responses that legitimately contain that line earlier; change the logic so only
the final occurrence is removed. Update http_body_from_response to remove
exactly the last line that matches the marker (for example by reversing the
input, deleting the first match, and reversing back, or by using an awk one-pass
that tracks and skips the last matching line) so the status marker is stripped
but any earlier identical lines in the response body are preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5781d945-fd57-4ca8-8d9b-6c683dff551e
📒 Files selected for processing (1)
test/e2e/test-hermes-inference-switch.sh
|
Prepared for review after #4157 merged:
Validation:
Latest |
Summary
The post-#4154 nightly sweep still shows
hermes-inference-switch-e2efailing after the route/config/hash checks pass, when live Hermesinference.localor API chat requests time out against the upstream model. This PR keeps route/config regressions blocking but classifies explicit post-switch live timeout/5xx probes as external/transient skips.Changes
inference.localand API chat probes.inference.localor Hermes API transient exhaustion to SKIP after earlier route/config checks have passed.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Signed-off-by: Carlos Villela cvillela@nvidia.com
Summary by CodeRabbit