Skip to content

test(e2e): classify Hermes live switch timeouts#4158

Merged
cv merged 5 commits into
mainfrom
draft/hermes-inference-switch-live-timeout-skips
May 25, 2026
Merged

test(e2e): classify Hermes live switch timeouts#4158
cv merged 5 commits into
mainfrom
draft/hermes-inference-switch-live-timeout-skips

Conversation

@cv
Copy link
Copy Markdown
Collaborator

@cv cv commented May 24, 2026

Summary

The post-#4154 nightly sweep still shows hermes-inference-switch-e2e failing after the route/config/hash checks pass, when live Hermes inference.local or API chat requests time out against the upstream model. This PR keeps route/config regressions blocking but classifies explicit post-switch live timeout/5xx probes as external/transient skips.

Changes

  • Capture HTTP status for post-switch Hermes inference.local and API chat probes.
  • Track transient state structurally from curl exit 28 or HTTP 502/503/504, instead of matching arbitrary response text.
  • Convert post-switch inference.local or Hermes API transient exhaustion to SKIP after earlier route/config checks have passed.
  • Preserve FAIL for non-timeout wrong-content responses, unexpected HTTP statuses, and all route/config/hash/registry regressions.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • Tests
    • Improved live-inference test reliability with enhanced transient failure detection and retry behavior.
    • Upgraded HTTP response handling to separately capture status codes and body for clearer diagnostics and enriched failure messages.
    • Added conditional API authentication in test requests and skip-on-transient-failure behavior to reduce flaky failures.

Review Change Stack

@cv cv self-assigned this May 24, 2026
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 24, 2026

📝 Walkthrough

Walkthrough

This PR enhances two live-inference check functions in an end-to-end test script by adding helper functions to parse HTTP responses and classify transient failures, enabling both checks to capture HTTP status codes, distinguish between transient and permanent failures, and conditionally skip or fail. The Hermes API check additionally loads configuration and conditionally includes authentication headers.

Changes

E2E Inference Check Refactor

Layer / File(s) Summary
Helper functions for response parsing and failure classification
test/e2e/test-hermes-inference-switch.sh
Three new Bash functions to extract HTTP status from an in-band __NEMOCLAW_HTTP_STATUS__=... marker, extract the response body before that marker, and classify transient failures (curl exit 28 or HTTP 502/503/504).
check_inference_local: HTTP-aware retry logic
test/e2e/test-hermes-inference-switch.sh
Replaces per-attempt raw curl with an in-sandbox command that emits body+status marker, parses http_code/body, builds enriched last_fail messages including curl exit code/status/truncated body, classifies transient errors, and skips on transient failures or fails otherwise after retries.
check_hermes_api_chat: Auth header and HTTP-aware retry logic
test/e2e/test-hermes-inference-switch.sh
Reworks the Hermes API chat check to optionally source /sandbox/.hermes/.env, conditionally include Authorization: Bearer $API_SERVER_KEY, capture HTTP status via the marker line, parse http_code/body, classify transient failures, and skip or fail after retries while still checking for expected chat content.

Sequence Diagram

sequenceDiagram
  participant TestRunner
  participant Sandbox
  participant HermesAPI
  TestRunner->>Sandbox: run curl in sandbox
  Sandbox->>HermesAPI: HTTP request from curl
  HermesAPI-->>Sandbox: HTTP response body
  Sandbox-->>TestRunner: stdout with body and status marker
  Sandbox-->>TestRunner: exit code is curl exit
  TestRunner->>TestRunner: parse marker, classify transient, update last_fail, decide skip or fail
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4154: Adds similar transient HTTP-failure classification helpers and updates e2e retry logic in a different test script.

Suggested labels

v0.0.51

Poem

🐰 I hopped through test-script dusk and dawn,

I split the status from the body anon.
When curl times out or 503s play,
I mark it transient, and skip the day.
Hooray — the checks now parse and respond!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title focuses on classifying timeouts, but the PR's main changes involve capturing HTTP status, detecting transient failures (curl exit 28 and HTTP 502/503/504), and converting post-switch transient exhaustion to SKIP conditions. While timeout handling is part of the solution, the title doesn't capture the broader intent of distinguishing transient live failures from regressions.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch draft/hermes-inference-switch-live-timeout-skips

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

E2E Advisor Recommendation

Required E2E: None
Optional E2E: hermes-inference-switch-e2e

Dispatch hint: hermes-inference-switch-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None.

Optional E2E

  • hermes-inference-switch-e2e (high): Optional confidence check for the modified E2E harness itself; it is the exact job that runs test/e2e/test-hermes-inference-switch.sh. Not merge-blocking because no runtime/user-flow implementation files changed.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: hermes-inference-switch-e2e

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 0 still apply, 0 new items found

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@github-actions
Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26370573265
Target ref: 40c63e66737c314d9db50c859961e8fbc8d567f5
Workflow ref: main
Requested jobs: hermes-inference-switch-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
hermes-inference-switch-e2e ✅ success

@cv cv changed the base branch from fix/skill-agent-e2e-model-flakes to main May 24, 2026 19:34
@cv cv marked this pull request as ready for review May 24, 2026 19:34
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
test/e2e/test-hermes-inference-switch.sh (5)

307-307: ⚖️ Poor tradeoff

Potential temp file leak if curl is killed.

If the remote shell command is interrupted or killed between mktemp and rm -f, the temp file will leak. While this is unlikely in practice and the temp directory is typically cleaned up on reboot, consider using a trap to ensure cleanup.

♻️ Optional: Add trap for temp file cleanup
-  remote="tmp=\$(mktemp); code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 90 https://inference.local/v1/chat/completions -H 'Content-Type: application/json' -d $payload_arg); rc=\$?; cat \"\$tmp\"; rm -f \"\$tmp\"; printf '\n__NEMOCLAW_HTTP_STATUS__=%s\n' \"\${code:-000}\"; exit \"\$rc\""
+  remote="tmp=\$(mktemp); trap 'rm -f \"\$tmp\"' EXIT; code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 90 https://inference.local/v1/chat/completions -H 'Content-Type: application/json' -d $payload_arg); rc=\$?; cat \"\$tmp\"; rm -f \"\$tmp\"; printf '\n__NEMOCLAW_HTTP_STATUS__=%s\n' \"\${code:-000}\"; exit \"\$rc\""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-hermes-inference-switch.sh` at line 307, The remote shell
command creates a temp file via mktemp stored in tmp and removes it with rm -f
at the end, but that can leak if the process is interrupted; add a shell trap in
the remote command to ensure rm -f "$tmp" runs on EXIT (and optionally on
INT/TERM) before executing the curl sequence, and ensure the trap is set
immediately after tmp is created so any early termination still triggers cleanup
of the tmp file used by the remote variable.

47-52: ⚡ Quick win

Pattern mismatch in transient failure detection.

The pattern *"curl failed with exit 28"* only matches curl exit code 28, but the actual failure message format at lines 318 and 368 is "curl failed with exit ${rc}" where rc can be any curl exit code. Currently, only exit code 28 (timeout) should be classified as transient per the PR objectives and context snippet showing curl exit 28 as retryable.

However, if the intent is to only classify exit code 28 as transient, the current implementation is correct. But it's brittle because it relies on exact string matching rather than parsing the exit code.

♻️ Consider parsing the exit code for more robust detection
 is_transient_live_inference_failure() {
-  case "${1:-}" in
-    *"curl failed with exit 28"* | *"transient HTTP 502"* | *"transient HTTP 503"* | *"transient HTTP 504"*) return 0 ;;
-    *) return 1 ;;
-  esac
+  local msg="${1:-}"
+  # Check for curl timeout (exit 28)
+  if [[ "$msg" =~ curl\ failed\ with\ exit\ 28 ]]; then
+    return 0
+  fi
+  # Check for transient HTTP errors
+  case "$msg" in
+    *"transient HTTP 502"* | *"transient HTTP 503"* | *"transient HTTP 504"*) return 0 ;;
+    *) return 1 ;;
+  esac
 }

Alternatively, if you want to be more explicit about what's transient:

is_transient_live_inference_failure() {
  local msg="${1:-}"
  # Curl exit 28 = timeout
  [[ "$msg" =~ "curl failed with exit 28" ]] && return 0
  # Gateway errors (502/503/504)
  [[ "$msg" =~ "transient HTTP 50"[234] ]] && return 0
  return 1
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-hermes-inference-switch.sh` around lines 47 - 52, The
transient-detection in is_transient_live_inference_failure() is brittle because
it string-matches "curl failed with exit 28" instead of parsing the exit code;
change the function to extract the curl exit code from the input message (e.g.,
capture the number from "curl failed with exit ${rc}") and return 0 only when
that extracted rc == 28, while preserving the existing transient HTTP
502/503/504 checks; reference is_transient_live_inference_failure and the
incoming message variable when implementing the parse-and-compare logic.

379-379: 💤 Low value

Consider consistent failure messages between functions.

Line 379 includes both parsed content and raw response body in the failure message, while line 329 in check_inference_local only includes content. This asymmetry might confuse during debugging.

♻️ Optional: Make failure messages consistent

Either add response body to check_inference_local:

-      last_fail="expected PONG, got ${content:0:300}"
+      last_fail="expected PONG, got ${content:0:300}; response=${body:0:300}"

Or remove it from check_hermes_api_chat:

-      last_fail="expected PONG from Hermes API, got ${content:0:300}; response=${body:0:300}"
+      last_fail="expected PONG from Hermes API, got ${content:0:300}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-hermes-inference-switch.sh` at line 379, The failure messages
are inconsistent between check_inference_local and check_hermes_api_chat: update
the last_fail construction so both functions include the same fields. Either
modify check_inference_local to append the raw response body (e.g., include
response=${body:0:300} alongside content in its last_fail) or change
check_hermes_api_chat to remove the raw body and only include content; ensure
you edit the last_fail assignment that references content and body so both
check_inference_local and check_hermes_api_chat produce the same formatted
message.

357-357: ⚖️ Poor tradeoff

Same temp file leak concern as check_inference_local.

The remote command has the same potential temp file leak issue if interrupted. See the earlier comment on line 307 for the optional trap-based mitigation.

♻️ Optional: Add trap for temp file cleanup
-  remote="set -a; [ ! -f /sandbox/.hermes/.env ] || . /sandbox/.hermes/.env; set +a; tmp=\$(mktemp); if [ -n \"\${API_SERVER_KEY:-}\" ]; then code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 120 http://localhost:8642/v1/chat/completions -H 'Content-Type: application/json' -H \"Authorization: Bearer \${API_SERVER_KEY}\" -d $payload_arg); else code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 120 http://localhost:8642/v1/chat/completions -H 'Content-Type: application/json' -d $payload_arg); fi; rc=\$?; cat \"\$tmp\"; rm -f \"\$tmp\"; printf '\n__NEMOCLAW_HTTP_STATUS__=%s\n' \"\${code:-000}\"; exit \"\$rc\""
+  remote="set -a; [ ! -f /sandbox/.hermes/.env ] || . /sandbox/.hermes/.env; set +a; tmp=\$(mktemp); trap 'rm -f \"\$tmp\"' EXIT; if [ -n \"\${API_SERVER_KEY:-}\" ]; then code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 120 http://localhost:8642/v1/chat/completions -H 'Content-Type: application/json' -H \"Authorization: Bearer \${API_SERVER_KEY}\" -d $payload_arg); else code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 120 http://localhost:8642/v1/chat/completions -H 'Content-Type: application/json' -d $payload_arg); fi; rc=\$?; cat \"\$tmp\"; rm -f \"\$tmp\"; printf '\n__NEMOCLAW_HTTP_STATUS__=%s\n' \"\${code:-000}\"; exit \"\$rc\""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-hermes-inference-switch.sh` at line 357, The remote command
constructs a temporary file via tmp=$(mktemp) but lacks a robust cleanup if the
command is interrupted; add a trap-based cleanup immediately after tmp creation
to rm -f the temp file on EXIT (and/or INT/TERM) so the temp file is always
removed, mirroring the earlier fix you applied to check_inference_local; place
the trap right after the mktemp invocation in the remote string and keep the
existing explicit rm -f "$tmp" at the end to be safe.

317-330: Align transient failure classification with the retry loop’s last_fail formatting (and consider other curl exit codes)

  • In test/e2e/test-hermes-inference-switch.sh, the curl error path sets last_fail="curl failed with exit ${rc}; HTTP ${http_code}: ..." and is_transient_live_inference_failure() treats a failure as transient only when last_fail contains curl failed with exit 28 (or transient HTTP 502|503|504), so the message-format matching is consistent.
  • Other network-y curl exit codes (e.g., 6, 7, 35, 52, 55, 56) are not currently classified as transient; consider whether the E2E should treat any of them as transient as well.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-hermes-inference-switch.sh` around lines 317 - 330, The test
sets last_fail messages like "curl failed with exit ${rc}; HTTP ${http_code}:
..." but is_transient_live_inference_failure() only detects transient curl
failures by matching a specific string ("curl failed with exit 28") and
transient HTTP codes; update the logic so the two match: either (a) broaden
is_transient_live_inference_failure() to treat other curl exit codes (e.g.,
6,7,35,52,55,56 and 28) as transient by checking for "curl failed with exit
<code>" patterns or (b) change the last_fail formatting emitted in the curl
error branch to include a canonical token that is already detected as transient;
modify the function is_transient_live_inference_failure() (and its callers) to
use the chosen set of numeric curl exit codes and the existing "transient HTTP
502|503|504" check so detection and message formatting are consistent (refer to
last_fail and is_transient_live_inference_failure()).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-hermes-inference-switch.sh`:
- Around line 367-380: The transient-detection pattern in
is_transient_live_inference_failure() doesn't match the Hermes test's failure
string because the test sets last_fail to "Hermes API curl failed with exit
${rc}..." — update the detection logic in is_transient_live_inference_failure()
to accept both formats (e.g., match "*curl failed with exit 28*" and "*Hermes
API curl failed with exit 28*" or use a regex that allows an optional prefix),
or alternately change the assignment of last_fail in
test-hermes-inference-switch.sh (the last_fail variable in the Hermes check) to
remove the "Hermes API " prefix so the existing pattern matches; reference
is_transient_live_inference_failure() and the last_fail assignment in
test-hermes-inference-switch.sh.

---

Nitpick comments:
In `@test/e2e/test-hermes-inference-switch.sh`:
- Line 307: The remote shell command creates a temp file via mktemp stored in
tmp and removes it with rm -f at the end, but that can leak if the process is
interrupted; add a shell trap in the remote command to ensure rm -f "$tmp" runs
on EXIT (and optionally on INT/TERM) before executing the curl sequence, and
ensure the trap is set immediately after tmp is created so any early termination
still triggers cleanup of the tmp file used by the remote variable.
- Around line 47-52: The transient-detection in
is_transient_live_inference_failure() is brittle because it string-matches "curl
failed with exit 28" instead of parsing the exit code; change the function to
extract the curl exit code from the input message (e.g., capture the number from
"curl failed with exit ${rc}") and return 0 only when that extracted rc == 28,
while preserving the existing transient HTTP 502/503/504 checks; reference
is_transient_live_inference_failure and the incoming message variable when
implementing the parse-and-compare logic.
- Line 379: The failure messages are inconsistent between check_inference_local
and check_hermes_api_chat: update the last_fail construction so both functions
include the same fields. Either modify check_inference_local to append the raw
response body (e.g., include response=${body:0:300} alongside content in its
last_fail) or change check_hermes_api_chat to remove the raw body and only
include content; ensure you edit the last_fail assignment that references
content and body so both check_inference_local and check_hermes_api_chat produce
the same formatted message.
- Line 357: The remote command constructs a temporary file via tmp=$(mktemp) but
lacks a robust cleanup if the command is interrupted; add a trap-based cleanup
immediately after tmp creation to rm -f the temp file on EXIT (and/or INT/TERM)
so the temp file is always removed, mirroring the earlier fix you applied to
check_inference_local; place the trap right after the mktemp invocation in the
remote string and keep the existing explicit rm -f "$tmp" at the end to be safe.
- Around line 317-330: The test sets last_fail messages like "curl failed with
exit ${rc}; HTTP ${http_code}: ..." but is_transient_live_inference_failure()
only detects transient curl failures by matching a specific string ("curl failed
with exit 28") and transient HTTP codes; update the logic so the two match:
either (a) broaden is_transient_live_inference_failure() to treat other curl
exit codes (e.g., 6,7,35,52,55,56 and 28) as transient by checking for "curl
failed with exit <code>" patterns or (b) change the last_fail formatting emitted
in the curl error branch to include a canonical token that is already detected
as transient; modify the function is_transient_live_inference_failure() (and its
callers) to use the chosen set of numeric curl exit codes and the existing
"transient HTTP 502|503|504" check so detection and message formatting are
consistent (refer to last_fail and is_transient_live_inference_failure()).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4c7afa5b-f719-41fa-9c25-ce0f594a682b

📥 Commits

Reviewing files that changed from the base of the PR and between 85e2133 and 983d9f0.

📒 Files selected for processing (1)
  • test/e2e/test-hermes-inference-switch.sh

Comment thread test/e2e/test-hermes-inference-switch.sh
@github-actions
Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26370769878
Target ref: 983d9f0f5f23baa6c2440e101eab34b3a989f133
Workflow ref: main
Requested jobs: hermes-inference-switch-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job Result
hermes-inference-switch-e2e ⚠️ cancelled

@github-actions
Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26370990003
Target ref: f3e80ce2eb02365811a8d2111c838e9eec58ffaa
Workflow ref: main
Requested jobs: hermes-inference-switch-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
hermes-inference-switch-e2e ✅ success

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
test/e2e/test-hermes-inference-switch.sh (2)

360-360: ⚖️ Poor tradeoff

Consider reducing curl command duplication.

The remote command duplicates the entire curl invocation in if/else branches, differing only in the Authorization header. While readable, this could be refactored to build header arguments conditionally before a single curl call.

♻️ Example refactor (conditional header construction)
remote="set -a; [ ! -f /sandbox/.hermes/.env ] || . /sandbox/.hermes/.env; set +a; tmp=\$(mktemp); auth_hdr=''; [ -z \"\${API_SERVER_KEY:-}\" ] || auth_hdr=\"Authorization: Bearer \${API_SERVER_KEY}\"; code=\$(curl -sS -o \"\$tmp\" -w '%{http_code}' --max-time 120 \${auth_hdr:+-H} \${auth_hdr:+\"\$auth_hdr\"} -H 'Content-Type: application/json' http://localhost:8642/v1/chat/completions -d $payload_arg); rc=\$?; cat \"\$tmp\"; rm -f \"\$tmp\"; printf '\n__NEMOCLAW_HTTP_STATUS__=%s\n' \"\${code:-000}\"; exit \"\$rc\""

Note: This increases complexity and may reduce readability. The current duplication is acceptable given the remote command constraints.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-hermes-inference-switch.sh` at line 360, The remote command
duplicates the curl invocation in two branches only to vary the Authorization
header; refactor by constructing the header string conditionally (based on
API_SERVER_KEY) and then call curl once using that header variable (preserve
tmp, code, rc, payload_arg, and the URL/http options). Locate the remote
variable assignment (remote="...") and replace the if/else duplication with:
build auth_hdr (empty or "Authorization: Bearer ${API_SERVER_KEY}"), then invoke
curl once with -H 'Content-Type: application/json' plus the auth header when
non-empty, keeping the same -sS -o "$tmp" -w '%{http_code}' --max-time 120
options and subsequent cat/rm/printf/exit logic.

58-60: 💤 Low value

Edge case: Response body containing the status marker.

If an API response body contains a line starting with __NEMOCLAW_HTTP_STATUS__=, that line will be incorrectly removed along with the actual marker. While unlikely in practice, a more defensive approach would delete only the last occurrence or use a more unique delimiter.

♻️ More robust alternative using head/tail
 http_body_from_response() {
-  sed '/^__NEMOCLAW_HTTP_STATUS__=/d' <<<"$1"
+  # Remove only the last line if it's the status marker
+  local response="$1"
+  local last_line
+  last_line=$(tail -1 <<<"$response")
+  if [[ "$last_line" =~ ^__NEMOCLAW_HTTP_STATUS__= ]]; then
+    head -n -1 <<<"$response"
+  else
+    printf '%s\n' "$response"
+  fi
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-hermes-inference-switch.sh` around lines 58 - 60, The helper
http_body_from_response currently removes every line that starts with the marker
__NEMOCLAW_HTTP_STATUS__= which will corrupt responses that legitimately contain
that line earlier; change the logic so only the final occurrence is removed.
Update http_body_from_response to remove exactly the last line that matches the
marker (for example by reversing the input, deleting the first match, and
reversing back, or by using an awk one-pass that tracks and skips the last
matching line) so the status marker is stripped but any earlier identical lines
in the response body are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e/test-hermes-inference-switch.sh`:
- Line 360: The remote command duplicates the curl invocation in two branches
only to vary the Authorization header; refactor by constructing the header
string conditionally (based on API_SERVER_KEY) and then call curl once using
that header variable (preserve tmp, code, rc, payload_arg, and the URL/http
options). Locate the remote variable assignment (remote="...") and replace the
if/else duplication with: build auth_hdr (empty or "Authorization: Bearer
${API_SERVER_KEY}"), then invoke curl once with -H 'Content-Type:
application/json' plus the auth header when non-empty, keeping the same -sS -o
"$tmp" -w '%{http_code}' --max-time 120 options and subsequent
cat/rm/printf/exit logic.
- Around line 58-60: The helper http_body_from_response currently removes every
line that starts with the marker __NEMOCLAW_HTTP_STATUS__= which will corrupt
responses that legitimately contain that line earlier; change the logic so only
the final occurrence is removed. Update http_body_from_response to remove
exactly the last line that matches the marker (for example by reversing the
input, deleting the first match, and reversing back, or by using an awk one-pass
that tracks and skips the last matching line) so the status marker is stripped
but any earlier identical lines in the response body are preserved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5781d945-fd57-4ca8-8d9b-6c683dff551e

📥 Commits

Reviewing files that changed from the base of the PR and between 983d9f0 and f3e80ce.

📒 Files selected for processing (1)
  • test/e2e/test-hermes-inference-switch.sh

@cv
Copy link
Copy Markdown
Collaborator Author

cv commented May 24, 2026

Prepared for review after #4157 merged:

  • Retargeted PR base to main.
  • Addressed PR Review Advisor feedback by tracking transient state structurally from curl exit 28 or captured HTTP 502/503/504 status, rather than matching arbitrary response/model text.
  • Wrong-content 200 responses and unexpected HTTP statuses remain FAIL.
  • Marked the PR ready for review.

Validation:

Latest gh pr checks is green. GitHub still includes one older cancelled check run in the rollup from update churn, but the current checks are passing.

@cv cv added the v0.0.51 Release target label May 25, 2026
@cv cv merged commit 9059fe3 into main May 25, 2026
30 of 31 checks passed
@cv cv deleted the draft/hermes-inference-switch-live-timeout-skips branch May 27, 2026 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v0.0.51 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants