Skip to content

fix(gateway): split actor response wait errors#4945

Merged
NathanFlurry merged 1 commit intomainfrom
05-04-fix_gateway_specific_response_errors
May 5, 2026
Merged

fix(gateway): split actor response wait errors#4945
NathanFlurry merged 1 commit intomainfrom
05-04-fix_gateway_specific_response_errors

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

This was referenced May 4, 2026
Copy link
Copy Markdown
Member Author

NathanFlurry commented May 4, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 4, 2026

Code Review: fix(gateway): split actor response wait errors

Overview

This PR replaces the generic ServiceUnavailable error with five distinct, semantically precise error types for actor response wait failures in the gateway. It also improves the retry logic and TypeScript client classification, and adds documentation for all new error codes.


Strengths

  • Clear error semantics: Splitting a single catch-all into ActorStoppedWhileWaiting, TunnelRequestAborted, TunnelMessageTimeout, TunnelResponseClosed, and GatewayResponseStartTimeout makes the failure mode visible in logs and to API consumers.
  • Bug fix in TunnelResponseClosed: The old break in the None select arm (both gateways) silently fell through to a generic ServiceUnavailable; the new return Err(TunnelResponseClosed.build()) is explicit and more precise.
  • Stricter should_retry_request_inner: The new implementation validates the x-rivet-error header content rather than just checking its presence, preventing spurious retries on unrelated 503s.
  • TypeScript test added verifying the retryable classification of all new codes.
  • Documentation updated with the new error codes.

Issues

1. Missing newline at end of file

Every new .json file under engine/artifacts/errors/ ends without a trailing newline. All existing artifacts in that directory include a trailing newline; these should match.

2. actor_ready_timeout inconsistency between Rust and TypeScript

The Rust helper includes actor_ready_timeout as retryable but the TypeScript counterpart does not. This means guard.actor_ready_timeout is retried by the gateway internal Rust retry loop but not by the TypeScript client. If the intent is for the TS client to retry on it too, add the code; if not, remove it from the Rust list or document why the two diverge.

3. No Rust unit tests

The Rust-side changes are meaningfully testable:

  • is_retryable_guard_http_error is a pure function, so a small table-driven test would guard against future regression.
  • should_retry_request_inner now has non-trivial header parsing logic; tests covering retryable vs. non-retryable cases (missing header, wrong group, unknown code, 504 with valid code) would be valuable.

4. Duplicate retryable-codes list

The list of retryable guard codes now lives in three places: Rust (utils.rs), TypeScript (lifecycle-errors.ts), and implicitly in the documentation. A future error addition must be updated in all three simultaneously. A brief inline cross-reference comment would reduce the risk of drift.


Minor

  • The new comment on should_retry_request_inner uses a colon-fragment style. CLAUDE.md asks for complete sentences.
  • actor_ready_timeout is absent from the documentation new error-code list even though it was already present in the original paragraph.

Summary

The core change is solid: cleaner error taxonomy, a real bug fix in the None/break path, and good test coverage on the TypeScript side. The main actionable items before merge are: fix the trailing newlines in the JSON artifacts, resolve the actor_ready_timeout inconsistency between Rust and TypeScript, and consider adding Rust unit tests for the new retry helpers.

@NathanFlurry NathanFlurry force-pushed the 05-04-fix_guard_bypass_cors_preflight branch from 0933767 to a4663b0 Compare May 4, 2026 15:36
@NathanFlurry NathanFlurry force-pushed the 05-04-fix_gateway_specific_response_errors branch from 5ad2b7b to 82f5c9d Compare May 4, 2026 15:36
Base automatically changed from 05-04-fix_guard_bypass_cors_preflight to main May 5, 2026 14:57
@NathanFlurry NathanFlurry merged commit 82f5c9d into main May 5, 2026
22 of 28 checks passed
@NathanFlurry NathanFlurry deleted the 05-04-fix_gateway_specific_response_errors branch May 5, 2026 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant