fix(0.14.1): per-attempt fetch timeout + retry thrown network errors by drewstone · Pull Request #24 · tangle-network/agent-runtime

drewstone · 2026-05-20T19:49:44Z

Summary

A real eval persona burned 15 minutes on one hung request: the tcloud router accepted the connection, never responded, and the fetch sat open until the runtime gave up with `fetch failed`. Two gaps in `createOpenAICompatibleBackend` caused this:

1. No per-attempt deadline: a hung upstream blocked the attempt indefinitely.
2. Thrown fetch errors weren't retried. Only HTTP error statuses were, so a thrown `fetch failed` killed the attempt.

Fix

- `BackendRetryPolicy.requestTimeoutMs` (default 120 s): each attempt gets an `AbortController` deadline linked to the caller's signal. A hung upstream now aborts after 2 minutes and the attempt is retried.
- The `fetch` call is wrapped so a thrown error is retried with backoff, just like a 5xx. Caller-initiated aborts stay terminal. Exhausted retries throw `BackendTransportError`.
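Taken together, the two changes amount to a retry loop in which every attempt gets its own deadline and a thrown error counts as a transport failure. The sketch below is a minimal illustration under stated assumptions, not the code in `createOpenAICompatibleBackend`: `fetchWithRetry`, `AttemptFn`, and the `maxAttempts` / `baseDelayMs` fields are hypothetical, the exponential backoff schedule and "return any status below 500 without retrying" are assumptions, and only `requestTimeoutMs` and the `BackendTransportError` name come from this PR (the constructor shape here is invented).

```typescript
// Minimal sketch, NOT the agent-runtime implementation. RetryPolicy's
// maxAttempts/baseDelayMs fields, fetchWithRetry, and AttemptFn are
// hypothetical; only requestTimeoutMs and the BackendTransportError name
// come from the PR.
interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
  requestTimeoutMs: number; // per-attempt deadline (the PR defaults it to 120_000)
}

class BackendTransportError extends Error {
  attempts: number;
  constructor(message: string, attempts: number) {
    super(message);
    this.name = "BackendTransportError";
    this.attempts = attempts;
  }
}

// One attempt, e.g. (signal) => fetch(url, { ...init, signal }).
type AttemptFn = (signal: AbortSignal) => Promise<{ status: number }>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

async function fetchWithRetry(
  attempt: AttemptFn,
  policy: RetryPolicy,
  callerSignal?: AbortSignal,
): Promise<{ status: number }> {
  let lastError = "no attempts made";
  for (let i = 1; i <= policy.maxAttempts; i++) {
    // A caller abort (including one that lands during backoff) is terminal.
    if (callerSignal?.aborted) throw callerSignal.reason;

    // Per-attempt controller, aborted by its own deadline or by the caller.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), policy.requestTimeoutMs);
    const onCallerAbort = () => controller.abort();
    callerSignal?.addEventListener("abort", onCallerAbort, { once: true });

    try {
      const res = await attempt(controller.signal);
      if (res.status < 500) return res; // returned as-is (assumption; a real policy may also retry 429)
      lastError = `HTTP ${res.status}`; // 5xx: back off and retry
    } catch (err) {
      // Caller-initiated aborts stay terminal. Anything else (network error,
      // DNS failure, `fetch failed`, per-attempt timeout) is retried like a 5xx.
      if (callerSignal?.aborted) throw err;
      lastError = err instanceof Error ? err.message : String(err);
    } finally {
      clearTimeout(timer);
      callerSignal?.removeEventListener("abort", onCallerAbort);
    }

    // Assumed exponential backoff: baseDelayMs, 2x, 4x, ...
    if (i < policy.maxAttempts) await sleep(policy.baseDelayMs * 2 ** (i - 1));
  }
  throw new BackendTransportError(
    `all ${policy.maxAttempts} attempts failed: ${lastError}`,
    policy.maxAttempts,
  );
}
```

The `callerSignal?.aborted` check in the `catch` is what keeps the two abort sources apart: a per-attempt timeout and a caller cancellation both surface as a rejected attempt, but only in the second case is the caller's own signal aborted. Provided each attempt honors its signal, as `fetch` does, a hung upstream costs at most `requestTimeoutMs` per attempt plus backoff rather than an unbounded wait.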

Test plan

- 3 new tests: thrown-error retry, per-attempt timeout abort + retry, all attempts throw → `BackendTransportError`
- 216 tests pass; typecheck and biome are clean
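These scenarios need an upstream that throws and one that hangs. One way to build them is with small fake fetch functions that receive the attempt's `AbortSignal`. This is a sketch only: `FakeFetch`, `ok`, `throwingUpstream`, `hangingUpstream`, and `scripted` are hypothetical names, not helpers from the repository, and the PR's actual tests may be structured differently.

```typescript
// Hypothetical test doubles for the three scenarios in the test plan.
type FakeResponse = { status: number };
type FakeFetch = (signal: AbortSignal) => Promise<FakeResponse>;

// Healthy upstream.
const ok: FakeFetch = async () => ({ status: 200 });

// Network-level failure: Node's built-in fetch rejects with
// TypeError("fetch failed") in this situation.
const throwingUpstream: FakeFetch = async () => {
  throw new TypeError("fetch failed");
};

// Hung upstream: never responds and only settles (by rejecting) when the
// attempt's signal aborts, which is what a per-attempt deadline relies on.
const hangingUpstream: FakeFetch = (signal) =>
  new Promise<FakeResponse>((_, reject) => {
    if (signal.aborted) return reject(new Error("aborted"));
    signal.addEventListener("abort", () => reject(new Error("aborted")), { once: true });
  });

// Plays one step per call (the last step repeats) and counts the calls.
function scripted(steps: FakeFetch[]): { fetch: FakeFetch; calls: () => number } {
  let n = 0;
  return {
    fetch: (signal) => {
      const step = steps[Math.min(n, steps.length - 1)];
      n++;
      return step(signal);
    },
    calls: () => n,
  };
}

// Assumed wiring, given a retry loop that allows at least two attempts:
//   thrown-error retry:     scripted([throwingUpstream, ok])  -> succeeds on call 2
//   timeout-abort + retry:  scripted([hangingUpstream, ok])   -> succeeds after the deadline
//   all attempts throw:     scripted([throwingUpstream])      -> retries exhausted
```

The hanging double has to reject when its signal aborts. A fake that simply never settles would make the timeout test hang as well, because nothing could ever end the attempt.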

Unblocks the eval run: without this fix, a single router hiccup kills a persona.

A production eval persona burned 15 minutes on a single hung request: the tcloud router accepted the connection, never responded, and the fetch sat open until the runtime gave up with `fetch failed`. Two gaps in createOpenAICompatibleBackend caused it:

1. No per-attempt deadline: a hung upstream blocked the attempt indefinitely.
2. Thrown fetch errors (network failure, DNS, the eventual `fetch failed`) propagated straight out of the retry loop. Only HTTP error *statuses* were retried; a thrown error killed the attempt.

Fixes:

- BackendRetryPolicy.requestTimeoutMs (default 120s): each attempt gets an AbortController deadline linked to the caller signal. A hung upstream now aborts in 2 min and retries instead of hanging.
- The fetch call is wrapped: a thrown error is treated as a retryable transport failure (backoff + retry) just like a 5xx. Caller-initiated aborts stay terminal. Exhausted retries throw BackendTransportError with the last error message.

3 new tests: thrown-error retry, per-attempt-timeout abort + retry, all-attempts-throw → BackendTransportError. 216 tests pass.

drewstone added 2 commits May 20, 2026 22:49

ci: biome — fix assign-in-expression in chat-engine test

4c1385a

drewstone merged commit 9500733 into main May 20, 2026
1 check passed

drewstone deleted the fix/backend-timeout branch May 20, 2026 19:52
