
fix(llm): stream structured fallback and expose fetch error cause #49

Open
cloud5418 wants to merge 16 commits into ExplosiveCoderflome:main from cloud5418:fix/structured-fallback-streaming-and-cause

@cloud5418

Problem

Long-output structured invocations (e.g. `novel.volume.skeleton@v2`, volume strategy planning, long chapter macro plans) failed with:

```
[STRUCTURED_OUTPUT:transport_error] [novel.volume.skeleton@v2.fallback] Request timed out after 200000ms.
```

Root cause: third-party OpenAI-compatible aggregator proxies (commonly used to serve gpt-5.x reasoning models in China) cut idle non-stream connections after ~125s. For prompts that produce more than ~5KB of output, reasoning plus JSON generation regularly exceeds that window, so the connection is closed before a single response byte arrives, and the application can only report a bare `fetch failed` once its own fallback timeout fires.

A separate but related issue: when the underlying `fetch failed` did have a useful Node `error.cause` (e.g. `code=ENOTFOUND`, `UND_ERR_SOCKET`, `ECONNRESET`), the existing classifier dropped it and the user only saw `[STRUCTURED_OUTPUT:transport_error] fetch failed`, making diagnosis impossible.

Fix

Stream the direct-transport fallback

`invokeStructuredPromptJsonViaDirectOpenAICompatible` (the `prompt_json` fallback path that calls raw `fetch` without LangChain) now:

  • sends `stream: true` in the request body
  • parses Server-Sent Events into the same string the existing parser consumes
  • falls back to JSON parsing when the upstream ignores `stream: true` (i.e., the response `content-type` is neither `text/event-stream` nor `application/x-ndjson`)

Because streamed chunks keep data flowing over the connection, the proxy's ~125s idle cutoff no longer trips on long generations.
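A minimal sketch of the receiving side, assuming the upstream emits OpenAI-style chat-completion chunks (`data: {json}` lines terminated by `data: [DONE]`) and, as an assumption, that NDJSON bodies carry the same JSON objects as bare lines. `StreamedCompletionAccumulator` and `isStreamingContentType` are hypothetical names for illustration, not the functions in this PR. The accumulator buffers partial lines so an event split across two network chunks is still parsed exactly once.

```typescript
// Hypothetical sketch (not the PR's code): rebuild the assistant text from an
// OpenAI-compatible streaming response, fed one decoded chunk at a time.

interface ChatCompletionChunk {
  choices?: Array<{
    delta?: { content?: string | null };
    finish_reason?: string | null;
  }>;
}

/** True when the upstream honoured `stream: true` (SSE or NDJSON body). */
export function isStreamingContentType(contentType: string | null): boolean {
  if (!contentType) return false;
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return mime === "text/event-stream" || mime === "application/x-ndjson";
}

export class StreamedCompletionAccumulator {
  private buffer = "";
  private text = "";
  private done = false;
  finishReason: string | null = null;

  /** Feed one decoded chunk; a line may be split across chunk boundaries. */
  push(chunk: string): void {
    this.buffer += chunk;
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      this.handleLine(this.buffer.slice(0, newline));
      this.buffer = this.buffer.slice(newline + 1);
      newline = this.buffer.indexOf("\n");
    }
  }

  /** Flush a trailing line that had no newline and return the full text. */
  finish(): string {
    if (this.buffer.length > 0) this.handleLine(this.buffer);
    this.buffer = "";
    return this.text;
  }

  private handleLine(rawLine: string): void {
    const line = rawLine.trim(); // also drops a trailing "\r" from CRLF streams
    // Blank lines end an SSE event; lines starting with ":" are keep-alive comments.
    if (this.done || line === "" || line.startsWith(":")) return;
    let payload: string;
    if (line.startsWith("data:")) {
      payload = line.slice("data:".length).trim();
    } else if (line.startsWith("{")) {
      payload = line; // assumption: NDJSON bodies carry one bare JSON chunk per line
    } else {
      return; // ignore other SSE fields such as "event:" or "id:"
    }
    if (payload === "[DONE]") {
      this.done = true;
      return;
    }
    // Assumes each event's JSON fits on a single data line, as in OpenAI-style streams.
    const parsed = JSON.parse(payload) as ChatCompletionChunk;
    const choice = parsed.choices?.[0];
    const content = choice?.delta?.content;
    if (typeof content === "string") this.text += content;
    if (choice?.finish_reason) this.finishReason = choice.finish_reason;
  }
}
```

In a fetch-based transport, each chunk of `response.body` would be decoded with `TextDecoder.decode(chunk, { stream: true })` and passed to `push`; when `isStreamingContentType(response.headers.get("content-type"))` returns false, the body would go through the existing JSON path instead.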

Expose `error.cause`

  • New `describeNetworkErrorCause` walks up to 6 levels of `error.cause` to extract undici/Node fields (`code`/`errno`/`syscall`/`hostname`/`address`/`port`).
  • New `enrichErrorMessageWithCause` appends `(cause: code=…)` to the message when one is found and not already present.
  • `wrapStructuredInvokeError` uses `enrichErrorMessageWithCause` instead of the bare `error.message` and forwards the original error as the `cause` of `StructuredOutputError`.
  • `summarizeStructuredOutputFailure` now appends the parsed cause (and a "check upstream / proxy" hint) when category is `transport_error`.
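The cause walk can be sketched as follows. This is an illustrative re-implementation that reuses the PR's function names and field list but is not the PR's code; the output format (`field=value` pairs joined by spaces) is an assumption consistent with the `(cause: code=…)` suffix described above. The depth limit also stops cyclic `cause` chains.

```typescript
// Illustrative re-implementation; field list and depth limit follow the PR
// description, the "field=value" output format is an assumption.
const CAUSE_FIELDS = ["code", "errno", "syscall", "hostname", "address", "port"] as const;
const MAX_CAUSE_DEPTH = 6;

function readCause(value: unknown): unknown {
  return typeof value === "object" && value !== null
    ? (value as { cause?: unknown }).cause
    : undefined;
}

/** Walk error.cause (at most 6 levels) and describe the first level with network fields. */
export function describeNetworkErrorCause(error: unknown): string | null {
  let current = readCause(error);
  for (let depth = 0; depth < MAX_CAUSE_DEPTH; depth++) {
    if (typeof current !== "object" || current === null) return null;
    const record = current as Record<string, unknown>;
    const parts: string[] = [];
    for (const field of CAUSE_FIELDS) {
      const value = record[field];
      if (typeof value === "string" || typeof value === "number") {
        parts.push(`${field}=${value}`);
      }
    }
    if (parts.length > 0) return parts.join(" ");
    current = record["cause"]; // nothing useful here; go one level deeper
  }
  return null; // depth limit reached; this also ends cyclic cause chains
}

/** Append "(cause: ...)" unless there is nothing to add or it is already present. */
export function enrichErrorMessageWithCause(error: unknown): string {
  const message = error instanceof Error ? error.message : String(error);
  const cause = describeNetworkErrorCause(error);
  if (!cause || message.includes(cause)) return message;
  return `${message} (cause: ${cause})`;
}
```

For a Node `fetch failed` whose cause carries `code: "ENOTFOUND"`, `syscall: "getaddrinfo"` and `hostname`, this yields `fetch failed (cause: code=ENOTFOUND syscall=getaddrinfo hostname=…)` instead of the bare message.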

Verification

End-to-end against the production-style aggregator (sub2api → gpt-5.4):

| Scenario | Result |
| --- | --- |
| Non-stream 10-volume skeleton POST | Socket cut at 125s, 0 bytes (curl exit code 56) |
| Streamed 10-volume skeleton POST (this change) | 154s, 1.0MB SSE, `finish_reason=stop`, 10 volumes parsed |
| `invokeStructuredLlmDetailed`, full strategy sequence | strategy[0/1] aborted at ~43s as before; the fallback (this change) completes in 137.6s with the full 10-volume JSON, where the previous code hit the 200s fallback timeout |

`pnpm --filter @ai-novel/server typecheck` clean.

Relevant unit tests pass (these were the ones touching the transport fallback path):

  • `tests/storyMacroFallback.test.js`
  • `tests/directorBookContractFallback.test.js`
  • `tests/directorCandidateFallback.test.js`
  • `tests/directorRecoverySampleAudit.test.js`

Compatibility / Risk

  • `stream: true` is a standard OpenAI-protocol parameter, and the aggregators tested (sub2api, ccp) accept it. Providers that ignore the flag still return JSON, which the existing JSON path handles.
  • SSE parsing is contained to the fallback transport. The primary LangChain `ChatOpenAI` invocation is unchanged.
  • The `cause` propagation is additive — message strings are only extended, never replaced. The existing classifier still returns `transport_error` for the same inputs.
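To make the additive behaviour concrete, here is a hypothetical sketch of an error type that keeps its category and forwards the original error as `cause`, plus a summary function that only appends text for `transport_error`. The constructor signature, the `causeDescription` parameter and the hint wording are assumptions for illustration, not this PR's actual API.

```typescript
// Hypothetical sketch: the category and original error are preserved, and the
// summary only ever appends to the original message, never replaces it.
export class StructuredOutputError extends Error {
  readonly category: string;
  // Declared with an initializer so it type-checks whether or not the TS lib
  // already declares Error.cause.
  cause: unknown = undefined;

  constructor(category: string, message: string, cause?: unknown) {
    super(message);
    this.name = "StructuredOutputError";
    this.category = category;
    this.cause = cause; // forward the original error for later inspection
  }
}

// `causeDescription` is whatever a cause walker extracted (e.g. "code=ECONNRESET"),
// or null when nothing useful was found.
export function summarizeStructuredOutputFailure(
  error: StructuredOutputError,
  causeDescription: string | null,
): string {
  const base = `[STRUCTURED_OUTPUT:${error.category}] ${error.message}`;
  if (error.category !== "transport_error") return base;
  const parts = [base];
  // Skip the cause if the message was already enriched with it.
  if (causeDescription && !error.message.includes(causeDescription)) {
    parts.push(`cause: ${causeDescription}`);
  }
  parts.push("check the upstream provider / proxy and retry");
  return parts.join(" | ");
}
```

Because the category is passed through unchanged, any classifier that switches on `transport_error` sees the same value as before; only the human-readable text grows.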

Why now

Failures were reported in production on `novel.volume.skeleton@v2` and, earlier, on the `auto-director` rhythm/chapter-split stages. With this change, long-running structured invocations against proxied gpt-5.x reasoning models can complete instead of timing out at the proxy idle limit.

cloud5418 and others added 16 commits May 9, 2026 15:51
- require signed public desktop release workflow
- add trusted GitHub feed and minimum version gates
- validate staged updater metadata and add desktop security tests
- persist minimum update version into packaged runtime config
- treat prerelease versions as below the matching stable floor
- remove deprecated desktop-v public release path
- remove polluted .gitignore entry `m[[]0])` (between .env.local and dev.db)
- ignore .cursor/, *.log, tmp/ to prevent IDE/log noise from leaking
- git rm --cached .codex-backups/ and .cursor/ (kept on disk)
- delete root-level empty .codex placeholder file
Equivalent re-implementation of fork c6d30e0's headers feature on top of
upstream main. The original commit's task-execution-log routing fix is
intentionally dropped because upstream's auto-director rewrite removed
the /:taskId vs /execution-logs collision that the fix targeted.

- ModelRouteConfig.requestHeadersText column with prisma + sqlite migrations
- Server: parseRequestHeadersText utility, threaded through ResolvedModel
  and resolveLLMClientOptions; applied at anthropicClient, connectivity,
  factory (defaultHeaders for OpenAI), structuredInvoke, routes/llm
- Client: textarea on settings page (per-route + bulk), with deferred
  connectivity probing carried over from the same upstream commit
- Tests: parser unit, modelRouter user override, llmProviders parsing
- Release notes: 2026-05-09 entry
Cherry-picked from PR ExplosiveCoderflome#21 squash (e2800a4). Standalone file with no
upstream collision; the rest of e2800a4 (README cleanup, package.json
dev:log removal) is no longer applicable on top of upstream main.
Set open-pull-requests-limit to 0 temporarily. Dependabot's internal
scan state is locked to the pre-sync default branch (chore/dependabot-2026-05-09);
all PRs it opens are ahead 30 / behind 148 vs main. Pausing prevents
new stale-baseline PRs while we clean the queue, then this commit will
be reverted.
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps the typescript-tooling group with 2 updates: [typescript](https://github.com/microsoft/TypeScript) and [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node).


Updates `typescript` from 5.9.3 to 6.0.3
- [Release notes](https://github.com/microsoft/TypeScript/releases)
- [Commits](microsoft/TypeScript@v5.9.3...v6.0.3)

Updates `@types/node` from 25.3.3 to 25.6.2
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node)

---
updated-dependencies:
- dependency-name: typescript
  dependency-version: 6.0.3
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: typescript-tooling
- dependency-name: "@types/node"
  dependency-version: 25.6.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: typescript-tooling
...

Signed-off-by: dependabot[bot] <support@github.com>
…ithub_actions/actions/checkout-6

ci(deps)(deps): bump actions/checkout from 4 to 6
…pm_and_yarn/typescript-tooling-d818b2e23e

chore(deps)(deps-dev): bump the typescript-tooling group with 2 updates
…ured-protocol-governance-desktop-fix

fix: harden desktop structured auto-director fallback
- direct-transport prompt_json fallback now sends `stream: true` and
  parses Server-Sent Events; this bypasses third-party aggregator
  proxies that cut idle non-stream connections after ~125s, which was
  causing volume-skeleton (and similar long-output) requests to fail
  with `[STRUCTURED_OUTPUT:transport_error] ... timed out after
  200000ms` even though the model was still reasoning.
- response handling keeps the existing JSON path as a fallback when the
  upstream ignores `stream:true` (no `text/event-stream` content type),
  so providers that do not support streaming still work.
- when a low-level `fetch failed` happens, walk `error.cause` to expose
  undici-style fields (`code`/`errno`/`syscall`/`host`) in the user-
  visible message instead of bare "fetch failed".
- `StructuredOutputError` now accepts/forwards a `cause`, and
  `summarizeStructuredOutputFailure` appends the parsed cause + a hint
  to retry/check upstream when category is `transport_error`.

Verified end-to-end against an OpenAI-compatible aggregator:
- non-stream 10-volume skeleton request: socket cut at 125s, 0 bytes.
- streamed request (this change): 154s total, 1.0MB SSE, finish_reason
  stop, 10 volumes parsed.
- `invokeStructuredLlmDetailed` runs the full strategy sequence and
  the stream-mode fallback returns a complete result in 137.6s where
  prior runs hit the 200s fallback timeout.

`pnpm --filter @ai-novel/server typecheck` clean.
Relevant fallback unit tests (story_macro / director_book_contract /
director_candidate / director_recovery_sample_audit) still pass.