fix(llm): stream structured fallback and expose fetch error cause#49
Open
cloud5418 wants to merge 16 commits into
- require signed public desktop release workflow
- add trusted GitHub feed and minimum version gates
- validate staged updater metadata and add desktop security tests

- persist minimum update version into packaged runtime config
- treat prerelease versions as below the matching stable floor
- remove deprecated desktop-v public release path

- remove polluted .gitignore entry `m[[]0])` (between .env.local and dev.db)
- ignore .cursor/, *.log, tmp/ to prevent IDE/log noise from leaking
- git rm --cached .codex-backups/ and .cursor/ (kept on disk)
- delete root-level empty .codex placeholder file
Equivalent re-implementation of fork c6d30e0's headers feature on top of upstream main. The original commit's task-execution-log routing fix is intentionally dropped because upstream's auto-director rewrite removed the /:taskId vs /execution-logs collision that the fix targeted.

- ModelRouteConfig.requestHeadersText column with prisma + sqlite migrations
- Server: parseRequestHeadersText utility, threaded through ResolvedModel and resolveLLMClientOptions; applied at anthropicClient, connectivity, factory (defaultHeaders for OpenAI), structuredInvoke, routes/llm
- Client: textarea on settings page (per-route + bulk), with deferred connectivity probing carried over from the same upstream commit
- Tests: parser unit, modelRouter user override, llmProviders parsing
- Release notes: 2026-05-09 entry
Cherry-picked from PR ExplosiveCoderflome#21 squash (e2800a4). Standalone file with no upstream collision; the rest of e2800a4 (README cleanup, package.json dev:log removal) is no longer applicable on top of upstream main.
Set open-pull-requests-limit to 0 temporarily. Dependabot's internal scan state is locked to the pre-sync default branch (chore/dependabot-2026-05-09); all PRs it opens are ahead 30 / behind 148 vs main. Pausing prevents new stale-baseline PRs while we clean the queue, then this commit will be reverted.
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
This reverts commit 94b6859.
Bumps the typescript-tooling group with 2 updates: [typescript](https://github.com/microsoft/TypeScript) and [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node).

Updates `typescript` from 5.9.3 to 6.0.3
- [Release notes](https://github.com/microsoft/TypeScript/releases)
- [Commits](microsoft/TypeScript@v5.9.3...v6.0.3)

Updates `@types/node` from 25.3.3 to 25.6.2
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node)

---
updated-dependencies:
- dependency-name: typescript
  dependency-version: 6.0.3
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: typescript-tooling
- dependency-name: "@types/node"
  dependency-version: 25.6.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: typescript-tooling
...

Signed-off-by: dependabot[bot] <support@github.com>
…ithub_actions/actions/checkout-6 ci(deps)(deps): bump actions/checkout from 4 to 6
…pm_and_yarn/typescript-tooling-d818b2e23e chore(deps)(deps-dev): bump the typescript-tooling group with 2 updates
…ured-protocol-governance-desktop-fix fix: harden desktop structured auto-director fallback
- direct-transport prompt_json fallback now sends `stream: true` and parses Server-Sent Events; this bypasses third-party aggregator proxies that cut idle non-stream connections after ~125s, which was causing volume-skeleton (and similar long-output) requests to fail with `[STRUCTURED_OUTPUT:transport_error] ... timed out after 200000ms` even though the model was still reasoning.
- response handling keeps the existing JSON path as a fallback when the upstream ignores `stream:true` (no `text/event-stream` content type), so providers that do not support streaming still work.
- when a low-level `fetch failed` happens, walk `error.cause` to expose undici-style fields (`code`/`errno`/`syscall`/`host`) in the user-visible message instead of bare "fetch failed".
- `StructuredOutputError` now accepts/forwards a `cause`, and `summarizeStructuredOutputFailure` appends the parsed cause + a hint to retry/check upstream when category is `transport_error`.

Verified end-to-end against an OpenAI-compatible aggregator:
- non-stream 10-volume skeleton request: socket cut at 125s, 0 bytes.
- streamed request (this change): 154s total, 1.0MB SSE, finish_reason stop, 10 volumes parsed.
- `invokeStructuredLlmDetailed` runs the full strategy sequence and the stream-mode fallback returns a complete result in 137.6s where prior runs hit the 200s fallback timeout.

`pnpm --filter @ai-novel/server typecheck` clean. Relevant fallback unit tests (story_macro / director_book_contract / director_candidate / director_recovery_sample_audit) still pass.
Problem
Long-output structured invocations (e.g. `novel.volume.skeleton@v2`, volume strategy planning, long chapter macro plans) failed with:
```
[STRUCTURED_OUTPUT:transport_error] [novel.volume.skeleton@v2.fallback] Request timed out after 200000ms.
```
Root cause: third-party OpenAI-compatible aggregator proxies (commonly used to serve gpt-5.x reasoning models in China) cut idle non-stream connections after ~125s. The reasoning + JSON generation regularly exceeds that window for prompts that produce >5KB output, so the request times out without a single response byte and the application can only report a bare `fetch failed` after its own fallback timeout fires.
A separate but related issue: when the underlying `fetch failed` did have a useful Node `error.cause` (e.g. `code=ENOTFOUND`, `UND_ERR_SOCKET`, `ECONNRESET`), the existing classifier dropped it and the user only saw `[STRUCTURED_OUTPUT:transport_error] fetch failed`, making diagnosis impossible.
Fix
Stream the direct-transport fallback
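`invokeStructuredPromptJsonViaDirectOpenAICompatible` (the prompt_json fallback path that runs raw `fetch` without LangChain) now:
- sends `stream: true` and parses the Server-Sent Events response;
- keeps the existing JSON path as a fallback when the upstream ignores `stream: true` (no `text/event-stream` content type), so providers that do not support streaming still work.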
Because chunks keep the socket alive, the 125s idle cap no longer trips on long generations.
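A minimal sketch of the stream-or-JSON handling described above, not the code in this PR. The helper names (`createAccumulator`, `feedChunk`, `extractContent`) are hypothetical, and the sketch assumes OpenAI-style chunks that carry text in `choices[0].delta.content`, one `data:` line per event, and a `[DONE]` sentinel.

```typescript
// Hypothetical helpers for illustration; names and shapes are assumptions.
interface SseAccumulator {
  buffer: string;  // text received but not yet terminated by a newline
  content: string; // concatenated delta.content seen so far
  done: boolean;   // true once the "[DONE]" sentinel arrives
}

function createAccumulator(): SseAccumulator {
  return { buffer: "", content: "", done: false };
}

// Feed one decoded network chunk; a chunk may split an SSE line anywhere.
function feedChunk(acc: SseAccumulator, chunk: string): void {
  acc.buffer += chunk;
  const lines = acc.buffer.split(/\r?\n/);
  // The last element is an incomplete line (or ""); keep it for the next chunk.
  acc.buffer = lines.pop() ?? "";
  for (const line of lines) {
    if (!line.startsWith("data:")) continue; // skip blank separators and comments
    const payload = line.slice(5).trim();
    if (payload === "") continue;
    if (payload === "[DONE]") {
      acc.done = true;
      continue;
    }
    let event: unknown;
    try {
      event = JSON.parse(payload);
    } catch {
      continue; // ignore malformed events in this sketch
    }
    if (typeof event !== "object" || event === null) continue;
    const choices = (event as { choices?: Array<{ delta?: { content?: unknown } }> }).choices;
    const delta = choices?.[0]?.delta?.content;
    if (typeof delta === "string") acc.content += delta;
  }
}

function isEventStream(contentType: string | null): boolean {
  return (contentType ?? "").toLowerCase().includes("text/event-stream");
}

// Pick the SSE path or the plain JSON path based on the response content type.
function extractContent(contentType: string | null, chunks: string[]): string {
  if (!isEventStream(contentType)) {
    // Upstream ignored stream:true, so the body is one JSON completion.
    const json = JSON.parse(chunks.join("")) as {
      choices?: Array<{ message?: { content?: string } }>;
    };
    return json.choices?.[0]?.message?.content ?? "";
  }
  const acc = createAccumulator();
  for (const chunk of chunks) feedChunk(acc, chunk);
  return acc.content;
}
```

In a real caller the chunks would come from decoding the response body incrementally; the accumulator keeps any partial trailing line so an event split across two reads is still parsed once.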
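Expose `error.cause`
- When a low-level `fetch failed` happens, walk `error.cause` to expose undici-style fields (`code`/`errno`/`syscall`/`host`) in the user-visible message instead of a bare "fetch failed".
- `StructuredOutputError` now accepts and forwards a `cause`, and `summarizeStructuredOutputFailure` appends the parsed cause plus a hint to retry or check the upstream when the category is `transport_error`.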
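As an illustration of walking the cause chain, here is a hedged sketch. `describeErrorCause` and `summarizeTransportFailure` are hypothetical names, not the PR's functions; the field list (`code`, `errno`, `syscall`, `host`) follows the undici-style fields named in this PR, and the depth limit is an assumption added to guard against cyclic causes.

```typescript
// Hypothetical helpers for illustration; not the PR's actual implementation.
const CAUSE_FIELDS = ["code", "errno", "syscall", "host"] as const;

// Walk error -> error.cause -> ... and collect the first value seen per field.
function describeErrorCause(error: unknown, maxDepth = 5): string {
  const parts: string[] = [];
  let current: unknown = error;
  for (
    let depth = 0;
    depth < maxDepth && typeof current === "object" && current !== null;
    depth++
  ) {
    const record = current as Record<string, unknown>;
    for (const field of CAUSE_FIELDS) {
      const value = record[field];
      const alreadySeen = parts.some((part) => part.startsWith(`${field}=`));
      if ((typeof value === "string" || typeof value === "number") && !alreadySeen) {
        parts.push(`${field}=${value}`);
      }
    }
    current = record["cause"]; // descend one level; maxDepth bounds cyclic chains
  }
  return parts.join(" ");
}

// Append the collected fields to the top-level message when any exist.
function summarizeTransportFailure(error: unknown): string {
  const message = error instanceof Error ? error.message : String(error);
  const detail = describeErrorCause(error);
  return detail ? `${message} (${detail})` : message;
}
```

For a `fetch failed` error whose cause carries `code: "ENOTFOUND"`, `syscall: "getaddrinfo"`, and a `host`, this yields a message such as `fetch failed (code=ENOTFOUND syscall=getaddrinfo host=...)`, which is enough to tell a DNS failure from a reset socket.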
Verification
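End-to-end against the production-style aggregator (sub2api → gpt-5.4):
- non-stream 10-volume skeleton request: socket cut at 125s, 0 bytes received.
- streamed request (this change): 154s total, 1.0MB of SSE, `finish_reason` stop, 10 volumes parsed.
- `invokeStructuredLlmDetailed` runs the full strategy sequence and the stream-mode fallback returns a complete result in 137.6s, where prior runs hit the 200s fallback timeout.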
`pnpm --filter @ai-novel/server typecheck` clean.
Relevant unit tests pass (these were the ones touching the transport fallback path): story_macro, director_book_contract, director_candidate, director_recovery_sample_audit.
Compatibility / Risk
Why now
Failures were reported in production on `novel.volume.skeleton@v2` and on earlier `auto-director` rhythm/chapter-split stages. With this change, long-running structured invocations against proxied gpt-5.x reasoning models can complete instead of timing out at the proxy idle limit.