Merged
11 changes: 11 additions & 0 deletions docs/issues/agent-tool-context-budget/plan.md
@@ -57,6 +57,17 @@ consumes more context and fails on small formatting deviations.
- Judge tool-output continuation fitting against the next preflight-fitted request shape, so older
history that would be trimmed before the next provider call does not falsely fail the tool result.

## Retry Overflow Hardening

- Route unfittable provider-call preflight results through the existing context-pressure recovery
path once before failing.
- Keep the latest user/system/tool payload protected; recovery may compact persisted history,
replace stale summary-bearing system prompt text, trim older in-memory history, and reduce only
the per-call output cap.
- When recovery still cannot fit, fail before rate-limit wait/provider streaming with a budget
diagnostic that includes usable context, estimated input, tool-schema reserve, requested/effective
output, and remaining output room.
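
The routing above can be reduced to a small decision function. This is an illustrative sketch, not DeepChat's actual API: `Preflight` and `resolveOverflow` are hypothetical names, and the real recovery pass also compacts history and adjusts the output cap.

```typescript
// Hypothetical sketch of the recovery routing described above.
type Preflight = {
  fitsWithinContext: boolean
  requiresContextPressureRecovery: boolean
}

function resolveOverflow(
  preflight: Preflight,
  recover: () => Preflight
): { sent: boolean; reason?: string } {
  // Route unfittable preflight results through recovery once before failing.
  if (preflight.requiresContextPressureRecovery || !preflight.fitsWithinContext) {
    preflight = recover()
  }
  // When recovery still cannot fit, fail before rate-limit wait or streaming.
  if (!preflight.fitsWithinContext) {
    return { sent: false, reason: 'context overflow after recovery' }
  }
  return { sent: true }
}
```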

## Manual Validation Notes

For a MiniMax-M2.7 agent session, inspect trace/log output rather than running automated test
13 changes: 13 additions & 0 deletions docs/issues/agent-tool-context-budget/spec.md
@@ -72,3 +72,16 @@ Additional acceptance criteria:
instead of a positive budget.
- Context-pressure recovery updates the in-memory request history used by later tool-continuation
loops.

## Retry Overflow Hardening

Additional acceptance criteria:

- Retry and resume provider calls that still cannot fit after request fitting attempt the same
internal recovery pass before failing, even when the user configured fewer than 4000 output
tokens.
- If recovery cannot make the latest user/system/tool payload fit, DeepChat fails before sending a
provider request and stores a clear budget error on the assistant message.
- The error explains that no provider request was sent and suggests shortening current input or
attachments, reducing active tools/skills/system prompt content, lowering max output tokens, or
increasing context length.
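
A sketch of the budget arithmetic these criteria imply, assuming remaining output room is the usable context minus estimated input and tool-schema reserve (the field names follow the diagnostics added in this PR; the clamp to zero is an assumption):

```typescript
// Hypothetical budget math consistent with the diagnostics fields in this PR.
function remainingOutputTokens(
  usableContextLength: number,
  inputTokens: number,
  toolReserveTokens: number
): number {
  // A request only fits when output room remains after input and tool schemas.
  return Math.max(0, usableContextLength - inputTokens - toolReserveTokens)
}
```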
2 changes: 2 additions & 0 deletions docs/issues/agent-tool-context-budget/tasks.md
@@ -13,6 +13,8 @@
- [x] Drop orphaned tool results and invalid provider options before AI SDK requests.
- [x] Report zero effective output tokens for unfittable preflight results.
- [x] Harden preflight for unknown context windows and refitted tool continuations.
- [x] Recover unfittable retry/resume preflights before provider calls.
- [x] Add actionable budget diagnostics for irreducible retry/resume overflow.
- [ ] Add request budget telemetry.
- [ ] Add reasoning retention budget.
- [ ] Add compact legacy tool schema mode.
19 changes: 19 additions & 0 deletions docs/issues/markdown-smooth-streaming-control/plan.md
@@ -0,0 +1,19 @@
# Markdown Smooth Streaming Control Plan

## Approach

- Add a `smoothStreaming` prop to `MarkdownRenderer`, defaulting to `false`.
- Forward that prop to `markstream-vue`'s `NodeRenderer`.
- In chat message text rendering, enable the prop only when the assistant content block status is `pending` or `loading`.
- Do not derive this from parsed part loading state, because that state reflects artifact/tag parsing rather than message generation.

## Compatibility

- Existing non-chat markdown surfaces inherit the default `false`.
- Existing `fade=false` behavior remains unchanged.

## Validation

- Cover the markdown renderer default and explicit prop behavior.
- Cover completed versus generating message block behavior.
- Run targeted renderer tests plus formatting, i18n, and lint checks.
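
The gating rule above reduces to a one-line predicate. A sketch, using the `pending`/`loading` status values named in this plan plus assumed terminal states:

```typescript
// Smooth streaming only while the assistant block is actively generating.
// 'success' and 'error' are assumed terminal states for illustration.
type BlockStatus = 'pending' | 'loading' | 'success' | 'error'

function shouldSmoothStream(status: BlockStatus): boolean {
  return status === 'pending' || status === 'loading'
}
```

Completed blocks therefore render statically, regardless of how they were originally streamed.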
17 changes: 17 additions & 0 deletions docs/issues/markdown-smooth-streaming-control/spec.md
@@ -0,0 +1,17 @@
# Markdown Smooth Streaming Control

## User Story

As a chat reader, I want completed markdown messages to render without streaming animation so that history and already-generated responses feel stable.

## Acceptance Criteria

- Assistant text blocks that are actively generating use `smoothStreaming`.
- Completed assistant text blocks do not use `smoothStreaming`.
- Markdown artifact previews and workspace markdown previews keep non-streaming behavior by default.
- The change does not alter markdown content parsing, references, code previews, or artifact syncing.

## Non-Goals

- No new user setting is added.
- No IPC, database, or shared message schema changes are needed.
8 changes: 8 additions & 0 deletions docs/issues/markdown-smooth-streaming-control/tasks.md
@@ -0,0 +1,8 @@
# Markdown Smooth Streaming Control Tasks

- [x] Document the intended behavior and implementation approach.
- [x] Add the `smoothStreaming` prop to `MarkdownRenderer`.
- [x] Enable smooth streaming only for pending/loading assistant content blocks.
- [x] Add renderer tests for default, explicit, completed, and generating states.
- [x] Run targeted renderer tests.
- [x] Run final format, i18n, and lint checks.
38 changes: 38 additions & 0 deletions src/main/presenter/agentRuntimePresenter/contextBudget.ts
@@ -32,6 +32,16 @@ export type RequestContextPreflightResult = {
requiresContextPressureRecovery: boolean
}

export type RequestContextBudgetDiagnostics = {
usableContextLength: number
inputTokens: number
toolReserveTokens: number
requestedMaxTokens: number
effectiveMaxTokens: number
remainingOutputTokens: number
totalRequestTokens: number
}

export function estimateToolReserveTokens(tools: MCPToolDefinition[]): number {
return estimateToolDefinitionTokens(tools)
}
@@ -195,6 +205,34 @@ export function preflightRequestContext(params: {
}
}

export function buildRequestContextBudgetDiagnostics(
preflight: RequestContextPreflightResult
): RequestContextBudgetDiagnostics {
return {
usableContextLength: preflight.usableContextLength,
inputTokens: preflight.inputTokens,
toolReserveTokens: preflight.toolReserveTokens,
requestedMaxTokens: preflight.requestedMaxTokens,
effectiveMaxTokens: preflight.effectiveMaxTokens,
remainingOutputTokens: preflight.remainingOutputTokens,
totalRequestTokens: preflight.totalRequestTokens
}
}

export function buildRequestContextOverflowErrorMessage(
preflight: RequestContextPreflightResult
): string {
const diagnostics = buildRequestContextBudgetDiagnostics(preflight)
const formatTokenCount = (value: number): string =>
Number.isFinite(value) ? String(Math.floor(value)) : 'unknown'

return [
'Request was not sent because it cannot fit within the model context window after applying the safety margin.',
`Budget: usable context ${formatTokenCount(diagnostics.usableContextLength)} tokens, estimated input ${formatTokenCount(diagnostics.inputTokens)} tokens, tool schemas ${formatTokenCount(diagnostics.toolReserveTokens)} tokens, requested output ${formatTokenCount(diagnostics.requestedMaxTokens)} tokens, effective output ${formatTokenCount(diagnostics.effectiveMaxTokens)} tokens, remaining output room ${formatTokenCount(diagnostics.remainingOutputTokens)} tokens.`,
'Try shortening the latest input or attachments, reducing active tools, skills, or system prompt content, lowering max output tokens, or increasing context length.'
].join(' ')
}

function resolveProtectedRequestTailCount(messages: ChatMessage[]): number {
if (messages.length === 0) {
return 0
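
Standalone, the token formatting in `buildRequestContextOverflowErrorMessage` above behaves as follows (a self-contained copy for illustration, not an import of the actual module):

```typescript
// Mirrors the formatTokenCount helper in the diff: finite values are floored,
// non-finite values (NaN, Infinity) render as 'unknown'.
const formatTokenCount = (value: number): string =>
  Number.isFinite(value) ? String(Math.floor(value)) : 'unknown'
```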
10 changes: 6 additions & 4 deletions src/main/presenter/agentRuntimePresenter/index.ts
@@ -75,6 +75,7 @@ import {
import {
capAgentDefaultMaxTokens,
capAgentRequestMaxTokens,
buildRequestContextOverflowErrorMessage,
estimateToolReserveTokens,
fitRequestMessagesToContextWindow,
preflightRequestContext
@@ -1874,7 +1875,10 @@ export class AgentRuntimePresenter implements IAgentImplementation {
requestedMaxTokens: requestMaxTokens,
minimumProtectedTailCount: protectedSteerTailCount
})
if (requestPreflight.requiresContextPressureRecovery) {
if (
requestPreflight.requiresContextPressureRecovery ||
!requestPreflight.fitsWithinContext
) {
const recovered = await recoverContextPressure({
sessionId,
providerId: state.providerId,
@@ -1903,9 +1907,7 @@
requestMessages.splice(0, requestMessages.length, ...requestPreflight.messages)
}
if (!requestPreflight.fitsWithinContext) {
throw new Error(
'Request cannot fit within the model context window after applying the safety margin.'
)
throw new Error(buildRequestContextOverflowErrorMessage(requestPreflight))
}
await llmProviderPresenter.executeWithRateLimit(state.providerId, {
signal: abortController.signal,
22 changes: 15 additions & 7 deletions src/renderer/src/components/markdown/MarkdownRenderer.vue
@@ -4,6 +4,8 @@
:content="debouncedContent"
:custom-id="customRendererId"
:isDark="themeStore.isDark"
:smooth-streaming="smoothStreaming"
:fade="false"
:codeBlockDarkTheme="codeBlockDarkTheme"
:codeBlockLightTheme="codeBlockLightTheme"
:codeBlockMonacoOptions="codeBlockMonacoOption"
@@ -32,13 +34,19 @@ import LinkNode from './LinkNode.vue'
import { useMarkdownLinkNavigation } from './useMarkdownLinkNavigation'
import type { MarkdownLinkContext } from './linkTypes'

const props = defineProps<{
content: string
debug?: boolean
messageId?: string
threadId?: string
linkContext?: MarkdownLinkContext
}>()
const props = withDefaults(
defineProps<{
content: string
debug?: boolean
messageId?: string
threadId?: string
linkContext?: MarkdownLinkContext
smoothStreaming?: boolean
}>(),
{
smoothStreaming: false
}
)
const themeStore = useThemeStore()
const uiSettingsStore = useUiSettingsStore()
// Component map
4 changes: 4 additions & 0 deletions src/renderer/src/components/message/MessageBlockContent.vue
@@ -6,6 +6,7 @@
v-if="part.type === 'text'"
:content="part.content"
:loading="part.loading"
:smooth-streaming="shouldSmoothStream"
:message-id="messageId"
:thread-id="threadId"
:link-context="{
@@ -54,6 +55,9 @@ const props = defineProps<{

const { processedContent } = useBlockContent(props)
const lastArtifactSnapshot = ref<string>('')
const shouldSmoothStream = computed(
() => props.block.status === 'pending' || props.block.status === 'loading'
)

const artifactSnapshot = computed(() =>
processedContent.value