Merged
11 changes: 11 additions & 0 deletions docs/issues/agent-tool-context-budget/plan.md
@@ -57,6 +57,17 @@ consumes more context and fails on small formatting deviations.
- Judge tool-output continuation fitting against the next preflight-fitted request shape, so older
history that would be trimmed before the next provider call does not falsely fail the tool result.

## Retry Overflow Hardening

- Route unfittable provider-call preflight results through the existing context-pressure recovery
path once before failing.
- Keep the latest user/system/tool payload protected; recovery may compact persisted history,
replace stale summary-bearing system prompt text, trim older in-memory history, and reduce only
the per-call output cap.
- When recovery still cannot fit, fail before rate-limit wait/provider streaming with a budget
diagnostic that includes usable context, estimated input, tool-schema reserve, requested/effective
output, and remaining output room.
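
The routing above can be reduced to a small decision function. This is an illustrative sketch, not DeepChat's actual API: `Preflight` and `resolveOverflow` are hypothetical names, and the real recovery pass also compacts history and adjusts the output cap.

```typescript
// Hypothetical sketch of the recovery routing described above.
type Preflight = {
  fitsWithinContext: boolean
  requiresContextPressureRecovery: boolean
}

function resolveOverflow(
  preflight: Preflight,
  recover: () => Preflight
): { sent: boolean; reason?: string } {
  // Route unfittable preflight results through recovery once before failing.
  if (preflight.requiresContextPressureRecovery || !preflight.fitsWithinContext) {
    preflight = recover()
  }
  // When recovery still cannot fit, fail before rate-limit wait or streaming.
  if (!preflight.fitsWithinContext) {
    return { sent: false, reason: 'context overflow after recovery' }
  }
  return { sent: true }
}
```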

## Manual Validation Notes

For a MiniMax-M2.7 agent session, inspect trace/log output rather than running automated test
13 changes: 13 additions & 0 deletions docs/issues/agent-tool-context-budget/spec.md
@@ -72,3 +72,16 @@ Additional acceptance criteria:
instead of a positive budget.
- Context-pressure recovery updates the in-memory request history used by later tool-continuation
loops.

## Retry Overflow Hardening

Additional acceptance criteria:

- Retry and resume provider calls that still cannot fit after request fitting attempt the same
internal recovery pass before failing, even when the user configured fewer than 4000 output
tokens.
- If recovery cannot make the latest user/system/tool payload fit, DeepChat fails before sending a
provider request and stores a clear budget error on the assistant message.
- The error explains that no provider request was sent and suggests shortening current input or
attachments, reducing active tools/skills/system prompt content, lowering max output tokens, or
increasing context length.
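
A sketch of the budget arithmetic these criteria imply, assuming remaining output room is the usable context minus estimated input and tool-schema reserve (the field names follow the diagnostics added in this PR; the clamp to zero is an assumption):

```typescript
// Hypothetical budget math consistent with the diagnostics fields in this PR.
function remainingOutputTokens(
  usableContextLength: number,
  inputTokens: number,
  toolReserveTokens: number
): number {
  // A request only fits when output room remains after input and tool schemas.
  return Math.max(0, usableContextLength - inputTokens - toolReserveTokens)
}
```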
2 changes: 2 additions & 0 deletions docs/issues/agent-tool-context-budget/tasks.md
@@ -13,6 +13,8 @@
- [x] Drop orphaned tool results and invalid provider options before AI SDK requests.
- [x] Report zero effective output tokens for unfittable preflight results.
- [x] Harden preflight for unknown context windows and refitted tool continuations.
- [x] Recover unfittable retry/resume preflights before provider calls.
- [x] Add actionable budget diagnostics for irreducible retry/resume overflow.
- [ ] Add request budget telemetry.
- [ ] Add reasoning retention budget.
- [ ] Add compact legacy tool schema mode.
19 changes: 19 additions & 0 deletions docs/issues/markdown-smooth-streaming-control/plan.md
@@ -0,0 +1,19 @@
# Markdown Smooth Streaming Control Plan

## Approach

- Add a `smoothStreaming` prop to `MarkdownRenderer`, defaulting to `false`.
- Forward that prop to `markstream-vue`'s `NodeRenderer`.
- In chat message text rendering, enable the prop only when the assistant content block status is `pending` or `loading`.
- Do not derive this from parsed part loading state, because that state reflects artifact/tag parsing rather than message generation.

## Compatibility

- Existing non-chat markdown surfaces inherit the default `false`.
- Existing `fade=false` behavior remains unchanged.

## Validation

- Cover the markdown renderer default and explicit prop behavior.
- Cover completed versus generating message block behavior.
- Run targeted renderer tests plus formatting, i18n, and lint checks.
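
The gating rule above reduces to a one-line predicate. A sketch, using the `pending`/`loading` status values named in this plan plus assumed terminal states:

```typescript
// Smooth streaming only while the assistant block is actively generating.
// 'success' and 'error' are assumed terminal states for illustration.
type BlockStatus = 'pending' | 'loading' | 'success' | 'error'

function shouldSmoothStream(status: BlockStatus): boolean {
  return status === 'pending' || status === 'loading'
}
```

Completed blocks therefore render statically, regardless of how they were originally streamed.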
17 changes: 17 additions & 0 deletions docs/issues/markdown-smooth-streaming-control/spec.md
@@ -0,0 +1,17 @@
# Markdown Smooth Streaming Control

## User Story

As a chat reader, I want completed markdown messages to render without streaming animation so that history and already-generated responses feel stable.

## Acceptance Criteria

- Assistant text blocks that are actively generating use `smoothStreaming`.
- Completed assistant text blocks do not use `smoothStreaming`.
- Markdown artifact previews and workspace markdown previews keep non-streaming behavior by default.
- The change does not alter markdown content parsing, references, code previews, or artifact syncing.

## Non-Goals

- No new user setting is added.
- No IPC, database, or shared message schema changes are needed.
8 changes: 8 additions & 0 deletions docs/issues/markdown-smooth-streaming-control/tasks.md
@@ -0,0 +1,8 @@
# Markdown Smooth Streaming Control Tasks

- [x] Document the intended behavior and implementation approach.
- [x] Add the `smoothStreaming` prop to `MarkdownRenderer`.
- [x] Enable smooth streaming only for pending/loading assistant content blocks.
- [x] Add renderer tests for default, explicit, completed, and generating states.
- [x] Run targeted renderer tests.
- [x] Run final format, i18n, and lint checks.
38 changes: 38 additions & 0 deletions src/main/presenter/agentRuntimePresenter/contextBudget.ts
@@ -32,6 +32,16 @@ export type RequestContextPreflightResult = {
requiresContextPressureRecovery: boolean
}

export type RequestContextBudgetDiagnostics = {
usableContextLength: number
inputTokens: number
toolReserveTokens: number
requestedMaxTokens: number
effectiveMaxTokens: number
remainingOutputTokens: number
totalRequestTokens: number
}

export function estimateToolReserveTokens(tools: MCPToolDefinition[]): number {
return estimateToolDefinitionTokens(tools)
}
@@ -195,6 +205,34 @@ export function preflightRequestContext(params: {
}
}

export function buildRequestContextBudgetDiagnostics(
preflight: RequestContextPreflightResult
): RequestContextBudgetDiagnostics {
return {
usableContextLength: preflight.usableContextLength,
inputTokens: preflight.inputTokens,
toolReserveTokens: preflight.toolReserveTokens,
requestedMaxTokens: preflight.requestedMaxTokens,
effectiveMaxTokens: preflight.effectiveMaxTokens,
remainingOutputTokens: preflight.remainingOutputTokens,
totalRequestTokens: preflight.totalRequestTokens
}
}

export function buildRequestContextOverflowErrorMessage(
preflight: RequestContextPreflightResult
): string {
const diagnostics = buildRequestContextBudgetDiagnostics(preflight)
const formatTokenCount = (value: number): string =>
Number.isFinite(value) ? String(Math.floor(value)) : 'unknown'

return [
'Request was not sent because it cannot fit within the model context window after applying the safety margin.',
`Budget: usable context ${formatTokenCount(diagnostics.usableContextLength)} tokens, estimated input ${formatTokenCount(diagnostics.inputTokens)} tokens, tool schemas ${formatTokenCount(diagnostics.toolReserveTokens)} tokens, requested output ${formatTokenCount(diagnostics.requestedMaxTokens)} tokens, effective output ${formatTokenCount(diagnostics.effectiveMaxTokens)} tokens, remaining output room ${formatTokenCount(diagnostics.remainingOutputTokens)} tokens.`,
'Try shortening the latest input or attachments, reducing active tools, skills, or system prompt content, lowering max output tokens, or increasing context length.'
].join(' ')
}

function resolveProtectedRequestTailCount(messages: ChatMessage[]): number {
if (messages.length === 0) {
return 0
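
Standalone, the token formatting in `buildRequestContextOverflowErrorMessage` above behaves as follows (a self-contained copy for illustration, not an import of the actual module):

```typescript
// Mirrors the formatTokenCount helper in the diff: finite values are floored,
// non-finite values (NaN, Infinity) render as 'unknown'.
const formatTokenCount = (value: number): string =>
  Number.isFinite(value) ? String(Math.floor(value)) : 'unknown'
```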
10 changes: 6 additions & 4 deletions src/main/presenter/agentRuntimePresenter/index.ts
@@ -75,6 +75,7 @@ import {
import {
capAgentDefaultMaxTokens,
capAgentRequestMaxTokens,
buildRequestContextOverflowErrorMessage,
estimateToolReserveTokens,
fitRequestMessagesToContextWindow,
preflightRequestContext
@@ -1874,7 +1875,10 @@ export class AgentRuntimePresenter implements IAgentImplementation {
requestedMaxTokens: requestMaxTokens,
minimumProtectedTailCount: protectedSteerTailCount
})
if (requestPreflight.requiresContextPressureRecovery) {
if (
requestPreflight.requiresContextPressureRecovery ||
!requestPreflight.fitsWithinContext
) {
const recovered = await recoverContextPressure({
sessionId,
providerId: state.providerId,
@@ -1903,9 +1907,7 @@
requestMessages.splice(0, requestMessages.length, ...requestPreflight.messages)
}
if (!requestPreflight.fitsWithinContext) {
throw new Error(
'Request cannot fit within the model context window after applying the safety margin.'
)
throw new Error(buildRequestContextOverflowErrorMessage(requestPreflight))
}
await llmProviderPresenter.executeWithRateLimit(state.providerId, {
signal: abortController.signal,
22 changes: 15 additions & 7 deletions src/renderer/src/components/markdown/MarkdownRenderer.vue
@@ -4,6 +4,8 @@
:content="debouncedContent"
:custom-id="customRendererId"
:isDark="themeStore.isDark"
:smooth-streaming="smoothStreaming"
:fade="false"
:codeBlockDarkTheme="codeBlockDarkTheme"
:codeBlockLightTheme="codeBlockLightTheme"
:codeBlockMonacoOptions="codeBlockMonacoOption"
@@ -32,13 +34,19 @@ import LinkNode from './LinkNode.vue'
import { useMarkdownLinkNavigation } from './useMarkdownLinkNavigation'
import type { MarkdownLinkContext } from './linkTypes'

const props = defineProps<{
content: string
debug?: boolean
messageId?: string
threadId?: string
linkContext?: MarkdownLinkContext
}>()
const props = withDefaults(
defineProps<{
content: string
debug?: boolean
messageId?: string
threadId?: string
linkContext?: MarkdownLinkContext
smoothStreaming?: boolean
}>(),
{
smoothStreaming: false
}
)
const themeStore = useThemeStore()
const uiSettingsStore = useUiSettingsStore()
// Component map
4 changes: 4 additions & 0 deletions src/renderer/src/components/message/MessageBlockContent.vue
@@ -6,6 +6,7 @@
v-if="part.type === 'text'"
:content="part.content"
:loading="part.loading"
:smooth-streaming="shouldSmoothStream"
:message-id="messageId"
:thread-id="threadId"
:link-context="{
@@ -54,6 +55,9 @@ const props = defineProps<{

const { processedContent } = useBlockContent(props)
const lastArtifactSnapshot = ref<string>('')
const shouldSmoothStream = computed(
() => props.block.status === 'pending' || props.block.status === 'loading'
)

const artifactSnapshot = computed(() =>
processedContent.value