
fix: Remove MiniMax thinking content from streaming responses #7314

Open
alei37 wants to merge 6 commits into AstrBotDevs:master from
alei37:fix/minimax-streaming-thinking

Conversation


@alei37 alei37 commented Apr 2, 2026

Problem Description

When MiniMax (an OpenAI-compatible API addressed by URL) is used with streaming enabled, the model's thinking content is sent to the user along with the reply body. With streaming disabled, only the reply body is output, as expected.

Root Cause

How the MiniMax model sends its thinking content

In streaming responses, the MiniMax model embeds thinking content in delta.content, wrapped in <think>...</think> tags. For example:

Chunk 1: <think>\n用户
Chunk 2: 地回复...</think>\n\n晚上好!

When thinking content spans multiple chunks, the <think> and </think> tags end up split across different chunks.

The defect in the streaming path

In the _query_stream method (the streaming path), the original code sent delta.content directly, with no handling of thinking tags:

if delta and delta.content:
    completion_text = self._normalize_content(delta.content, strip=False)
    llm_response.result_chain = MessageChain(chain=[Comp.Plain(completion_text)])
    _y = True

When the thinking tags are split across chunks:

  • Chunk 1: <think>\n用户 → no complete regex match → sent as-is
  • Chunk 2: 地回复...</think>\n\n晚上好 → no complete regex match → sent as-is

Result: <think>\n用户地回复...</think>\n\n晚上好 is delivered to the user in full.
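The failure can be reproduced with a minimal sketch; the per-chunk filter and the English chunk payloads below are illustrative stand-ins, not the project's actual code:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def naive_filter(chunk: str) -> str:
    # Per-chunk filtering: removes only tag pairs that are
    # complete within a single chunk.
    return THINK_RE.sub("", chunk)

# Neither chunk contains a complete <think>...</think> pair,
# so nothing is removed and the thinking content leaks through.
chunks = ["<think>\nuser", " reply...</think>\n\nGood evening!"]
sent = "".join(naive_filter(c) for c in chunks)
# sent == "<think>\nuser reply...</think>\n\nGood evening!"
```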

Modifications

Core idea

Maintain a thinking_buffer in the streaming handler to track incomplete thinking tags that span chunks.

Modified file

astrbot/core/provider/sources/openai_source.py

Changes

  1. Add thinking-buffer variables (at the start of the _query_stream method):
state = ChatCompletionStreamState()

# Track partial thinking tags across chunks for MiniMax-style reasoning
thinking_buffer = ""
in_thinking_block = False

async for chunk in stream:
  2. Rework the delta.content handling logic:
if delta and delta.content:
    completion_text = self._normalize_content(delta.content, strip=False)

    # Handle partial <think>...</think> tags that may span multiple chunks (MiniMax)
    if thinking_buffer:
        completion_text = thinking_buffer + completion_text
        thinking_buffer = ""

    thinking_pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)

    # Extract complete thinking blocks
    for match in thinking_pattern.finditer(completion_text):
        think_content = match.group(1).strip()
        if think_content:
            if llm_response.reasoning_content:
                llm_response.reasoning_content += "\n" + think_content
            else:
                llm_response.reasoning_content = think_content

    # Remove complete thinking blocks
    completion_text = thinking_pattern.sub("", completion_text)

    # Handle incomplete thinking block at chunk boundary
    think_start = completion_text.rfind("<think>")
    think_end = completion_text.rfind("</think>")

    if think_start != -1 and (think_end == -1 or think_end < think_start):
        # Unclosed opening tag found, buffer it
        thinking_buffer = completion_text[think_start:]
        completion_text = completion_text[:think_start]
    elif think_end != -1 and think_end > think_start:
        # Thinking block closed, clear buffer
        thinking_buffer = ""

    completion_text = completion_text.strip()

    if completion_text:
        llm_response.result_chain = MessageChain(chain=[Comp.Plain(completion_text)])
        _y = True

Processing flow example

Input Chunk 1: <think>\n用户
  → unclosed <think> detected, buffer = "<think>\n用户"
  → yield ""

Input Chunk 2: 地回复...</think>\n\n晚上好
  → prepend buffer: "<think>\n用户地回复...</think>\n\n晚上好"
  → extracted thinking content: "用户地回复..."
  → after removing the thinking tags: "晚上好"
  → yield "晚上好"
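The flow above can be exercised end-to-end with a stand-alone sketch of the same buffering logic (the ThinkingFilter class and feed method are illustrative names, not the actual patch):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

class ThinkingFilter:
    """Buffers a partial <think> tag across streaming chunks."""

    def __init__(self) -> None:
        self.buffer = ""
        self.reasoning: list[str] = []

    def feed(self, chunk: str) -> str:
        # Prepend any buffered partial tag from the previous chunk.
        text = self.buffer + chunk
        self.buffer = ""
        # Collect complete thinking blocks as reasoning content.
        for m in THINK_RE.finditer(text):
            if m.group(1).strip():
                self.reasoning.append(m.group(1).strip())
        text = THINK_RE.sub("", text)
        # An unclosed opening tag is buffered for the next chunk.
        start = text.rfind("<think>")
        if start != -1:
            self.buffer = text[start:]
            text = text[:start]
        return text

f = ThinkingFilter()
visible = [f.feed(c) for c in ["<think>\nuser", " reply...</think>\n\nGood evening!"]]
# visible == ["", "\n\nGood evening!"]; f.reasoning == ["user reply..."]
```

Feeding the two chunks from the example yields an empty string for the first chunk and only the visible text for the second, while the thinking content lands in reasoning.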

Impact

Affected providers

All providers that inherit from ProviderOpenAIOfficial (i.e. those using the OpenAI-compatible format).

Backward compatibility

  • If a provider does not emit thinking content in the <think>...</think> format, this change has no effect.

  • If a provider already handles thinking tags in non-streaming mode, streaming behavior is now consistent with it.

  • This is NOT a breaking change.

Screenshots or Test Results


Related Issues

#7013
#6647
#6745

Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Handle MiniMax-style reasoning tags in OpenAI-compatible streaming responses to prevent internal thinking content from being sent to end users while still capturing it as structured reasoning metadata.

New Features:

  • Capture <think>...</think> reasoning segments from streaming responses into reasoning_content for OpenAI-compatible providers using MiniMax-style thinking tags.

Bug Fixes:

  • Prevent MiniMax thinking content wrapped in <think>...</think> from being emitted to users in streaming responses, even when tags are split across chunks.

Enhancements:

  • Introduce chunk-level buffering and parsing of thinking tags in _query_stream to align streaming behavior with non-streaming responses.

@auto-assign auto-assign bot requested review from Raven95676 and advent259141 April 2, 2026 16:50
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Apr 2, 2026
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • The newly introduced in_thinking_block variable is never used and should either be wired into the logic or removed to avoid confusion about the stream parsing state.
  • The thinking_pattern = re.compile(...) is created on every streamed chunk; consider moving this to a module- or class-level constant to avoid repeated compilation in long-running streams.
Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="609-613" />
<code_context>
+                    # We closed a thinking block, clear any buffered content
+                    thinking_buffer = ""
+                
+                # Strip whitespace but preserve structure
+                completion_text = completion_text.strip()
+                
+                # Only yield if there's actual text content remaining
+                if completion_text:
+                    llm_response.result_chain = MessageChain(
+                        chain=[Comp.Plain(completion_text)],
</code_context>
<issue_to_address>
**issue (bug_risk):** Stripping `completion_text` reintroduces the inter-chunk spacing bug this code was originally avoiding.

Using `completion_text.strip()` here can reintroduce the original streaming bug: if one chunk ends with `"hello "` and the next starts with `"world"`, you send `"hello"` then `"world"`, and the client’s concatenation loses the space. This contradicts the earlier decision not to strip streaming chunks. Please avoid `strip()` here, or restrict cleanup to artifacts around `<think>` tags without changing user-visible leading/trailing spaces.
</issue_to_address>
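The spacing loss the reviewer describes is easy to see in isolation (illustrative chunks):

```python
# Streaming clients concatenate yielded chunks verbatim, so
# stripping each chunk deletes the whitespace that joins words.
chunks = ["hello ", "world"]
joined_raw = "".join(chunks)                          # "hello world"
joined_stripped = "".join(c.strip() for c in chunks)  # "helloworld"
```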

### Comment 2
<location path="astrbot/core/provider/sources/openai_source.py" line_range="539-545" />
<code_context>

         state = ChatCompletionStreamState()
+        
+        # Track partial thinking tags across chunks for MiniMax-style reasoning
+        thinking_buffer = ""
+        in_thinking_block = False

         async for chunk in stream:
</code_context>
<issue_to_address>
**suggestion:** `in_thinking_block` is currently unused and can be removed or wired into the logic.

This flag is written but never read in the streaming loop. If block state tracking is no longer needed, remove it; if it is, integrate it into the buffering/parse logic so it meaningfully complements `thinking_buffer` and `rfind` rather than remaining unused.

```suggestion
        state = ChatCompletionStreamState()

        # Track partial thinking tags across chunks for MiniMax-style reasoning
        thinking_buffer = ""

        async for chunk in stream:
```
</issue_to_address>


@dosubot dosubot bot added the area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. label Apr 2, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements logic to extract and handle reasoning content within tags for streaming responses, specifically supporting MiniMax-style reasoning by buffering partial tags across chunks. Feedback includes removing an unused variable, optimizing regex compilation by moving it outside the loop, and resetting the response chain per iteration to prevent data leakage. Additionally, the yield flag should be set when reasoning content is extracted to ensure chunks are processed, and a strip() call should be removed to preserve essential whitespace between streaming chunks.

Comment on lines +541 to 545
        # Track partial thinking tags across chunks for MiniMax-style reasoning
        thinking_buffer = ""
        in_thinking_block = False

        async for chunk in stream:

high

The variable in_thinking_block is initialized but never used in the subsequent logic. Additionally, for better performance, the regex pattern should be compiled once outside the streaming loop. Also, since the llm_response object is reused across all chunks in the stream, we should clear the result_chain at the start of each iteration to prevent content from previous chunks from leaking into the current one (which can happen if a chunk contains only reasoning content or metadata).

        # Track partial thinking tags across chunks for MiniMax-style reasoning
        thinking_buffer = ""
        thinking_pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)

        async for chunk in stream:
            llm_response.result_chain = None

Comment on lines +586 to +592
for match in thinking_pattern.finditer(completion_text):
    think_content = match.group(1).strip()
    if think_content:
        if llm_response.reasoning_content:
            llm_response.reasoning_content += "\n" + think_content
        else:
            llm_response.reasoning_content = think_content

high

When reasoning content is successfully extracted from the message body, the _y flag must be set to True. This ensures that the chunk is yielded to the consumer even if the remaining completion_text is empty (e.g., when a chunk contains only the end of a thinking block). Without this, the reasoning content might be delayed or lost.

Suggested change (the block above, with _y set when reasoning content is extracted):

    # Extract complete thinking blocks
    for match in thinking_pattern.finditer(completion_text):
        think_content = match.group(1).strip()
        if think_content:
            if llm_response.reasoning_content:
                llm_response.reasoning_content += "\n" + think_content
            else:
                llm_response.reasoning_content = think_content
            _y = True

thinking_buffer = ""

# Strip whitespace but preserve structure
completion_text = completion_text.strip()

high

Calling strip() on every chunk will remove leading and trailing spaces that are essential for correctly joining words split across chunk boundaries. This negates the strip=False setting used in _normalize_content and will cause words to be merged incorrectly (e.g., "Hello " and "world" becoming "Helloworld").

thinking_buffer = ""

# Find all thinking blocks in this chunk
thinking_pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)

medium

This line is now redundant as the regex pattern is compiled once outside the loop for efficiency.

Member

@Soulter Soulter left a comment


Placing the changes into a helper function would be better.

Move inline thinking block extraction logic from streaming loop into a
separate _extract_thinking_blocks helper method for better code
organization and maintainability.
Author

alei37 commented Apr 3, 2026

Updated: the thinking-block extraction now uses a helper function.

@alei37 alei37 requested a review from Soulter April 3, 2026 08:12
alei37 added 2 commits April 3, 2026 16:17
- Handle usage in chunks with empty choices (MiniMax sends usage in final chunk with choices=[])
- Yield the usage chunk so caller can capture it
- Accumulate token usage from chunk responses in tool_loop_agent_runner

This fixes token usage display not showing for MiniMax-M2.7 in webui.
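The accumulation described in this commit can be sketched as follows; the Usage fields mirror the OpenAI usage schema, and accumulate is a hypothetical helper, not the runner's actual code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

def accumulate(total: Usage, chunk_usage: Optional[Usage]) -> Usage:
    # Most chunks carry no usage; MiniMax reports it only in the
    # final chunk (which has choices=[]), so None is simply skipped.
    if chunk_usage is None:
        return total
    return Usage(
        total.prompt_tokens + chunk_usage.prompt_tokens,
        total.completion_tokens + chunk_usage.completion_tokens,
        total.total_tokens + chunk_usage.total_tokens,
    )

total = Usage()
for u in [None, None, Usage(12, 34, 46)]:
    total = accumulate(total, u)
# total == Usage(prompt_tokens=12, completion_tokens=34, total_tokens=46)
```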
Author

alei37 commented Apr 5, 2026

Fixed the issue where MiniMax token usage was lost in transit.

…sages

When MiniMax sends a usage-only chunk (choices=[]) after content chunks,
the old code yielded the shared llm_response object which still contained
the previous result_chain, causing duplicate messages on the frontend.

Now creates a separate LLMResponse for usage-only chunks to avoid
carrying over stale content.
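The fix described in this commit message can be sketched like so; FakeLLMResponse and handle_chunk are illustrative names, not the actual AstrBot API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FakeLLMResponse:
    result_chain: list = field(default_factory=list)
    usage: Optional[dict] = None

def handle_chunk(shared: FakeLLMResponse, chunk: dict) -> FakeLLMResponse:
    if not chunk["choices"] and chunk.get("usage"):
        # Usage-only chunk: return a fresh response object so the
        # previous chunk's result_chain is not yielded again.
        return FakeLLMResponse(usage=chunk["usage"])
    shared.result_chain = [chunk["choices"][0]["delta"]["content"]]
    return shared

shared = FakeLLMResponse()
streamed = handle_chunk(shared, {"choices": [{"delta": {"content": "hi"}}]})
final = handle_chunk(shared, {"choices": [], "usage": {"total_tokens": 5}})
# `final` carries the usage but an empty result_chain.
```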

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:M This PR changes 30-99 lines, ignoring generated files.
