Conversation
Hey - I've found 2 issues, and left some high-level feedback:
- The newly introduced `in_thinking_block` variable is never used and should either be wired into the logic or removed to avoid confusion about the stream parsing state.
- The `thinking_pattern = re.compile(...)` is created on every streamed chunk; consider moving this to a module- or class-level constant to avoid repeated compilation in long-running streams.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The newly introduced `in_thinking_block` variable is never used and should either be wired into the logic or removed to avoid confusion about the stream parsing state.
- The `thinking_pattern = re.compile(...)` is created on every streamed chunk; consider moving this to a module- or class-level constant to avoid repeated compilation in long-running streams.
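A minimal sketch of the suggested fix, assuming the pattern shown in the diff (the helper name is illustrative, not from the codebase): compile the regex once at module level and reuse it for every chunk.

```python
import re

# Compiled once at import time instead of on every streamed chunk.
THINKING_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(completion_text: str) -> tuple[str, list[str]]:
    """Return the visible text with <think> blocks removed, plus the extracted reasoning."""
    reasoning = [m.group(1).strip() for m in THINKING_PATTERN.finditer(completion_text)]
    visible = THINKING_PATTERN.sub("", completion_text)
    return visible, [r for r in reasoning if r]
```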
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="609-613" />
<code_context>
+ # We closed a thinking block, clear any buffered content
+ thinking_buffer = ""
+
+ # Strip whitespace but preserve structure
+ completion_text = completion_text.strip()
+
+ # Only yield if there's actual text content remaining
+ if completion_text:
+ llm_response.result_chain = MessageChain(
+ chain=[Comp.Plain(completion_text)],
</code_context>
<issue_to_address>
**issue (bug_risk):** Stripping `completion_text` reintroduces the inter-chunk spacing bug this code was originally avoiding.
Using `completion_text.strip()` here can reintroduce the original streaming bug: if one chunk ends with `"hello "` and the next starts with `"world"`, you send `"hello"` then `"world"`, and the client’s concatenation loses the space. This contradicts the earlier decision not to strip streaming chunks. Please avoid `strip()` here, or restrict cleanup to artifacts around `<think>` tags without changing user-visible leading/trailing spaces.
</issue_to_address>
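A small demonstration of the failure mode (chunk contents are hypothetical): stripping each chunk drops the boundary space that the client relies on when concatenating.

```python
chunks = ["hello ", "world"]

# Buggy: per-chunk strip() loses the inter-chunk space.
stripped = "".join(c.strip() for c in chunks)

# Correct: forward chunks untouched and let the client concatenate.
joined = "".join(chunks)

print(stripped)  # words merged: "helloworld"
print(joined)    # "hello world"
```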
### Comment 2
<location path="astrbot/core/provider/sources/openai_source.py" line_range="539-545" />
<code_context>
state = ChatCompletionStreamState()
+
+ # Track partial thinking tags across chunks for MiniMax-style reasoning
+ thinking_buffer = ""
+ in_thinking_block = False
async for chunk in stream:
</code_context>
<issue_to_address>
**suggestion:** `in_thinking_block` is currently unused and can be removed or wired into the logic.
This flag is written but never read in the streaming loop. If block state tracking is no longer needed, remove it; if it is, integrate it into the buffering/parse logic so it meaningfully complements `thinking_buffer` and `rfind` rather than remaining unused.
```suggestion
state = ChatCompletionStreamState()
# Track partial thinking tags across chunks for MiniMax-style reasoning
thinking_buffer = ""
async for chunk in stream:
```
</issue_to_address>
Code Review
This pull request implements logic to extract and handle reasoning content within `<think>` tags for streaming responses, specifically supporting MiniMax-style reasoning by buffering partial tags across chunks. Feedback includes removing an unused variable, optimizing regex compilation by moving it outside the loop, and resetting the response chain per iteration to prevent data leakage. Additionally, the yield flag should be set when reasoning content is extracted to ensure chunks are processed, and a strip() call should be removed to preserve essential whitespace between streaming chunks.
```python
# Track partial thinking tags across chunks for MiniMax-style reasoning
thinking_buffer = ""
in_thinking_block = False

async for chunk in stream:
```
The variable in_thinking_block is initialized but never used in the subsequent logic. Additionally, for better performance, the regex pattern should be compiled once outside the streaming loop. Also, since the llm_response object is reused across all chunks in the stream, we should clear the result_chain at the start of each iteration to prevent content from previous chunks from leaking into the current one (which can happen if a chunk contains only reasoning content or metadata).
```python
# Track partial thinking tags across chunks for MiniMax-style reasoning
thinking_buffer = ""
thinking_pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)
async for chunk in stream:
    llm_response.result_chain = None
```

```python
for match in thinking_pattern.finditer(completion_text):
    think_content = match.group(1).strip()
    if think_content:
        if llm_response.reasoning_content:
            llm_response.reasoning_content += "\n" + think_content
        else:
            llm_response.reasoning_content = think_content
```
When reasoning content is successfully extracted from the message body, the _y flag must be set to True. This ensures that the chunk is yielded to the consumer even if the remaining completion_text is empty (e.g., when a chunk contains only the end of a thinking block). Without this, the reasoning content might be delayed or lost.
```suggestion
# Extract complete thinking blocks
for match in thinking_pattern.finditer(completion_text):
    think_content = match.group(1).strip()
    if think_content:
        if llm_response.reasoning_content:
            llm_response.reasoning_content += "\n" + think_content
        else:
            llm_response.reasoning_content = think_content
        _y = True
```
```python
thinking_buffer = ""

# Strip whitespace but preserve structure
completion_text = completion_text.strip()
```
Calling strip() on every chunk will remove leading and trailing spaces that are essential for correctly joining words split across chunk boundaries. This negates the strip=False setting used in _normalize_content and will cause words to be merged incorrectly (e.g., "Hello " and "world" becoming "Helloworld").
```python
thinking_buffer = ""

# Find all thinking blocks in this chunk
thinking_pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)
```
Soulter
left a comment
Placing the changes into a helper function would be better.
Move inline thinking block extraction logic from streaming loop into a separate _extract_thinking_blocks helper method for better code organization and maintainability.
Done: the thinking-block extraction now uses a helper function.
- Handle usage in chunks with empty choices (MiniMax sends usage in the final chunk with choices=[])
- Yield the usage chunk so the caller can capture it
- Accumulate token usage from chunk responses in tool_loop_agent_runner

This fixes token usage display not showing for MiniMax-M2.7 in the webui.
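A hedged sketch of the accumulation step described above. The `prompt_tokens`/`completion_tokens`/`total_tokens` field names follow the OpenAI streaming usage schema; the running-total dict is an assumption about how `tool_loop_agent_runner` might hold it.

```python
from types import SimpleNamespace

def accumulate_usage(total: dict, chunk_usage) -> dict:
    """Fold one chunk's token usage (possibly None) into a running total."""
    if chunk_usage is None:
        return total
    total["prompt_tokens"] += getattr(chunk_usage, "prompt_tokens", 0) or 0
    total["completion_tokens"] += getattr(chunk_usage, "completion_tokens", 0) or 0
    total["total_tokens"] += getattr(chunk_usage, "total_tokens", 0) or 0
    return total

# MiniMax sends usage only in the final chunk (choices=[]), so most calls are no-ops.
totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
accumulate_usage(totals, None)  # content chunk without usage
accumulate_usage(totals, SimpleNamespace(prompt_tokens=12, completion_tokens=34, total_tokens=46))
```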
Fixed the issue of MiniMax-reported token usage being lost.
…sages

When MiniMax sends a usage-only chunk (choices=[]) after the content chunks, the old code yielded the shared llm_response object, which still contained the previous result_chain, causing duplicate messages on the frontend. A separate LLMResponse is now created for usage-only chunks to avoid carrying over stale content.
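The commit's fix might look roughly like this; `LLMResponse` below is a minimal stand-in for astrbot's class, shown only to illustrate why a fresh object avoids the stale `result_chain`:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class LLMResponse:  # minimal stand-in for astrbot's LLMResponse
    role: str = "assistant"
    result_chain: Any = None
    usage: Any = None

def response_for_usage_only_chunk(usage: Any) -> LLMResponse:
    # A fresh object carries no result_chain left over from content chunks,
    # so the frontend does not re-render the previous message.
    return LLMResponse(role="assistant", usage=usage)
```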
Problem Description

When using MiniMax (an OpenAI-URL-format API) with streaming enabled, the model's thinking content is sent to the user together with the reply body. With streaming disabled, only the reply body is sent, as expected.

Root Cause

How MiniMax sends thinking content

During streaming, the MiniMax model embeds its thinking content in delta.content as <think>...</think> tags. When the thinking content spans multiple chunks, the <think> and </think> tags end up split across different chunks.

The streaming-path defect

In the _query_stream method (streaming path), the old code sent delta.content directly without handling thinking tags. When the tags are split across chunks:

- <think>\n用户 → no complete regex match → sent as-is
- 地回复...</think>\n\n晚上好 → no complete regex match → sent as-is

Result: <think>\n用户地回复...</think>\n\n晚上好 is delivered to the user in full.

Modifications / 改动点
Core Idea

Maintain a thinking_buffer in the streaming handler to track incomplete thinking tags across chunks.

Changed File

astrbot/core/provider/sources/openai_source.py

Changes

(at the start of the _query_stream method)

Processing Flow Example
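The split-tag failure can be reproduced in isolation (chunk contents taken from the example above):

```python
import re

# The two chunks as MiniMax streams them, with the tag pair split across the boundary.
chunk1 = "<think>\n用户"
chunk2 = "地回复...</think>\n\n晚上好"

pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)

# Neither chunk alone contains a complete tag pair, so per-chunk matching fails...
assert pattern.search(chunk1) is None
assert pattern.search(chunk2) is None

# ...but once buffered together, the block matches and can be removed.
buffered = chunk1 + chunk2
visible = pattern.sub("", buffered)
assert visible == "\n\n晚上好"
```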
Scope of Impact

Affected providers

All providers that inherit from ProviderOpenAIOfficial (i.e. use the OpenAI-compatible format).

Backward compatibility

- If a provider does not send <think>...</think>-formatted thinking content, this change has no effect.
- If a provider already handles thinking tags in non-streaming mode, streaming behavior is now consistent with it.

This is NOT a breaking change. / 这不是一个破坏性变更。
Screenshots or Test Results / 运行截图或测试结果

Related Issues

#7013
#6647
#6745

Checklist / 检查清单

😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

😮 My changes do not introduce malicious code.
Summary by Sourcery

Handle MiniMax-style reasoning tags in OpenAI-compatible streaming responses to prevent internal thinking content from being sent to end users while still capturing it as structured reasoning metadata.

New Features:
- Extract `<think>...</think>` reasoning segments from streaming responses into `reasoning_content` for OpenAI-compatible providers using MiniMax-style thinking tags.

Bug Fixes:
- Prevent `<think>...</think>` content from being emitted to users in streaming responses, even when tags are split across chunks.

Enhancements:
- Refactor `_query_stream` to align streaming behavior with non-streaming responses.