Python: fix(python/google): filter thinking text parts from chat completion responses #13711
dariusFTOS wants to merge 2 commits into microsoft:main
Conversation
Gemini models with thinking enabled return text parts with part.thought=True. These thinking/reasoning parts were being included in ChatMessageContent alongside the actual response, causing thinking text to leak into responses. This adds a check to skip parts where part.thought is True in both _create_chat_message_content and _create_streaming_chat_message_content. Fixes microsoft#13710
Automated Code Review
Reviewers: 4 | Confidence: 89%
✓ Correctness
The diff adds filtering for Google AI 'thought' parts in both non-streaming and streaming chat completion paths. When the Gemini model returns parts with `thought=True` (internal chain-of-thought reasoning), these are now skipped before being converted to `TextContent` or other content types. The `Part.thought` attribute is present in google-genai ~1.51.0 (the pinned SDK version) and defaults to `None` for non-thought parts, so the truthiness check is safe. The guard is correctly placed before the `if part.text:` check, since thought parts carry text that should not be surfaced. The Vertex AI connector does not have the same filtering, but it uses a different SDK (vertexai) and may have different behavior for thought parts; this is outside the scope of this PR.
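The guard ordering described above can be sketched as follows. This is an illustrative stand-alone snippet, not the connector's actual code: `Part` here is a hypothetical stand-in for the google-genai type, and `visible_texts` stands in for the part-to-content loop inside `_create_chat_message_content`.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for google.genai's Part type, for illustration only.
@dataclass
class Part:
    text: Optional[str] = None
    thought: Optional[bool] = None  # defaults to None on non-thought parts

def visible_texts(parts):
    """Sketch of the guard: skip thought parts before the existing text check."""
    items = []
    for part in parts:
        if part.thought:  # None and False are both falsy, so truthiness is safe
            continue
        if part.text:
            items.append(part.text)
    return items

parts = [Part(text="internal reasoning", thought=True), Part(text="final answer")]
print(visible_texts(parts))  # ['final answer']
```

Because `thought` defaults to `None`, responses from models without thinking enabled pass through this loop unchanged.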
✓ Security Reliability
The diff adds filtering of 'thought' parts (thinking/reasoning tokens) from Google AI chat completion responses, in both streaming and non-streaming paths. The primary reliability concern is that `part.thought` is accessed via direct attribute access, which is inconsistent with the defensive `getattr(part, "thought_signature", None)` pattern already used in the same functions for SDK compatibility. If the google-genai SDK version constraint is ever relaxed, or the `thought` attribute is removed or renamed, direct access would raise an unhandled `AttributeError`, crashing response parsing. The existing test for SDK attribute guards (`test_create_chat_message_content_getattr_guard_on_missing_attribute`) doesn't cover this new attribute, since `MagicMock` auto-creates attributes on access.
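The failure mode the review describes can be demonstrated with a plain object. `LegacyPart` below is a hypothetical class simulating an SDK version whose `Part` lacks the `thought` attribute; it is not a real google-genai type.

```python
class LegacyPart:
    """Hypothetical Part from an SDK version without the `thought` attribute."""
    def __init__(self, text=None):
        self.text = text

part = LegacyPart(text="hello")

# Direct attribute access raises on such an object:
try:
    part.thought
    crashed = False
except AttributeError:
    crashed = True

# The defensive pattern already used for thought_signature degrades gracefully:
is_thought = getattr(part, "thought", False)

print(crashed, is_thought)  # True False
```

This is also why `MagicMock` cannot exercise the guard: accessing `mock.thought` silently creates the attribute instead of raising.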
✗ Test Coverage
The PR adds logic to skip 'thought' parts in both `_create_chat_message_content` and `_create_streaming_chat_message_content`, but no tests cover this new behavior. The existing test suite only uses parts with `thought=None`/`False` (via `Part.from_text()` and `Part.from_function_call()`), so the new `if part.thought: continue` branches have zero test coverage. Tests should verify that thought-only parts are filtered out, that mixed thought/non-thought responses retain only the non-thought items, and that the streaming path behaves identically.
✗ Design Approach
The PR silently discards `part.thought` (reasoning/thinking) content from Google AI responses by skipping those parts entirely. This is a symptom-level fix that treats thought content as noise to be suppressed, when Semantic Kernel already has a purpose-built `ReasoningContent` (and `StreamingReasoningContent`) type that is part of `CMC_ITEM_TYPES` and `STREAMING_ITEM_TYPES`. In Gemini's thinking API, parts with `part.thought == True` carry the actual reasoning text in `part.text`; the correct design is to surface these as `ReasoningContent` items rather than drop them. The Vertex AI connector has the same gap and would need the same fix. Silent discard also breaks any caller that wants to inspect model reasoning or pass it back in multi-turn conversations.
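The alternative the review proposes can be sketched as below. The `ReasoningContent`, `TextContent`, and `Part` classes here are hypothetical stand-ins (the real ones live in `semantic_kernel.contents` and `google.genai.types`); the point is the routing of thought parts into a typed item rather than discarding them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for SK's content item types and the SDK's Part.
@dataclass
class ReasoningContent:
    text: str

@dataclass
class TextContent:
    text: str

@dataclass
class Part:
    text: Optional[str] = None
    thought: Optional[bool] = None

def parts_to_items(parts):
    """Surface thought parts as typed reasoning items instead of dropping them."""
    items = []
    for part in parts:
        if not part.text:
            continue
        if getattr(part, "thought", False):
            items.append(ReasoningContent(text=part.text))
        else:
            items.append(TextContent(text=part.text))
    return items

items = parts_to_items([Part(text="let me think", thought=True), Part(text="42")])
print([type(i).__name__ for i in items])  # ['ReasoningContent', 'TextContent']
```

Callers that only want the answer can filter `items` by type, while callers that need the reasoning for display or multi-turn context still have it.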
Flagged Issues
- Thought parts are silently discarded instead of being surfaced as `ReasoningContent`/`StreamingReasoningContent`, which already exist in SK and are part of `CMC_ITEM_TYPES`. For Gemini thinking models, `part.thought == True` and `part.text` holds the reasoning text; the fix should wrap these as `ReasoningContent` (and the streaming equivalent) rather than dropping them. Silent discard breaks callers that need to inspect model reasoning for display, logging, or multi-turn context.
- No tests cover the new `part.thought` filtering logic in either `_create_chat_message_content` or `_create_streaming_chat_message_content`. Add tests that verify: (1) a `Part` with `thought=True` and text produces the correct content type, (2) a response mixing thought and non-thought parts returns both appropriately, and (3) the streaming path mirrors the same behavior.
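A self-contained sketch of the missing coverage. `FakePart` and `filter_thought_parts` are hypothetical stand-ins for the SDK `Part` and the connector's skip logic; the real tests would call `_create_chat_message_content` on the connector with mocked SDK responses.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FakePart:  # hypothetical stand-in for google.genai's Part
    text: Optional[str] = None
    thought: Optional[bool] = None

def filter_thought_parts(parts):
    """Stand-in for the connector's new skip logic."""
    return [p.text for p in parts if p.text and not getattr(p, "thought", False)]

def test_thought_only_parts_are_filtered():
    assert filter_thought_parts([FakePart(text="chain of thought", thought=True)]) == []

def test_mixed_parts_keep_only_the_answer():
    parts = [FakePart(text="reasoning", thought=True), FakePart(text="42")]
    assert filter_thought_parts(parts) == ["42"]

def test_backward_compatible_without_thought_parts():
    assert filter_thought_parts([FakePart(text="hello")]) == ["hello"]

for test in (test_thought_only_parts_are_filtered,
             test_mixed_parts_keep_only_the_answer,
             test_backward_compatible_without_thought_parts):
    test()
print("all tests passed")
```

The streaming test would mirror the same three cases against `_create_streaming_chat_message_content`.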
Suggestions
- Use `getattr(part, "thought", False)` instead of direct `part.thought` access for consistency with the existing `getattr(part, "thought_signature", None)` pattern, providing resilience against SDK version mismatches where the `thought` attribute may not exist on `Part`.
- Apply the same thought-part handling to the Vertex AI chat completion connector (`vertex_ai_chat_completion.py`), which has a parallel code structure and the same gap.
- Add an SDK guard test simulating a `Part` without the `thought` attribute, similar to the existing `test_create_chat_message_content_getattr_guard_on_missing_attribute`.
Automated review by dariusFTOS's agents
python/semantic_kernel/connectors/ai/google/google_ai/services/google_ai_chat_completion.py
- Use `ReasoningContent` for non-streaming and `StreamingReasoningContent` for streaming thought parts (`part.thought == True`)
- Use `getattr(part, "thought", False)` for SDK compatibility
- Thought parts are now properly typed rather than silently dropped
@microsoft-github-policy-service agree company="FintechOS"
Motivation and Context
Fixes #13710
When using Gemini 3 Pro (preview) with thinking enabled, the API returns text parts with `part.thought = True` containing the model's internal reasoning. These thinking parts are incorrectly included in `ChatMessageContent.items` alongside the actual response text, causing the model's chain-of-thought to leak into application-visible responses. This breaks downstream processing (e.g. JSON parsing of structured agent responses) because the response contains thinking text instead of the actual answer. The fix in #13609 correctly handled `thought_signature` on function call parts, but did not filter thinking text parts from the response content.
Description
Response parsing (filter thinking text parts):
- `google_ai_chat_completion.py`: In `_create_chat_message_content()`, skip parts where `part.thought is True` before adding them as `TextContent`
- `google_ai_chat_completion.py`: Same filter in `_create_streaming_chat_message_content()` for the streaming path

Backward compatible: When `part.thought` is `None` or `False` (thinking disabled or older models), behavior is identical to before. The raw `GenerateContentResponse` is still available via `inner_content` for consumers who need access to thinking parts.
Test Coverage
TODO: Add tests for:
- `test_create_chat_message_content_filters_thought_parts`: verifies thinking parts are excluded from response items
- `test_create_chat_message_content_without_thought_parts`: verifies backward compatibility when no thinking parts present
- `test_create_streaming_chat_message_content_filters_thought_parts`: same for streaming path
Contribution Checklist