-
Notifications
You must be signed in to change notification settings - Fork 3
Add extended-thinking support; default reviewer to adaptive thinking #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
15df9ee
Add extended-thinking support; default reviewer to adaptive thinking
mattgodbolt-molty 7739322
Document Anthropic API gotchas in CLAUDE.md
mattgodbolt-molty acdb10d
Address Copilot review on PR #17
mattgodbolt-molty 01d80c8
Address local code review on PR #17
mattgodbolt-molty cbb0f71
Ignore Claude Code's local scheduled_tasks.lock
mattgodbolt-molty 49df42b
Address Copilot follow-up review on PR #17
mattgodbolt-molty df76c7a
Revert cache key conditional — unconditional thinking is simpler
mattgodbolt-molty File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -57,6 +57,7 @@ def mock_anthropic_client(): | |
| mock_client = MagicMock() | ||
| mock_message = MagicMock() | ||
| mock_content = MagicMock() | ||
| mock_content.type = "text" | ||
| mock_content.text = "This assembly code implements a simple square function..." | ||
| mock_message.content = [mock_content] | ||
|
|
||
|
Comment on lines
59
to
63
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added in acdb10d: |
||
|
|
@@ -149,6 +150,93 @@ async def test_process_request_success(self, sample_request, mock_anthropic_clie | |
| assert structured_data["compiler"] == "g++" | ||
| assert structured_data["sourceCode"] == "int square(int x) {\n return x * x;\n}" | ||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_picks_last_text_block_with_thinking(self, sample_request, noop_metrics): | ||
| """When thinking is enabled, the response has thinking blocks before the | ||
| final text block. The handler should select the text block, not the | ||
| thinking block.""" | ||
| thinking_block = MagicMock() | ||
| thinking_block.type = "thinking" | ||
| thinking_block.thinking = "Let me trace through the imul..." | ||
|
|
||
| text_block = MagicMock() | ||
| text_block.type = "text" | ||
| text_block.text = "Final explanation here." | ||
|
|
||
| mock_message = MagicMock() | ||
| mock_message.content = [thinking_block, text_block] | ||
| mock_message.usage = MagicMock(input_tokens=100, output_tokens=50) | ||
| mock_message.stop_reason = "end_turn" | ||
|
|
||
| mock_client = MagicMock() | ||
| mock_client.messages.create = AsyncMock(return_value=mock_message) | ||
|
|
||
| test_prompt = Prompt(Path("app/prompt.yaml")) | ||
| response = await process_request(sample_request, mock_client, test_prompt, noop_metrics) | ||
|
|
||
| assert response.status == "success" | ||
| assert response.explanation == "Final explanation here." | ||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_returns_error_when_no_text_block(self, sample_request, noop_metrics): | ||
| """A response with no text block (e.g. thinking exhausted max_tokens) | ||
| should return status='error' with token usage populated, not raise. | ||
| Production must surface a structured error rather than a generic 500.""" | ||
| thinking_block = MagicMock() | ||
| thinking_block.type = "thinking" | ||
| thinking_block.thinking = "..." | ||
|
|
||
| mock_message = MagicMock() | ||
| mock_message.content = [thinking_block] | ||
| mock_message.usage = MagicMock(input_tokens=100, output_tokens=50) | ||
| mock_message.stop_reason = "max_tokens" | ||
|
|
||
| mock_client = MagicMock() | ||
| mock_client.messages.create = AsyncMock(return_value=mock_message) | ||
|
|
||
| test_prompt = Prompt(Path("app/prompt.yaml")) | ||
| response = await process_request(sample_request, mock_client, test_prompt, noop_metrics) | ||
|
|
||
| assert response.status == "error" | ||
| assert response.explanation is None | ||
| assert "no text content" in response.message | ||
| # Real tokens were spent — surface them so callers can see the cost. | ||
| assert response.usage is not None | ||
| assert response.usage.inputTokens == 100 | ||
| assert response.usage.outputTokens == 50 | ||
|
|
||
|
|
||
| class TestPromptValidation: | ||
| """Validation rules enforced at Prompt construction.""" | ||
|
|
||
| def test_thinking_requires_min_max_tokens(self): | ||
| """A prompt YAML that enables thinking must also bump max_tokens.""" | ||
| with pytest.raises(ValueError, match="too low for thinking"): | ||
| Prompt( | ||
| { | ||
| "model": {"name": "test", "max_tokens": 1536, "thinking": {"type": "adaptive"}}, | ||
| "system_prompt": "", | ||
| "user_prompt": "", | ||
| "assistant_prefill": "", | ||
| "audience_levels": {}, | ||
| "explanation_types": {}, | ||
| } | ||
| ) | ||
|
|
||
| def test_thinking_with_sufficient_max_tokens_is_ok(self): | ||
| """At/above the floor (4096), thinking-enabled prompts load fine.""" | ||
| prompt = Prompt( | ||
| { | ||
| "model": {"name": "test", "max_tokens": 4096, "thinking": {"type": "adaptive"}}, | ||
| "system_prompt": "", | ||
| "user_prompt": "", | ||
| "assistant_prefill": "", | ||
| "audience_levels": {}, | ||
| "explanation_types": {}, | ||
| } | ||
| ) | ||
| assert prompt.thinking == {"type": "adaptive"} | ||
|
|
||
|
|
||
| class TestSelectImportantAssembly: | ||
| """Test the select_important_assembly function.""" | ||
|
|
||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 01d80c8 (different shape than your suggestion): instead of raising and converting to a 500,
_call_anthropic_apinow returnsExplainResponse(status="error")with usage populated, emitsClaudeExplainEmptyResponse, andprocess_requestskips caching error responses. Token usage is included in the error message so it's diagnosable from logs.