Conversation
```go
	}
	// Clear after first use to avoid duplicating across
	// multiple tool calls in the same message.
	thoughtRecords = nil
```
Not currently an issue since we disable parallel tool calls, but once we enable them this will become pertinent.
Signed-off-by: Danny Kopping <danny@coder.com>
```go
	InvocationError error
	Metadata        Metadata
	CreatedAt       time.Time
	ModelThoughts   []*ModelThoughtRecord
```
How much space could ModelThoughts take?
Maybe we should consider making this recording configurable, not only for compliance reasons but also to limit space usage.
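To illustrate what a space limit could look like, here is a minimal sketch of a per-invocation byte budget for recorded thoughts. The `recorderConfig` knobs, `thoughtRecord` shape, and truncation policy are all invented for illustration, not the real aibridge types:

```go
// Sketch: cap the total bytes of thinking content recorded per invocation.
// All names here are hypothetical stand-ins, not existing aibridge APIs.
package main

import "fmt"

type thoughtRecord struct{ content string }

type recorderConfig struct {
	recordThoughts  bool // master switch (e.g. for compliance-only deployments)
	maxThoughtBytes int  // byte budget per invocation
}

// capThoughts keeps records in order until the byte budget is exhausted,
// truncating the record that straddles the limit and dropping the rest.
func capThoughts(cfg recorderConfig, records []thoughtRecord) []thoughtRecord {
	if !cfg.recordThoughts {
		return nil
	}
	budget := cfg.maxThoughtBytes
	var kept []thoughtRecord
	for _, r := range records {
		if budget <= 0 {
			break
		}
		if len(r.content) > budget {
			r.content = r.content[:budget]
		}
		budget -= len(r.content)
		kept = append(kept, r)
	}
	return kept
}

func main() {
	cfg := recorderConfig{recordThoughts: true, maxThoughtBytes: 10}
	recs := []thoughtRecord{{"hello"}, {"world!!"}, {"dropped"}}
	fmt.Println(len(capThoughts(cfg, recs))) // "hello" + truncated "world" fit the budget
}
```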
```go
	accumulateUsage(&cumulativeUsage, resp.Usage)

	// Capture any thinking blocks that were returned.
	var thoughtRecords []*recorder.ModelThoughtRecord
```
The code extracting thinking blocks looks exactly the same in the streaming implementation. I think it's long enough that it would make sense to extract it into a common base function.
```go
	// Clear after first use to avoid duplicating across
	// multiple tool calls in the same message.
	thoughtRecords = nil
```
Maybe it's a bit out of scope, but this is a bit confusing to me, and it may be a good time for a small cleanup.
If I understand correctly, clearing thoughtRecords should not matter right now since parallel tool calls are disabled; is it added just in case they are enabled in the future? Or is it just to make sure the same thoughts are not stored in two tool calls? In the latter case, I think it would be better to have duplicated information in two calls than one call with none.
At the same time, the construction of thoughtRecords is not prepared for the parallel case; it just concatenates all thinking blocks from Content. How about constructing the thoughtRecords slice in this for loop and clearing it on each RecordToolUsage call?
I see thoughtRecords is later used in the range pendingToolCalls loop, but I think those calls should also be processed in this loop. Then it should be possible to aggregate thinking blocks per call properly (assuming thinking blocks are ordered with tool calls dividing them), which would make the code a bit simpler. Maybe tool call processing could even be extracted to the base struct.
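A rough sketch of the per-call aggregation suggested above, with stand-in types (`block`, `toolCall`, `thoughtRecord` are invented for illustration; the real code uses the Anthropic SDK content-block types and recorder.ModelThoughtRecord):

```go
// Sketch: walk the content blocks in order, accumulating thinking blocks,
// and attach the accumulated thoughts to the NEXT tool call, clearing the
// buffer per call — instead of dumping all thoughts on the first call only.
// Types here are hypothetical stand-ins for the SDK/recorder types.
package main

import "fmt"

type block struct {
	kind     string // "thinking" or "tool_use"
	thinking string
	toolName string
}

type thoughtRecord struct{ content string }

type toolCall struct {
	name     string
	thoughts []thoughtRecord
}

// groupThoughts associates each tool call with the thinking blocks that
// immediately preceded it in the content slice.
func groupThoughts(content []block) []toolCall {
	var calls []toolCall
	var pending []thoughtRecord
	for _, b := range content {
		switch b.kind {
		case "thinking":
			pending = append(pending, thoughtRecord{content: b.thinking})
		case "tool_use":
			calls = append(calls, toolCall{name: b.toolName, thoughts: pending})
			pending = nil // clear per call, not only after the first call
		}
	}
	return calls
}

func main() {
	content := []block{
		{kind: "thinking", thinking: "need to list files"},
		{kind: "tool_use", toolName: "ls"},
		{kind: "thinking", thinking: "now read the file"},
		{kind: "tool_use", toolName: "cat"},
	}
	for _, c := range groupThoughts(content) {
		fmt.Printf("%s: %d thought(s)\n", c.name, len(c.thoughts))
	}
}
```

This assumes the provider emits thinking blocks ordered relative to the tool calls they precede, which is the same ordering assumption the comment above makes.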
I think this is related to @pawbana's comment, but just to clarify: IIUC we're storing all thinking records on the first tool call, and subsequent calls get none. I'm not sure how we're planning to present this in the UI, but all thinking would be associated with a single tool call, which might not be accurate.
Is the purpose of this mapping between tool calls and thinking to "understand why the model chose to use this tool"? If so, I like @pawbana's suggestion of aggregating thinking blocks per call. Otherwise, I'm not sure this mapping of tool calls to thinking is the right approach 🤔
```go
// extractModelThoughts extracts reasoning summary items from response output
// and converts them to ModelThoughtRecords for association with tool usage.
func (i *responsesInterceptionBase) extractModelThoughts(response *responses.Response) []*recorder.ModelThoughtRecord {
```
I think it would be better to merge this logic into getPendingInjectedToolCalls so that it returns each function call together with the thinking blocks related to that call.
```go
		Content:   variant.Thinking,
		CreatedAt: time.Now(),
	})
case anthropic.RedactedThinkingBlock:
```
This case is a no-op; it could just be a comment explaining why RedactedThinkingBlock is skipped.
Maybe I'm missing something, but is there a reason thinking blocks are merged into …
```
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
data: {"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":""}}
```
nit: I would suggest keeping this fixture as it was, a simple fixture without any thinking blocks, and introducing this change as a new fixture. If a test using the simple fixture breaks, we'd want to know it's a basic request/response issue, not have to first rule out whether thinking blocks are involved.
```go
	OaiResponsesStreamingBuiltinTool []byte

	//go:embed openai/responses/streaming/multi_reasoning_builtin_tool.txtar
	OaiResponsesStreamingMultiReasoningBuiltinTool []byte
```
Shouldn't we also add some fixtures for model thoughts for Chat Completions? Or doesn't Chat Completions support reasoning? 🤔
```go
case anthropic.ThinkingBlock:
	thoughtRecords = append(thoughtRecords, &recorder.ModelThoughtRecord{
		Content:   variant.Thinking,
		CreatedAt: time.Now(),
```
AFAIK, with streaming, the code waits until we get a stop block and then processes all thinking blocks at that point, meaning they'll all have the same CreatedAt. I assume the ordering is still preserved by their position in the slice, so this probably doesn't matter, but it's worth noting that CreatedAt won't reflect when each block actually arrived. Could this be an issue?
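If per-block arrival time ever matters, one option is to stamp each thinking block as its stop event arrives rather than batch-stamping at message end. A minimal sketch with invented event/record types (`streamEvent` and the event-type strings here are stand-ins, not the real SDK stream types):

```go
// Sketch: stamp each thinking block at its content_block_stop event,
// so CreatedAt reflects arrival order/time instead of message-end time.
// streamEvent is a hypothetical stand-in for the SDK's stream event type.
package main

import (
	"fmt"
	"time"
)

type streamEvent struct {
	typ      string // e.g. "content_block_stop"
	thinking string // accumulated thinking text for the block, if any
}

type thoughtRecord struct {
	content   string
	createdAt time.Time
}

// recordOnStop converts each completed thinking block to a record as its
// stop event arrives, rather than converting them all once the stream ends.
func recordOnStop(events []streamEvent) []thoughtRecord {
	var records []thoughtRecord
	for _, ev := range events {
		if ev.typ == "content_block_stop" && ev.thinking != "" {
			records = append(records, thoughtRecord{
				content:   ev.thinking,
				createdAt: time.Now(), // per-block arrival time
			})
		}
	}
	return records
}

func main() {
	events := []streamEvent{
		{typ: "content_block_stop", thinking: "first thought"},
		{typ: "content_block_stop", thinking: "second thought"},
	}
	fmt.Println(len(recordOnStop(events)))
}
```

Whether the extra precision is worth threading timestamps through the stream handler probably depends on how the UI ends up presenting thoughts.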

feat: record model thoughts
Signed-off-by: Danny Kopping <danny@coder.com>
fix: send model thoughts with tool usage recording
Signed-off-by: Danny Kopping <danny@coder.com>