feat: record model thoughts by dannykopping · Pull Request #203 · coder/aibridge

dannykopping · 2026-03-05T15:37:10Z

feat: record model thoughts

Signed-off-by: Danny Kopping <danny@coder.com>

fix: send model thoughts with tool usage recording

Signed-off-by: Danny Kopping <danny@coder.com>

dannykopping · 2026-03-05T15:37:28Z

feat: record model thoughts #203 (this PR)
feat: add client session tracking #198
feat: record and correlate tool call IDs across interceptions #188
main

This stack of pull requests is managed by Graphite.

dannykopping · 2026-03-06T13:47:58Z

intercept/responses/base.go

 		}
+		// Clear after first use to avoid duplicating across
+		// multiple tool calls in the same message.
+		thoughtRecords = nil


Not currently an issue since we disable parallel tool calls, but once we enable them this will be pertinent.

Signed-off-by: Danny Kopping <danny@coder.com>

pawbana · 2026-03-06T15:06:30Z

recorder/types.go

 	InvocationError error
 	Metadata        Metadata
 	CreatedAt       time.Time
+	ModelThoughts   []*ModelThoughtRecord


How much space could ModelThoughts take?
Maybe we should consider making recording configurable, not only for compliance reasons but also to limit space usage.
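
To make the space concern concrete, here is a minimal sketch of a byte cap applied before thoughts are recorded. Nothing like this exists in the PR: `capThoughts`, its `maxBytes` parameter, and the local `ModelThoughtRecord` stand-in are all hypothetical, and a real option would presumably be wired through the recorder's configuration.

```go
package main

import "unicode/utf8"

// ModelThoughtRecord is a local stand-in for the recorder type added in this
// PR; only Content matters for this sketch.
type ModelThoughtRecord struct {
	Content string
}

// capThoughts is a hypothetical helper. It keeps thoughts in order until
// maxBytes of content has been retained, truncates the record that crosses
// the limit at a rune boundary, and drops everything after it.
// maxBytes <= 0 is treated as "do not record thoughts".
func capThoughts(thoughts []*ModelThoughtRecord, maxBytes int) []*ModelThoughtRecord {
	if maxBytes <= 0 {
		return nil
	}
	var out []*ModelThoughtRecord
	remaining := maxBytes
	for _, t := range thoughts {
		if remaining == 0 {
			break
		}
		if t == nil || t.Content == "" {
			continue
		}
		content := t.Content
		if len(content) > remaining {
			// Back up to the start of a rune so the result stays valid UTF-8.
			cut := remaining
			for cut > 0 && !utf8.RuneStart(content[cut]) {
				cut--
			}
			content = content[:cut]
			remaining = 0
		} else {
			remaining -= len(content)
		}
		if content != "" {
			out = append(out, &ModelThoughtRecord{Content: content})
		}
	}
	return out
}
```

With a cap of 7 bytes, the thoughts "hello", "world", "!" would be stored as "hello" and "wo"; whether truncation or dropping whole records is preferable is an open design question.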

pawbana · 2026-03-09T11:26:53Z

intercept/messages/blocking.go

 		accumulateUsage(&cumulativeUsage, resp.Usage)

+		// Capture any thinking blocks that were returned.
+		var thoughtRecords []*recorder.ModelThoughtRecord


The code extracting thinking blocks looks exactly the same as in the streaming implementation.
I think it is long enough that it would make sense to extract it into a common base function.

pawbana · 2026-03-09T11:44:06Z

intercept/messages/blocking.go

-
+			// Clear after first use to avoid duplicating across
+			// multiple tool calls in the same message.
+			thoughtRecords = nil


Maybe it is a bit out of scope, but this is a bit confusing to me and it may be a good time for a small cleanup.

If I understand correctly, clearing thoughtRecords should not matter right now since parallel tool calls are disabled; is it added just in case they are enabled in the future? Or is it just to make sure the same thoughts are not stored on 2 tool calls? In the latter case I think it would be better to have duplicated information on 2 calls than 1 call with none.

At the same time, the construction of thoughtRecords is not prepared for the parallel case; it just concatenates all thinking blocks from Content. How about constructing the thoughtRecords slice in this for loop and clearing it on each RecordToolUsage call?

I see that thoughtRecords is later used in the range pendingToolCalls loop, but I think those calls should also be processed in this loop. Then it should be possible to aggregate thinking blocks per call properly (assuming thinking blocks are properly ordered, with tool calls dividing them), which would make the code a bit simpler. Maybe tool call processing could even be extracted to the base struct.
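
The per-call aggregation suggested above could be sketched roughly as follows. It assumes thinking blocks precede the tool call they belong to, and it uses hypothetical local types (`contentBlock`, `toolCallThoughts`) instead of the SDK's; none of these names come from the PR.

```go
package main

// contentBlock is a hypothetical, flattened stand-in for a response content
// block; the real SDK exposes a union type instead.
type contentBlock struct {
	Type       string // "thinking", "tool_use", "text", ...
	Thinking   string // set for "thinking" blocks
	ToolCallID string // set for "tool_use" blocks
}

// toolCallThoughts pairs one tool call with the thoughts that preceded it.
type toolCallThoughts struct {
	ToolCallID string
	Thoughts   []string
}

// groupThoughtsByToolCall walks the blocks once. Thinking blocks accumulate
// in a pending slice; each tool_use block takes ownership of the pending
// thoughts and resets the slice. Thoughts that are never followed by a tool
// call are returned separately as trailing.
func groupThoughtsByToolCall(blocks []contentBlock) (calls []toolCallThoughts, trailing []string) {
	var pending []string
	for _, b := range blocks {
		switch b.Type {
		case "thinking":
			pending = append(pending, b.Thinking)
		case "tool_use":
			calls = append(calls, toolCallThoughts{
				ToolCallID: b.ToolCallID,
				Thoughts:   pending,
			})
			pending = nil // start a fresh group for the next tool call
		}
	}
	return calls, pending
}
```

The `trailing` return value also covers responses that contain reasoning but no tool call at all, which would otherwise be dropped if thoughts are only recorded alongside tool usage.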

pawbana · 2026-03-09T12:08:16Z

intercept/responses/base.go


+// extractModelThoughts extracts reasoning summary items from response output
+// and converts them to ModelThoughtRecords for association with tool usage.
+func (i *responsesInterceptionBase) extractModelThoughts(response *responses.Response) []*recorder.ModelThoughtRecord {


I think it would be better to merge this logic into getPendingInjectedToolCalls so that it returns each function call together with the thinking blocks related to that call.

pawbana · 2026-03-09T12:10:50Z

intercept/messages/blocking.go

+					Content:   variant.Thinking,
+					CreatedAt: time.Now(),
+				})
+			case anthropic.RedactedThinkingBlock:


This case is a no-op; it could just be a comment about RedactedThinkingBlock.

pawbana · 2026-03-09T12:26:39Z

Maybe I'm missing something, but is there a reason thinking blocks are merged into RecordToolUsage and recorded only on a tool call? (Inner loop abstraction?)
I understand they are usually connected, but I'd think there could be some reasoning without any tool call, or the provider could call a tool directly without reasoning.
Maybe thinking blocks should be part of RecordInterceptionEnded?

ssncferreira · 2026-03-09T18:12:27Z

fixtures/anthropic/simple.txtar


 event: content_block_start
-data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}  }
+data: {"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":""}}


nit: I would suggest keeping this fixture as it was, a simple fixture without any thinking blocks, and introducing this change as a new fixture. If a test using simple breaks, we'd want to know it's a basic request/response issue, not have to first rule out whether thinking blocks are involved.

ssncferreira · 2026-03-09T18:27:51Z

fixtures/fixtures.go

 	OaiResponsesStreamingBuiltinTool []byte

+	//go:embed openai/responses/streaming/multi_reasoning_builtin_tool.txtar
+	OaiResponsesStreamingMultiReasoningBuiltinTool []byte


Shouldn't we also add some fixtures for model thoughts for Chat Completions? Or does Chat Completions not support reasoning? 🤔

ssncferreira · 2026-03-09T18:43:40Z

intercept/messages/blocking.go

-
+			// Clear after first use to avoid duplicating across
+			// multiple tool calls in the same message.
+			thoughtRecords = nil


I think this is related to @pawbana's comment, but just to clarify: IIUC we're storing all thinking records on the first tool call, and subsequent calls get none. Not sure how we're planning to present this in the UI, but all thinking would be associated with a single tool call, which might not be accurate.

Is the purpose of this mapping between tool calls and thinking to "understand why the model chose to use this tool"? If so, I like @pawbana's suggestion of aggregating thinking blocks per call. Otherwise, I'm not sure this mapping of tool calls to thinking is the right approach 🤔

ssncferreira · 2026-03-09T18:55:19Z

intercept/messages/streaming.go

+					case anthropic.ThinkingBlock:
+						thoughtRecords = append(thoughtRecords, &recorder.ModelThoughtRecord{
+							Content:   variant.Thinking,
+							CreatedAt: time.Now(),


AFAIK, with streaming, the code waits until we get a stop block and processes all thinking blocks at that point, meaning they'll all have the same CreatedAt. I assume the ordering is still preserved by their position in the slice, so this probably doesn't matter, but worth noting that CreatedAt won't reflect when each block actually arrived. Could this be an issue?

dannykopping mentioned this pull request Mar 5, 2026

feat: record and correlate tool call IDs across interceptions #188

Merged

dannykopping mentioned this pull request Mar 5, 2026

feat: add client session tracking #198

Merged

dannykopping force-pushed the dk/model-thoughts branch from 8682f8d to 56ff5d0 Compare March 6, 2026 07:28

dannykopping changed the base branch from dk/session-id-tracking to graphite-base/203 March 6, 2026 11:52

dannykopping force-pushed the graphite-base/203 branch from 1d5e6ac to e559e5e Compare March 6, 2026 11:52

dannykopping force-pushed the dk/model-thoughts branch from 56ff5d0 to 11d859c Compare March 6, 2026 11:52

graphite-app bot changed the base branch from graphite-base/203 to main March 6, 2026 11:53

dannykopping force-pushed the dk/model-thoughts branch from 11d859c to 855e3a7 Compare March 6, 2026 11:53

dannykopping commented Mar 6, 2026

dannykopping marked this pull request as ready for review March 6, 2026 14:30

dannykopping requested review from pawbana and ssncferreira March 6, 2026 14:30

dannykopping added 6 commits March 6, 2026 16:51

feat: record model thoughts

d1fccb2

Signed-off-by: Danny Kopping <danny@coder.com>

fix: send model thoughts with tool usage recording

c8cb04f

Signed-off-by: Danny Kopping <danny@coder.com>

feat: capture responses reasoning

4068b9c

Signed-off-by: Danny Kopping <danny@coder.com>

chore: refactor tests

3f48417

Signed-off-by: Danny Kopping <danny@coder.com>

chore: test multiple thoughts

3b5ee80

Signed-off-by: Danny Kopping <danny@coder.com>

chore: refactor tests

a0ad392

Signed-off-by: Danny Kopping <danny@coder.com>

dannykopping force-pushed the dk/model-thoughts branch from d09d5b0 to a0ad392 Compare March 6, 2026 15:14

pawbana reviewed Mar 9, 2026

ssncferreira reviewed Mar 9, 2026

3 participants
