Commit 241a211

Code reviewer best of n!
1 parent cc77753 commit 241a211

5 files changed: +507 -1 lines changed

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+import { createBase2 } from './base2'
+
+const definition = {
+  ...createBase2('default', { hasCodeReviewerBestOfN: true }),
+  id: 'base2-with-code-reviewer-best-of-n',
+  displayName: 'Buffy the Code Reviewing Best-of-N Orchestrator',
+}
+export default definition
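
The variant above relies on object spread: keys written after the spread override or extend whatever the factory returned. A minimal sketch of that pattern, assuming a hypothetical stand-in factory (`createBaseSketch` and its fields are illustrative, not the real `createBase2`):

```typescript
// Hypothetical stand-in for createBase2; only the fields needed here.
type AgentDefinitionSketch = {
  displayName: string
  spawnableAgents: string[]
}

function createBaseSketch(options?: {
  hasCodeReviewerBestOfN?: boolean
}): AgentDefinitionSketch {
  const { hasCodeReviewerBestOfN = false } = options ?? {}
  return {
    displayName: 'Base Orchestrator',
    spawnableAgents: [
      ...(hasCodeReviewerBestOfN ? ['code-reviewer-best-of-n'] : []),
      'context-pruner',
    ],
  }
}

// Later keys win: displayName replaces the base value and id is added.
const variantSketch = {
  ...createBaseSketch({ hasCodeReviewerBestOfN: true }),
  id: 'base2-with-code-reviewer-best-of-n',
  displayName: 'Buffy the Code Reviewing Best-of-N Orchestrator',
}
```

Because the factory returns a definition without `id`, each variant file supplies its own, which keeps ids unique per file.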

.agents/base2/base2.ts

Lines changed: 11 additions & 1 deletion
@@ -12,12 +12,14 @@ export function createBase2(
     hasNoValidation?: boolean
     planOnly?: boolean
     hasCodeReviewer?: boolean
+    hasCodeReviewerBestOfN?: boolean
   },
 ): Omit<SecretAgentDefinition, 'id'> {
   const {
     hasNoValidation = false,
     planOnly = false,
     hasCodeReviewer = false,
+    hasCodeReviewerBestOfN = false,
   } = options ?? {}
   const isDefault = mode === 'default'
   const isFast = mode === 'fast'
@@ -80,6 +82,7 @@ export function createBase2(
       isDefault && 'thinker-best-of-n',
       isGpt5 && 'thinker-best-of-n-gpt-5',
       hasCodeReviewer && 'code-reviewer',
+      hasCodeReviewerBestOfN && 'code-reviewer-best-of-n',
       'context-pruner',
     ),
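
The `flag && 'agent-name'` entries in the hunk above only make sense if the surrounding helper drops falsy values. A minimal sketch of that idea, assuming a hypothetical `buildArraySketch` (the project's actual `buildArray` is not shown in this diff and may behave differently):

```typescript
// Hypothetical re-implementation for illustration; not the project's buildArray.
type Falsy = false | null | undefined | '' | 0

function buildArraySketch<T>(...items: (T | Falsy)[]): T[] {
  // Keep only truthy entries, so `false && 'x'` style items vanish.
  return items.filter((item): item is T => Boolean(item))
}

function spawnableAgentsFor(flags: {
  hasCodeReviewer: boolean
  hasCodeReviewerBestOfN: boolean
}): string[] {
  return buildArraySketch<string>(
    flags.hasCodeReviewer && 'code-reviewer',
    flags.hasCodeReviewerBestOfN && 'code-reviewer-best-of-n',
    'context-pruner',
  )
}
```

Under this assumption, the two reviewer flags are independent: setting both would list both reviewer agents.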

@@ -133,6 +136,8 @@ Use the spawn_agents tool to spawn specialized agents to help you complete the u
   '- Spawn commanders sequentially if the second command depends on the first.',
   hasCodeReviewer &&
     '- Spawn a code-reviewer agent to review the code changes after you have made them.',
+  hasCodeReviewerBestOfN &&
+    '- Spawn a code-reviewer-best-of-n agent to review the code changes after you have made them.',
 ).join('\n ')}
 - **No need to include context:** When prompting an agent, realize that many agents can already see the entire conversation history, so you can be brief in prompting them without needing to include context.
 
@@ -179,6 +184,7 @@ ${PLACEHOLDER.GIT_CHANGES_PROMPT}
       isMax,
       hasNoValidation,
       hasCodeReviewer,
+      hasCodeReviewerBestOfN,
     }),
     stepPrompt: planOnly
       ? buildPlanOnlyStepPrompt({})
@@ -220,6 +226,7 @@ function buildImplementationInstructionsPrompt({
   isMax,
   hasNoValidation,
   hasCodeReviewer,
+  hasCodeReviewerBestOfN,
 }: {
   isSonnet: boolean
   isGpt5: boolean
@@ -228,6 +235,7 @@ function buildImplementationInstructionsPrompt({
   isMax: boolean
   hasNoValidation: boolean
   hasCodeReviewer: boolean
+  hasCodeReviewerBestOfN: boolean
 }) {
   return `Act as a helpful assistant and freely respond to the user's request however would be most helpful to the user. Use your judgement to orchestrate the completion of the user's request using your specialized sub-agents and tools as needed. Take your time and be comprehensive.
 
@@ -238,11 +246,13 @@ The user asks you to implement a new feature. You respond in multiple steps:
 ${buildArray(
   EXPLORE_PROMPT,
   `- Important: Read as many files as could possibly be relevant to the task over several steps to improve your understanding of the user's request and produce the best possible code changes. Find more examples within the codebase similar to the user's request, dependencies that help with understanding how things work, tests, etc. This is frequently 12-20 files, depending on the task.`,
-  `- For any task requiring 3+ steps, use the write_todos tool to write out your step-by-step implementation plan. Include ALL of the applicable tasks in the list.${hasCodeReviewer ? ' Include a step to review the code changes with the code-reviewer agent after you have made them.' : ''}${hasNoValidation ? '' : ' You should include at least one step to validate/test your changes: be specific about whether to typecheck, run tests, run lints, etc.'} Skip write_todos for simple tasks like quick edits or answering questions.`,
+  `- For any task requiring 3+ steps, use the write_todos tool to write out your step-by-step implementation plan. Include ALL of the applicable tasks in the list.${hasCodeReviewer ? ' Include a step to review the code changes with the code-reviewer agent after you have made them.' : ''}${hasCodeReviewerBestOfN ? ' Include a step to review the code changes with the code-reviewer-best-of-n agent after you have made them.' : ''}${hasNoValidation ? '' : ' You should include at least one step to validate/test your changes: be specific about whether to typecheck, run tests, run lints, etc.'} Skip write_todos for simple tasks like quick edits or answering questions.`,
   !isFast &&
     `- You must spawn the ${isGpt5 ? 'editor-best-of-n-gpt-5' : 'editor-best-of-n'} agent to implement non-trivial code changes, since it will generate the best code changes from multiple implementation proposals. This is the best way to make high quality code changes -- strongly prefer using this agent over the str_replace or write_file tools, unless the change is very straightforward and obvious.`,
   hasCodeReviewer &&
     `- Spawn a code-reviewer agent to review the code changes after you have made them. You can skip this step for small changes that are obvious and don't require a review.`,
+  hasCodeReviewerBestOfN &&
+    `- Spawn a code-reviewer-best-of-n agent to review the code changes after you have made them. You can skip this step for small changes that are obvious and don't require a review.`,
   !hasNoValidation &&
     `- Test your changes${isMax ? '' : ' briefly'} by running appropriate validation commands for the project (e.g. typechecks, tests, lints, etc.).${isMax ? ' Start by type checking the specific area of the project that you are editing and then test the entire project if necessary.' : ' If you can, only typecheck/test the area of the project that you are editing, rather than the entire project.'} You may have to explore the project to find the appropriate commands. Don't skip this step!`,
   `- Inform the user that you have completed the task in one sentence or a few short bullet points.${isSonnet ? " Don't create any markdown summary files or example documentation files, unless asked by the user." : ''}`,
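
The write_todos line in the hunk above appends one optional sentence per reviewer flag. A small sketch of how those fragments combine, using a hypothetical helper name (`reviewTodoHint` does not exist in the diff); it suggests that enabling both flags would ask for two separate review steps:

```typescript
// Hypothetical helper isolating the two conditional fragments from the prompt.
function reviewTodoHint(flags: {
  hasCodeReviewer: boolean
  hasCodeReviewerBestOfN: boolean
}): string {
  const single = flags.hasCodeReviewer
    ? ' Include a step to review the code changes with the code-reviewer agent after you have made them.'
    : ''
  const bestOfN = flags.hasCodeReviewerBestOfN
    ? ' Include a step to review the code changes with the code-reviewer-best-of-n agent after you have made them.'
    : ''
  // Each fragment is either empty or a full sentence with a leading space.
  return `${single}${bestOfN}`
}
```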
Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
+import { publisher } from '../constants'
+
+import type { AgentStepContext, ToolCall } from '../types/agent-definition'
+import type { SecretAgentDefinition } from '../types/secret-agent-definition'
+
+export function createCodeReviewerBestOfN(
+  model: 'sonnet' | 'gpt-5',
+): Omit<SecretAgentDefinition, 'id'> {
+  const isGpt5 = model === 'gpt-5'
+
+  return {
+    publisher,
+    model: isGpt5 ? 'openai/gpt-5' : 'anthropic/claude-sonnet-4.5',
+    displayName: isGpt5
+      ? 'Best-of-N GPT-5 Code Reviewer'
+      : 'Best-of-N Fast Code Reviewer',
+    spawnerPrompt:
+      'Reviews code by orchestrating multiple reviewer agents to generate review proposals, selects the best one, and provides the final review. Do not specify an input prompt for this agent; it reads the context from the message history.',
+
+    includeMessageHistory: true,
+    inheritParentSystemPrompt: true,
+
+    toolNames: ['spawn_agents', 'set_messages', 'set_output'],
+    spawnableAgents: isGpt5
+      ? ['code-reviewer-implementor-gpt-5', 'code-reviewer-selector-gpt-5']
+      : ['code-reviewer-implementor', 'code-reviewer-selector'],
+
+    inputSchema: {
+      params: {
+        type: 'object',
+        properties: {
+          n: {
+            type: 'number',
+            description:
+              'Number of parallel reviewer agents to spawn. Defaults to 5. Use fewer for simple reviews and max of 10 for complex reviews.',
+          },
+        },
+      },
+    },
+    outputMode: 'structured_output',
+
+    handleSteps: isGpt5 ? handleStepsGpt5 : handleStepsSonnet,
+  }
+}
+
+function* handleStepsSonnet({
+  agentState,
+  params,
+}: AgentStepContext): ReturnType<
+  NonNullable<SecretAgentDefinition['handleSteps']>
+> {
+  const implementorAgent = 'code-reviewer-implementor'
+  const selectorAgent = 'code-reviewer-selector'
+  const n = Math.min(10, Math.max(1, (params?.n as number | undefined) ?? 5))
+
+  // Remove userInstruction message for this agent.
+  const messages = agentState.messageHistory.concat()
+  messages.pop()
+  yield {
+    toolName: 'set_messages',
+    input: {
+      messages,
+    },
+    includeToolCall: false,
+  } satisfies ToolCall<'set_messages'>
+
+  const { toolResult: implementorsResult1 } = yield {
+    toolName: 'spawn_agents',
+    input: {
+      agents: Array.from({ length: n }, () => ({
+        agent_type: implementorAgent,
+      })),
+    },
+    includeToolCall: false,
+  } satisfies ToolCall<'spawn_agents'>
+
+  const implementorsResult = extractSpawnResults<string>(implementorsResult1)
+
+  // Letters used as ids for the reviews (A, B, C, ...); n is capped at 10
+  const letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
+  // Pair each review returned by the implementors with its letter id
+  const reviews = implementorsResult.map((content, index) => ({
+    id: letters[index],
+    content,
+  }))
+
+  // Spawn selector with reviews as params
+  const { toolResult: selectorResult } = yield {
+    toolName: 'spawn_agents',
+    input: {
+      agents: [
+        {
+          agent_type: selectorAgent,
+          params: { reviews },
+        },
+      ],
+    },
+    includeToolCall: false,
+  } satisfies ToolCall<'spawn_agents'>
+
+  const selectorOutput = extractSpawnResults<{
+    reviewId: string
+    reasoning: string
+  }>(selectorResult)[0]
+
+  // selectorOutput is undefined when no spawn result came back, so guard before using `in`
+  if (!selectorOutput || 'errorMessage' in selectorOutput) {
+    yield {
+      toolName: 'set_output',
+      input: { error: selectorOutput?.errorMessage ?? 'Selector returned no output.' },
+    } satisfies ToolCall<'set_output'>
+    return
+  }
+  const { reviewId } = selectorOutput
+  const chosenReview = reviews.find((review) => review.id === reviewId)
+  if (!chosenReview) {
+    yield {
+      toolName: 'set_output',
+      input: { error: 'Failed to find chosen review.' },
+    } satisfies ToolCall<'set_output'>
+    return
+  }
+
+  // Set output with the chosen review and reasoning
+  yield {
+    toolName: 'set_output',
+    input: {
+      response: chosenReview.content,
+      reasoning: selectorOutput.reasoning,
+    },
+    includeToolCall: false,
+  } satisfies ToolCall<'set_output'>
+
+  // Flattens spawn_agents tool results into per-agent values or error objects
+  function extractSpawnResults<T>(
+    results: any[] | undefined,
+  ): (T | { errorMessage: string })[] {
+    if (!results) return []
+    const spawnedResults = results
+      .filter((result) => result.type === 'json')
+      .map((result) => result.value)
+      .flat() as {
+      agentType: string
+      value: { value?: T; errorMessage?: string }
+    }[]
+    return spawnedResults.map(
+      (result) =>
+        result.value.value ?? {
+          errorMessage:
+            result.value.errorMessage ?? 'Error extracting spawn results',
+        },
+    )
+  }
+}
+
+function* handleStepsGpt5({
+  agentState,
+  params,
+}: AgentStepContext): ReturnType<
+  NonNullable<SecretAgentDefinition['handleSteps']>
+> {
+  const implementorAgent = 'code-reviewer-implementor-gpt-5'
+  const selectorAgent = 'code-reviewer-selector-gpt-5'
+  const n = Math.min(10, Math.max(1, (params?.n as number | undefined) ?? 5))
+
+  // Remove userInstruction message for this agent.
+  const messages = agentState.messageHistory.concat()
+  messages.pop()
+  yield {
+    toolName: 'set_messages',
+    input: {
+      messages,
+    },
+    includeToolCall: false,
+  } satisfies ToolCall<'set_messages'>
+
+  const { toolResult: implementorsResult1 } = yield {
+    toolName: 'spawn_agents',
+    input: {
+      agents: Array.from({ length: n }, () => ({
+        agent_type: implementorAgent,
+      })),
+    },
+    includeToolCall: false,
+  } satisfies ToolCall<'spawn_agents'>
+
+  const implementorsResult = extractSpawnResults<string>(implementorsResult1)
+
+  // Letters used as ids for the reviews (A, B, C, ...); n is capped at 10
+  const letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
+  // Pair each review returned by the implementors with its letter id
+  const reviews = implementorsResult.map((content, index) => ({
+    id: letters[index],
+    content,
+  }))
+
+  // Spawn selector with reviews as params
+  const { toolResult: selectorResult } = yield {
+    toolName: 'spawn_agents',
+    input: {
+      agents: [
+        {
+          agent_type: selectorAgent,
+          params: { reviews },
+        },
+      ],
+    },
+    includeToolCall: false,
+  } satisfies ToolCall<'spawn_agents'>
+
+  const selectorOutput = extractSpawnResults<{
+    reviewId: string
+    reasoning: string
+  }>(selectorResult)[0]
+
+  // selectorOutput is undefined when no spawn result came back, so guard before using `in`
+  if (!selectorOutput || 'errorMessage' in selectorOutput) {
+    yield {
+      toolName: 'set_output',
+      input: { error: selectorOutput?.errorMessage ?? 'Selector returned no output.' },
+    } satisfies ToolCall<'set_output'>
+    return
+  }
+  const { reviewId } = selectorOutput
+  const chosenReview = reviews.find((review) => review.id === reviewId)
+  if (!chosenReview) {
+    yield {
+      toolName: 'set_output',
+      input: { error: 'Failed to find chosen review.' },
+    } satisfies ToolCall<'set_output'>
+    return
+  }
+
+  // Set output with the chosen review and reasoning
+  yield {
+    toolName: 'set_output',
+    input: {
+      response: chosenReview.content,
+      reasoning: selectorOutput.reasoning,
+    },
+    includeToolCall: false,
+  } satisfies ToolCall<'set_output'>
+
+  // Flattens spawn_agents tool results into per-agent values or error objects
+  function extractSpawnResults<T>(
+    results: any[] | undefined,
+  ): (T | { errorMessage: string })[] {
+    if (!results) return []
+    const spawnedResults = results
+      .filter((result) => result.type === 'json')
+      .map((result) => result.value)
+      .flat() as {
+      agentType: string
+      value: { value?: T; errorMessage?: string }
+    }[]
+    return spawnedResults.map(
+      (result) =>
+        result.value.value ?? {
+          errorMessage:
+            result.value.errorMessage ?? 'Error extracting spawn results',
+        },
+    )
+  }
+}
+
+const definition = {
+  ...createCodeReviewerBestOfN('sonnet'),
+  id: 'code-reviewer-best-of-n',
+}
+export default definition
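
The handlers in this file follow a best-of-n shape: clamp `n`, label each review with a letter, then look up the selector's chosen letter. A standalone sketch of just that logic, without the agent runtime. All names here (`clampReviewerCount`, `labelReviews`, `pickReview`) are hypothetical, and dropping failed reviewers before labeling is an assumption of this sketch; the diff as written passes error objects through as review content.

```typescript
type SpawnError = { errorMessage: string }
type LabeledReview = { id: string; content: string }
type SelectorChoice = { reviewId: string; reasoning: string }

// Same bounds as the diff: default 5, never below 1 or above 10.
function clampReviewerCount(n?: number): number {
  return Math.min(10, Math.max(1, n ?? 5))
}

// Assumption of this sketch: failed reviewers are dropped before labeling.
function labelReviews(results: (string | SpawnError)[]): LabeledReview[] {
  const letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
  return results
    .filter((result): result is string => typeof result === 'string')
    .map((content, index) => ({ id: letters.charAt(index), content }))
}

// Resolves the selector's choice to a review, or reports why it could not.
function pickReview(
  reviews: LabeledReview[],
  choice: SelectorChoice | SpawnError | undefined,
): { response: string; reasoning: string } | { error: string } {
  if (!choice) return { error: 'Selector returned no output.' }
  if ('errorMessage' in choice) return { error: choice.errorMessage }
  const { reviewId, reasoning } = choice
  const chosen = reviews.find((review) => review.id === reviewId)
  if (!chosen) return { error: 'Failed to find chosen review.' }
  return { response: chosen.content, reasoning }
}
```

Keeping these steps as pure functions makes the selection path easy to test apart from `spawn_agents`; whether that split fits the project's constraints on `handleSteps` is not something this diff settles.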
