Merged
27 commits
aab6d92
bunch of stuff
jahooma Oct 8, 2025
8519dba
Merge branch 'main' into base-layers
jahooma Oct 8, 2025
86efd8f
Initial base layer
jahooma Oct 8, 2025
e39e92c
Param to run git evals on just spec instead of prompting agent
jahooma Oct 8, 2025
aca43ff
code-drafter agent
jahooma Oct 8, 2025
f84d63b
Merge branch 'main' into base-layers
jahooma Oct 8, 2025
f4e0984
create codebase explorer, directory-lister, glob-matcher. consolidate…
jahooma Oct 8, 2025
9596ffe
Gimp the read-only-commander
jahooma Oct 8, 2025
91dca8b
Update codebase-explorer, add file-q-and-a agent
jahooma Oct 8, 2025
b17ad19
Create find-all-referencer agent. Update base-layer
jahooma Oct 8, 2025
210d41d
Add decomposing reviewer + tweaks
jahooma Oct 8, 2025
e3e1f3e
Thinker can propose context that should be acquired
jahooma Oct 9, 2025
17e3f33
Add code sketcher
jahooma Oct 9, 2025
1cce0bc
[buffbench] base-layer with iterative planner; spec all at once
jahooma Oct 9, 2025
0f52f28
[buffbench] base-layer without iterative planner; eval prompt is the …
jahooma Oct 9, 2025
c85e7d3
[buffbench] base-layer no iterative planner, prompted with spec
jahooma Oct 9, 2025
3c8d0f1
[buffbench] base-layer with iterative planner; eval prompt is the ful…
jahooma Oct 9, 2025
c5fc672
[buffbench] [buffbench] base-layer with iterative planner; eval promp…
jahooma Oct 9, 2025
b2a6cdb
Merge branch 'main' into base-layers
jahooma Oct 9, 2025
417d6db
evals: remove trace file, we actually already save in logs
jahooma Oct 9, 2025
1237d31
base-layer: Disable iterative-planner
jahooma Oct 9, 2025
afb2e97
delete old base 2, rename base-layer to base2
jahooma Oct 9, 2025
0dc3163
delete log files
jahooma Oct 9, 2025
f26f29d
delete log file
jahooma Oct 9, 2025
aa93f2a
delete researcher-file-explorer
jahooma Oct 9, 2025
95196ac
Delete inline-file-explorer
jahooma Oct 9, 2025
9efe7cc
Put back run eval set to initial config
jahooma Oct 9, 2025
74 changes: 43 additions & 31 deletions .agents/base2/base2.ts
@@ -8,7 +8,7 @@ const definition: SecretAgentDefinition = {
id: 'base2',
publisher,
model: 'anthropic/claude-sonnet-4.5',
displayName: 'Orchestrator',
displayName: 'Buffy the Orchestrator',
spawnerPrompt:
'Advanced base agent that orchestrates planning, editing, and reviewing for complex coding tasks',
inputSchema: {
@@ -28,16 +28,17 @@ const definition: SecretAgentDefinition = {
},
outputMode: 'last_message',
includeMessageHistory: true,
toolNames: ['spawn_agents', 'read_files', 'code_search'],
toolNames: ['spawn_agents', 'read_files'],
spawnableAgents: [
'read-only-commander',
'researcher-file-explorer',
'file-explorer',
'find-all-referencer',
'researcher-web',
'researcher-docs',
'read-only-commander',
'decomposing-thinker',
'decomposing-planner',
'code-sketcher',
'editor',
'reviewer-max',
'reviewer',
'context-pruner',
],

@@ -46,13 +47,12 @@ const definition: SecretAgentDefinition = {
# Core Mandates

- **Tone:** Adopt a professional, direct, and concise tone suitable for a CLI environment.
- **Orchestrate only** Coordinate between agents but do not implement code yourself.
- **Rely on agents** Ask your spawned agents to complete a whole task. Instead of asking to see each relevant file and building up the plan yourself, ask an agent to come up with a plan or do the task or at least give you higher level information than what each section of code is. You shouldn't be trying to read each section of code yourself.
- **Give as many instructions upfront as possible** When spawning agents, write a prompt that includes all your instructions for each agent so you don't need to spawn them again.
- **Spawn mentioned agents:** If the users uses "@AgentName" in their message, you must spawn that agent. Spawn all the agents that the user mentions.
- **Be concise:** Do not write unnecessary introductions or final summaries in your responses. Be concise and focus on efficiently completing the user's request, without adding explanations longer than 1 sentence.
- **No final summary:** Never write a final summary of what work was done when the user's request is complete. Instead, inform the user in one sentence that the task is complete.
- **Clarity over Brevity (When Needed):** While conciseness is key, prioritize clarity for essential explanations or when seeking necessary clarification if a request is ambiguous.
- **Orchestrate only:** Coordinate between agents but do not implement code yourself.
- **Understand first, act second:** Always gather context and read relevant files BEFORE spawning editors.
- **Quality over speed:** Prioritize correctness over appearing productive. Fewer, well-informed agents are better than many rushed ones.
- **Spawn mentioned agents:** If the user uses "@AgentName" in their message, you must spawn that agent.
- **No final summary:** When the task is complete, inform the user in one sentence.
- **Validate assumptions:** Use researchers, file pickers, and the read_files tool to verify assumptions about libraries and APIs before implementing.
- **Proactiveness:** Fulfill the user's request thoroughly, including reasonable, directly implied follow-up actions.
- **Confirm Ambiguity/Expansion:** Do not take significant actions beyond the clear scope of the request without confirming with the user. If asked *how* to do something, explain first, don't just do it.

@@ -68,30 +68,42 @@ ${PLACEHOLDER.GIT_CHANGES_PROMPT}

instructionsPrompt: `Orchestrate the completion of the user's request using your specialized sub-agents.

## Example workflow
You spawn agents in "layers". Each layer is one spawn_agents tool call composed of multiple agents that answer your questions, do research, think, edit, and review.

In between layers, you are encouraged to use the read_files tool to read files that you think are relevant to the user's request.

Continue to spawn layers of agents until you have completed the user's request or require more information from the user.

## Example layers

The user asks you to implement a new feature. You respond in multiple steps:

1. Spawn a file explorer with different prompts to find relevant files; spawn a find-all-referencer to find more relevant files and answer questions about the codebase; spawn 1 docs research to find relevant docs;
1a. Read all the relevant files using the read_files tool.
2. Spawn one more file explorer and one more find-all-referencer with different prompts to find relevant files; spawn a decomposing thinker with questions on a key decision; spawn a decomposing thinker to plan out the feature part-by-part. Spawn a code sketcher to sketch out one key section of the code that is the most important or difficult.
2a. Read all the relevant files using the read_files tool.
3. Spawn a decomposing-thinker to think about remaining key decisions; spawn one more code sketcher to sketch another key section.
4. Spawn two editors to implement all the changes.
5. Spawn a reviewer to review the changes made by the editors.

Use this workflow to solve a medium or complex coding task:
1. Spawn relevant researchers in parallel (researcher-file-explorer, researcher-web, researcher-docs)
2. Read all the relevant files using the read_files tool.
3. Repeat steps 1 and/or 2 until you have all the information you could possibly need to complete the task. You should aim to read as many files as possible, up to 20+ files to have broader codebase context.
4. Spawn a decomposing planner to come up with a plan.
5. Spawn an editor to implement the plan. If there are totally disjoint parts of the plan, you can spawn multiple editors to implement each part in parallel.
6. Spawn a reviewer to review the changes made by the editor. If more changes are needed, go back to step 5, but no more than once.
7. You must stop before spawning too many sequential agents, because that this takes too much time and the user will get impatient.

Feel free to modify this workflow as needed. It's good to spawn different agents in sequence: spawn a researcher before a planner because then the planner can use the researcher's results to come up with a better plan. You can however spawn mulitple researchers, planners, editors, and read-only-commanders, at the same time if needed.
## Spawning agents guidelines

## Guidelines
- **Sequence agents properly:** Keep in mind dependencies when spawning different agents:
- Spawn file explorers, find-all-referencer, and researchers before thinkers because then the thinkers can use the file/research results to come up with better conclusions.
- Spawn thinkers before editors so editors can use the insights from the thinkers.
- Reviewers should be spawned after editors.
- **Use the decomposing thinker also to check what context you are missing:** Ask what context you don't have for specific subtasks that you could still acquire (with file pickers or find-all-referencers or researchers or using the read_files tool). Getting more context is one of the most important things you should do before planning or editing or coding anything.
- **Once you've gathered all the context you need, create a plan:** Write out your plan as a bullet point list. The user wants to see you write out your plan so they know you are on track.
- **Spawn editors later** Only spawn editors after gathering all the context and creating a plan.
- **No need to include context:** When prompting an agent, realize that many agents can already see the entire conversation history, so you can be brief in prompting them without needing to include context.

- Spawn agents to help you complete the task. Iterate by spawning more agents as needed.
- Don't mastermind the task. Rely on your agents' judgement to research, plan, edit, and review the code.
- You should feel free to stop and ask the user for guidance if you're stuck or don't know what to try next, or need a clarification.
- Give as many instructions upfront as possible to each agent so you're less likely to need to spawn them again.
- When prompting an agent, realize that many agents can already see the entire conversation history, so you can be brief in prompting them without needing to include context.
- Be careful about instructing subagents to run terminal commands that could be destructive or have effects that are hard to undo (e.g. git push, running scripts that could alter production environments, installing packages globally, etc). Don't do any of these unless the user explicitly asks you to.
## General guidelines
- **Stop and ask for guidance:** You should feel free to stop and ask the user for guidance if you're stuck or don't know what to try next, or need a clarification.
- **Be careful about terminal commands:** Be careful about instructing subagents to run terminal commands that could be destructive or have effects that are hard to undo (e.g. git push, running scripts that could alter production environments, installing packages globally, etc). Don't do any of these unless the user explicitly asks you to.
`,

stepPrompt: `Don't forget to spawn agents that could help, especially: the researcher-file-explorer to get codebase context, the decomposing-planner to craft a great plan, and the reviewer-max to review code changes made by the editor.`,
stepPrompt: `Don't forget to spawn agents that could help, especially: the file-explorer and find-all-referencer to get codebase context, the decomposing thinker to think about key decisions, the code sketcher to sketch out the key sections of code, and the reviewer/decomposing-reviewer to review code changes made by the editor(s).`,

handleSteps: function* ({ prompt, params }) {
let steps = 0
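Reviewer note: the "Sequence agents properly" guideline in the new instructionsPrompt amounts to a dependency ordering (gather context, then think, then edit, then review). The sketch below is only an illustration of that ordering, not code from this PR. `planLayers`, `STAGE_BY_AGENT`, and the stage names are hypothetical, and the stage assigned to each agent id is an assumption read off the prompt text (for example, that the code-sketcher runs alongside the thinkers).

```typescript
// Hypothetical sketch: group agent ids into dependency-ordered "layers".
// Stage assignments are assumptions taken from the prompt guidelines,
// not from the actual agent runtime.
type Stage = 'gather' | 'think' | 'edit' | 'review'

const STAGE_ORDER: Stage[] = ['gather', 'think', 'edit', 'review']

const STAGE_BY_AGENT: Record<string, Stage> = {
  'file-explorer': 'gather',
  'find-all-referencer': 'gather',
  'researcher-web': 'gather',
  'researcher-docs': 'gather',
  'decomposing-thinker': 'think',
  'code-sketcher': 'think',
  editor: 'edit',
  reviewer: 'review',
}

// Returns one array per non-empty stage, in stage order.
// Agent ids that are not in STAGE_BY_AGENT are silently dropped.
function planLayers(agentIds: string[]): string[][] {
  const layers: string[][] = []
  for (const stage of STAGE_ORDER) {
    const layer = agentIds.filter((id) => STAGE_BY_AGENT[id] === stage)
    if (layer.length > 0) layers.push(layer)
  }
  return layers
}

const layers = planLayers([
  'editor',
  'reviewer',
  'file-explorer',
  'decomposing-thinker',
  'researcher-docs',
])
console.log(JSON.stringify(layers))
// prints [["file-explorer","researcher-docs"],["decomposing-thinker"],["editor"],["reviewer"]]
```

Each inner array would correspond to one spawn_agents call; in the real agent the model chooses the layers itself, so this only makes the intended ordering concrete.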
56 changes: 56 additions & 0 deletions .agents/editor/code-sketcher.ts
@@ -0,0 +1,56 @@
import { publisher } from '../constants'
import { type SecretAgentDefinition } from '../types/secret-agent-definition'

const definition: SecretAgentDefinition = {
id: 'code-sketcher',
displayName: 'Code Sketcher',
publisher,
model: 'anthropic/claude-sonnet-4.5',
spawnerPrompt:
'Spawn to sketch the code that will be needed to accomplish the task, focusing on the key sections of logic or interfaces. Cannot use tools to edit files - instead describes all changes using markdown code blocks. Does not spawn other agents.',
inputSchema: {
prompt: {
type: 'string',
description: 'The coding task to sketch out, including the key sections of logic or interfaces it should focus on.',
},
},
outputMode: 'last_message',
includeMessageHistory: true,
inheritParentSystemPrompt: true,
toolNames: [],
spawnableAgents: [],

instructionsPrompt: `You are an expert programmer who sketches out the code that will be needed to accomplish the task.

You do not have access to tools to modify files. Instead, you describe all code changes using markdown code blocks.

Instructions:
- Think about the best way to accomplish the task
- Write out the sketch for each file that needs to be changed
- Use markdown code blocks with the file path as the language identifier
- For each file, show only the code changes needed; don't include the entire file

Important: Focus on the key sections of logic or interfaces that are needed to accomplish the task! You don't need to sketch out the more obvious parts of the code.
You can skip over parts of the code using pseudocode or placeholder comments.

Guidelines:
- Pay close attention to the user's request and address all requirements
- Focus on the simplest solution that accomplishes the task
- Reuse existing code patterns and conventions from the codebase
- Keep naming consistent with the existing codebase
- Try not to modify more files than necessary
- Avoid comments unless absolutely necessary to understand the code
- Do not add try/catch blocks unless needed
- Do not write duplicate code that could use existing helpers

Format your response with file blocks, like this:
path/to/file.ts
\`\`\`typescript
// ... existing code ...
[this is the key section of code]
// ... existing code ...
\`\`\`
`,
}

export default definition
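Reviewer note: the response format at the end of the code-sketcher prompt (a file path on its own line, followed by a fenced block) is regular enough for a caller to parse. The sketch below is a hypothetical consumer, not part of this PR. `parseFileBlocks` and `FileBlock` are invented names, and it assumes the path always sits on the line directly above the opening fence, as in the prompt's example.

```typescript
// Hypothetical sketch: extract (path, code) pairs from a code-sketcher response.
interface FileBlock {
  path: string
  code: string
}

// Built at runtime so this example never contains a literal fence.
const FENCE = '`'.repeat(3)

function parseFileBlocks(response: string): FileBlock[] {
  const lines = response.split('\n')
  const blocks: FileBlock[] = []
  let i = 0
  while (i < lines.length) {
    if (!(lines[i] ?? '').startsWith(FENCE)) {
      i++
      continue
    }
    // Assumption: the file path is the line directly above the opening fence.
    const path = i > 0 ? (lines[i - 1] ?? '').trim() : ''
    const body: string[] = []
    i++
    while (i < lines.length && !(lines[i] ?? '').startsWith(FENCE)) {
      body.push(lines[i] ?? '')
      i++
    }
    i++ // step past the closing fence
    // Fenced blocks without a path line above them are skipped.
    if (path !== '' && !path.startsWith(FENCE)) {
      blocks.push({ path, code: body.join('\n') })
    }
  }
  return blocks
}
```

A caller could pass the agent's last message to `parseFileBlocks` and hand each block to an editor. Note that the prompt also says to use the file path as the language identifier, which differs from the example format it then shows; this sketch follows the example.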
79 changes: 79 additions & 0 deletions .agents/file-explorer/code-searcher.ts
@@ -0,0 +1,79 @@
import { publisher } from '../constants'

import type { SecretAgentDefinition } from '../types/secret-agent-definition'

interface SearchQuery {
pattern: string
flags?: string
cwd?: string
maxResults?: number
}

const paramsSchema = {
type: 'object' as const,
properties: {
searchQueries: {
type: 'array' as const,
items: {
type: 'object' as const,
properties: {
pattern: {
type: 'string' as const,
description: 'The pattern to search for',
},
flags: {
type: 'string' as const,
description:
'Optional ripgrep flags to customize the search (e.g., "-i" for case-insensitive, "-t ts" for TypeScript files only, "-A 3" for 3 lines after match, "-B 2" for 2 lines before match, "--type-not test" to exclude test files)',
},
cwd: {
type: 'string' as const,
description:
'Optional working directory to search within, relative to the project root. Defaults to searching the entire project',
},
maxResults: {
type: 'number' as const,
description:
'Maximum number of results to return per file. Defaults to 15. There is also a global limit of 250 results across all files',
},
},
required: ['pattern'],
},
description: 'Array of code search queries to execute',
},
},
required: ['searchQueries'],
}

const codeSearcher: SecretAgentDefinition = {
id: 'code-searcher',
displayName: 'Code Searcher',
spawnerPrompt:
'Mechanically runs multiple code search queries (using ripgrep line-oriented search) and returns all results',
model: 'anthropic/claude-sonnet-4.5',
publisher,
outputMode: 'all_messages',
includeMessageHistory: false,
toolNames: ['code_search'],
spawnableAgents: [],
inputSchema: {
params: paramsSchema,
},
handleSteps: function* ({ params }) {
const searchQueries: SearchQuery[] = params?.searchQueries ?? []

for (const query of searchQueries) {
yield {
toolName: 'code_search',
input: {
pattern: query.pattern,
flags: query.flags,
cwd: query.cwd,
maxResults: query.maxResults,
},
}
}
},
}

export default codeSearcher
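Reviewer note: code-searcher has no prompt logic. Its handleSteps generator yields one code_search tool call per query and then finishes. The standalone sketch below reproduces that pattern so it can be run outside the agent runtime. `ToolCall` and `collectToolCalls` are hypothetical stand-ins; the real runtime presumably executes each yielded call and returns its result to the generator, which this sketch does not model.

```typescript
// Standalone sketch of the "yield one tool call per query" pattern.
interface SearchQuery {
  pattern: string
  flags?: string
  cwd?: string
  maxResults?: number
}

// Hypothetical shape of a yielded step; the real runtime type is not shown in this diff.
interface ToolCall {
  toolName: string
  input: SearchQuery
}

interface SearchParams {
  searchQueries?: SearchQuery[]
}

function* handleSteps({ params }: { params?: SearchParams }): Generator<ToolCall, void, unknown> {
  const searchQueries: SearchQuery[] = params?.searchQueries ?? []
  for (const query of searchQueries) {
    yield {
      toolName: 'code_search',
      input: {
        pattern: query.pattern,
        flags: query.flags,
        cwd: query.cwd,
        maxResults: query.maxResults,
      },
    }
  }
}

// Hypothetical driver: records the requested calls instead of executing them.
function collectToolCalls(params?: SearchParams): ToolCall[] {
  return Array.from(handleSteps({ params }))
}

const calls = collectToolCalls({
  searchQueries: [
    { pattern: 'spawn_agents' },
    { pattern: 'handleSteps', flags: '-i', maxResults: 5 },
  ],
})
console.log(calls.length) // prints 2
```

Because missing params fall back to an empty array, `collectToolCalls(undefined)` yields no steps rather than throwing.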
40 changes: 40 additions & 0 deletions .agents/file-explorer/codebase-explorer.ts
@@ -0,0 +1,40 @@
import { publisher } from '../constants'

import type { SecretAgentDefinition } from '../types/secret-agent-definition'

const codebaseExplorer: SecretAgentDefinition = {
id: 'codebase-explorer',
displayName: 'Codebase Explorer',
spawnerPrompt:
'Orchestrates multiple exploration agents to comprehensively analyze the codebase and answer questions.',
model: 'anthropic/claude-sonnet-4.5',
publisher,
outputMode: 'last_message',
includeMessageHistory: false,
toolNames: ['spawn_agents'],
spawnableAgents: [
'file-picker',
'code-searcher',
'directory-lister',
'glob-matcher',
'file-q-and-a',
],
inputSchema: {
prompt: {
type: 'string',
description: 'A question or exploration goal for the codebase.',
},
},
systemPrompt: `You are a codebase exploration orchestrator. Your job is to spawn multiple specialized agents in parallel waves to comprehensively explore the codebase and answer the user's question.

Strategy:
1. Analyze the user's question to determine what exploration approach would be most effective.
2. You may spawn agents to help you answer the user's question. Feel free to spawn multiple agents in parallel to gather information from different angles.
3. Synthesize all findings into a comprehensive answer.`,

instructionsPrompt: `Analyze the user's prompt and spawn appropriate exploration agents.

Finally, synthesize all findings into a comprehensive answer.`,
}

export default codebaseExplorer
55 changes: 55 additions & 0 deletions .agents/file-explorer/directory-lister.ts
@@ -0,0 +1,55 @@
import { publisher } from '../constants'

import type { SecretAgentDefinition } from '../types/secret-agent-definition'

interface ListDirectoryQuery {
path: string
}

const paramsSchema = {
type: 'object' as const,
properties: {
directories: {
type: 'array' as const,
items: {
type: 'object' as const,
properties: {
path: { type: 'string' as const },
},
required: ['path'],
},
description: 'Array of directory paths to list',
},
},
required: ['directories'],
}

const directoryLister: SecretAgentDefinition = {
id: 'directory-lister',
displayName: 'Directory Lister',
spawnerPrompt:
'Mechanically lists multiple directories and returns their contents',
model: 'anthropic/claude-sonnet-4.5',
publisher,
outputMode: 'all_messages',
includeMessageHistory: false,
toolNames: ['list_directory'],
spawnableAgents: [],
inputSchema: {
params: paramsSchema,
},
handleSteps: function* ({ params }) {
const directories: ListDirectoryQuery[] = params?.directories ?? []

for (const directory of directories) {
yield {
toolName: 'list_directory',
input: {
path: directory.path,
},
}
}
},
}

export default directoryLister
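Reviewer note: the handler above falls back to an empty array when params are missing, but it does not itself check that each entry has a string path. That check presumably happens upstream against paramsSchema, which is an assumption. If a runtime check were wanted inside the agent, a guard mirroring the schema could look like the sketch below. `isDirectoryListerParams` and `DirectoryListerParams` are hypothetical names, not part of this PR.

```typescript
// Hypothetical runtime guard mirroring the directory-lister paramsSchema.
interface ListDirectoryQuery {
  path: string
}

interface DirectoryListerParams {
  directories: ListDirectoryQuery[]
}

function isDirectoryListerParams(value: unknown): value is DirectoryListerParams {
  if (typeof value !== 'object' || value === null) return false
  const directories = (value as { directories?: unknown }).directories
  if (!Array.isArray(directories)) return false
  // Every entry must be an object with a string `path`, as the schema requires.
  return directories.every(
    (entry) =>
      typeof entry === 'object' &&
      entry !== null &&
      typeof (entry as { path?: unknown }).path === 'string',
  )
}

console.log(isDirectoryListerParams({ directories: [{ path: '.agents' }] })) // prints true
console.log(isDirectoryListerParams({ directories: [{}] })) // prints false
```

The guard accepts an empty `directories` array, which matches the handler's behavior of yielding no steps in that case.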
@@ -1,8 +1,8 @@
import { AgentTemplateTypes } from '@codebuff/common/types/session-state'

import { publisher } from './constants'
import { publisher } from '../constants'

import type { SecretAgentDefinition } from './types/secret-agent-definition'
import type { SecretAgentDefinition } from '../types/secret-agent-definition'

const paramsSchema = {
type: 'object' as const,
@@ -22,7 +22,7 @@ const fileExplorer: SecretAgentDefinition = {
displayName: 'Dora the File Explorer',
spawnerPrompt:
'Comprehensively explores the codebase and reports back on the results',
model: 'anthropic/claude-4-sonnet-20250522',
model: 'x-ai/grok-4-fast',
publisher,
outputMode: 'structured_output',
includeMessageHistory: false,
@@ -1,11 +1,12 @@
import { publisher } from '../constants'
import { filePicker } from 'factory/file-picker'
import { SecretAgentDefinition } from 'types/secret-agent-definition'
import { filePicker } from '../factory/file-picker'

import type { SecretAgentDefinition } from '../types/secret-agent-definition'

const definition: SecretAgentDefinition = {
...filePicker('x-ai/grok-4-fast'),
id: 'researcher-file-picker',
id: 'file-picker',
publisher,
...filePicker('x-ai/grok-4-fast'),
}

export default definition
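Reviewer note: this hunk appears to move the `...filePicker('x-ai/grok-4-fast')` spread from the top of the object literal to the bottom, after `id` and `publisher`. In an object literal, later properties win, so any `id` or `publisher` returned by the factory would now override the explicit values. Whether the real `filePicker` returns those keys is not visible in this diff, so the sketch below uses a hypothetical `makePicker` factory purely to illustrate the ordering rule.

```typescript
// Hypothetical factory: whether the real filePicker returns an `id` is an assumption.
interface PickerDefaults {
  id?: string
  model: string
}

function makePicker(model: string): PickerDefaults {
  return { id: 'factory-default', model }
}

// Spread first: the explicit id overrides whatever the factory returned.
const spreadFirst = { ...makePicker('x-ai/grok-4-fast'), id: 'file-picker' }

// Spread last: the factory's id, if present, overrides the explicit one.
const spreadLast = { id: 'file-picker', ...makePicker('x-ai/grok-4-fast') }

console.log(spreadFirst.id) // prints file-picker
console.log(spreadLast.id) // prints factory-default
```

If the factory does not return `id` or `publisher`, both orderings give the same result and the move is harmless.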