+2. Full traces from each agent showing their step-by-step process
+3. Performance metrics (scores, cost, time, errors)
+
+## Focus on Agent Processes
+
+Your analysis should focus on how agents work, not what they accomplished:
+
+Key Analysis Areas:
+- Problem-Solving Approach: How did each agent break down and approach the problem?
+- Tool Usage Patterns: Which tools did they use, in what sequence, and why?
+- Decision-Making Strategy: What information did they gather before acting? How did they validate assumptions?
+- Workflow Efficiency: Did they follow a systematic process or jump around? Were steps logically ordered?
+- Context Gathering: How thoroughly did they explore the codebase before making changes?
+- Iterative Refinement: Did they test, verify, or refine their work? How?
+
+## Output Format
 
 Provide:
-- **Overall Analysis**: Compare how agents performed on this task, analyzing their different approaches
-- **Agent Feedback**: For each agent, list:
-  - Strengths: What this agent did well (specific actions from trace)
-  - Weaknesses: What this agent struggled with (specific issues from trace)
-  - Relative Performance: How this agent compared to others
-- **Recommendations**: Actionable suggestions for improving the agents based on observed behavior
-
-Focus on comparative insights - how agents differ in their approaches, tool usage patterns, efficiency, and results.
+- Overall Analysis: Compare agent workflows, highlighting different process strategies
+- Agent Feedback: For each agent:
+  - Strengths: Process steps that worked well (e.g., thoroughly explored codebase before editing)
+  - Weaknesses: Process gaps or inefficiencies (e.g., made changes without reading related files)
+  - Relative Performance: How this agent's process compared to others
+- Recommendations: Generalizable improvements to agent workflows and decision-making processes
+
+Important: Focus on the agent's process and methodology, not on the object-level content of the code changes. We want to understand how to improve the agent's approach to any problem.
+
 Note: read_files tool results show [TRUNCATED] for file contents to save space.`,
 }
@@ -208,24 +215,31 @@ export async function analyzeAgentTraces({
   error: t.error,
 }))
 
-const prompt = `## Task Specification
+const prompt = `## Task Specification (for context)
 ${spec}
 
 ## Agent Traces and Results
 ${JSON.stringify(truncatedTraces, null, 2)}
 
-Please compare these agents and provide:
-1. An overall analysis of how the agents performed, including differences in their approaches
-2. Specific feedback for each agent including strengths, weaknesses, and how they performed relative to others
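The diff interpolates `truncatedTraces` into the prompt and notes that `read_files` tool results appear as `[TRUNCATED]` to save space. A minimal sketch of how that truncation step might look; the `TraceStep` and `AgentTrace` shapes and the `truncateTraces` helper are hypothetical (the repo's real types are not shown in this diff), and only the `error: t.error` field and the `[TRUNCATED]` placeholder come from the code above:

```typescript
// Hypothetical trace shapes; the real types in this repo are not
// visible in the diff.
interface TraceStep {
  tool: string;
  input: unknown;
  result: string;
}

interface AgentTrace {
  agent: string;
  steps: TraceStep[];
  error?: string;
}

// Replace bulky read_files results with a placeholder before the
// traces are JSON.stringify-ed into the analysis prompt, keeping
// all other tool results and the per-trace error intact.
function truncateTraces(traces: AgentTrace[]): AgentTrace[] {
  return traces.map((t) => ({
    agent: t.agent,
    steps: t.steps.map((s) =>
      s.tool === "read_files" ? { ...s, result: "[TRUNCATED]" } : s
    ),
    error: t.error,
  }));
}
```

Dropping file contents this way keeps the prompt focused on the agents' step sequence rather than on the code they read, which matches the process-over-content framing the new prompt text asks for.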