Commit 57060c6

evals: only judge based on spec
1 parent bde7973 commit 57060c6

1 file changed: +3 -1 lines changed

evals/git-evals/judge-git-eval.ts

Lines changed: 3 additions & 1 deletion
@@ -49,7 +49,7 @@ function buildAnalysisPrompt(
 )
 .join('\n\n')

-return `You are an expert software engineer tasked with analyzing and scoring the code quality of changes made by an AI coding assistant (Codebuff). Please analyze the following interaction trace and compare both the attempted changes and the ground truth changes.
+return `You are an expert software engineer tasked with analyzing and scoring the code quality of changes made by an AI coding assistant (Codebuff). Please analyze and compare both the attempted changes and the ground truth changes.

 [SPEC]
 ${evalRun.eval_commit.spec}
@@ -75,6 +75,8 @@ Please analyze the implementation attempt and provide:
 - Code Quality: How well-structured, maintainable and idiomatic is the code?
 - Overall: Combined assessment of the implementation quality

+Note: The agent only has access to the spec, so do not dock points for anything not included in the spec (e.g. unit tests, documentation, etc.). If something is included in the spec but not in the changes, you should give a lower score.
+
 Focus on:
 - Correctness and completeness compared to the ground truth changes
 - Quality of the code produced
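
The effect of this change can be sketched in isolation. The snippet below is a minimal illustration, not the code in `judge-git-eval.ts`: the `JudgeInput` type, the `buildSpecOnlyPrompt` function, and the `[ATTEMPTED_CHANGES]` and `[GROUND_TRUTH_CHANGES]` section labels are all hypothetical names assumed for the example. Only the `[SPEC]` label and the wording of the note come from the diff.

```typescript
// Hypothetical, simplified stand-in for the data the real judge receives.
interface JudgeInput {
  spec: string
  attemptedDiff: string
  groundTruthDiff: string
}

// Wording of the note added in this commit, kept as a constant so it can be checked.
const SPEC_ONLY_NOTE =
  'Note: The agent only has access to the spec, so do not dock points for anything not included in the spec (e.g. unit tests, documentation, etc.). If something is included in the spec but not in the changes, you should give a lower score.'

// Assembles a judge prompt that scores the attempt against the spec only.
// Section labels other than [SPEC] are assumptions made for this sketch.
function buildSpecOnlyPrompt(input: JudgeInput): string {
  return [
    'Please analyze and compare both the attempted changes and the ground truth changes.',
    '',
    '[SPEC]',
    input.spec,
    '',
    '[ATTEMPTED_CHANGES]',
    input.attemptedDiff,
    '',
    '[GROUND_TRUTH_CHANGES]',
    input.groundTruthDiff,
    '',
    SPEC_ONLY_NOTE,
  ].join('\n')
}

const examplePrompt = buildSpecOnlyPrompt({
  spec: 'Add a --verbose flag to the CLI',
  attemptedDiff: '+ attempted change',
  groundTruthDiff: '+ ground truth change',
})
console.log(examplePrompt)
```

Keeping the note as a named constant is a design choice of the sketch: it lets a test assert that the spec-only instruction is present without matching on the whole prompt.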
