Add prompting for fuller completion of the spec

charleslien · charleslien · commit bde797311177 · 2025-09-04T13:31:17.000-07:00
diff --git a/evals/git-evals/judge-git-eval.ts b/evals/git-evals/judge-git-eval.ts
@@ -67,7 +67,7 @@ ${codebuffChanges}
 ${evalRun.error ? evalRun.error : 'None'}
 [/ERROR]
 
-Please analyze the trace of the implementation attempt and provide:
+Please analyze the implementation attempt and provide:
 1. A detailed analysis of the implementation trace and the final changes. Include how the changes compare to the ground truth change. Does it have similar behavior at least?
 2. Key strengths and weaknesses of the implementation
 3. Numerical scores (0-10):
diff --git a/evals/git-evals/run-git-evals.ts b/evals/git-evals/run-git-evals.ts
@@ -134,7 +134,7 @@ Note that files can only be changed with tools. If no tools are called, no files
 
 You must decide whether to:
 1. 'continue' - Generate a follow-up prompt for Codebuff
-2. 'complete' - The implementation is done and satisfies the spec
+2. 'complete' - The implementation is done and fully satisfies the spec, including tests, documentation, and any other relevant artifacts
 3. 'halt' - The implementation is off track and unlikely to be completed within ${MAX_ATTEMPTS - attempts} more attempts
 
 If deciding to continue, include a clear, focused prompt for Codebuff in next_prompt. Note that Codebuff does not have access to the spec, so you must describe the changes you want Codebuff to make in a way that is clear and concise.