index.html
3 additions & 0 deletions
@@ -332,6 +332,8 @@ <h2 id="results">Results</h2>
332
332
<figcaption style="text-align: center;">Table 3: Refactoring results for LIBRARIAN (w/ K = 8) averaged over 10 Code Contests collections</figcaption>
333
333
</figure>
334
334
335
+
We also present the results on the small repo split, which consists of repositories generated by o4-mini.
336
+
We experiment with Claude Sonnet 3.7 as the planner, and either Sonnet 3.7 or o4-mini as the implementer.
335
337
<figure class="table-figure">
336
338
<table class="table-styled">
337
339
<thead>
@@ -362,6 +364,7 @@ <h2 id="results">Results</h2>
362
364
<figcaption style="text-align: center;">Table 4: Average results on MiniCode-repositories small, using Codex with o4-mini and Claude Code with Claude Sonnet 3.7</figcaption>
363
365
</figure>
364
366
367
+
Finally, we present results on the large repo split. Due to the stronger performance of Sonnet models, we evaluate only Sonnet models on this split to minimize cost.