feat(sob): SOB benchmark, evaluation pipeline, leaderboard#1

Merged
Khurdhula-Harshavardhan merged 3 commits into main from feat/sob
Apr 28, 2026

Conversation

@Abhinavexist
Collaborator

Initial benchmark drop. CI builds leaderboard.json from data/evaluation/ and posts a top-10 preview comment.
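
A minimal sketch of what such a build step might look like, not the actual pipeline in this PR. It assumes (hypothetically) that `data/evaluation/` holds one JSON summary file per model with the keys `model`, `overall`, `val_acc`, `json_pass`, and `perfect`; the real file layout, key names, and tie-breaking rule are not shown in this conversation.

```python
import json
from pathlib import Path


def build_leaderboard(eval_dir):
    """Collect one summary dict per model from the *.json files in eval_dir.

    Assumed (hypothetical) file shape:
    {"model": str, "overall": float, "val_acc": float,
     "json_pass": float, "perfect": float}
    """
    rows = []
    for path in sorted(Path(eval_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            rows.append(json.load(f))
    # Highest Overall first. Python's sort is stable, so models with equal
    # scores keep their filename order (an assumed tie-break rule).
    rows.sort(key=lambda r: r["overall"], reverse=True)
    return rows


def render_preview(rows, top_n=10):
    """Render the top_n rows as a Markdown table suitable for a PR comment."""
    lines = [
        "| Rank | Model | Overall | Val. Acc. | JSON Pass | Perfect |",
        "| --- | --- | --- | --- | --- | --- |",
    ]
    for rank, r in enumerate(rows[:top_n], start=1):
        lines.append(
            f"| {rank} | {r['model']} | {r['overall']:.3f} "
            f"| {r['val_acc']:.3f} | {r['json_pass']:.3f} "
            f"| {r['perfect']:.3f} |"
        )
    return "\n".join(lines)
```

Under these assumptions, a CI job could call `build_leaderboard("data/evaluation")`, dump the result to `leaderboard.json` with `json.dump`, and post the string returned by `render_preview` as the preview comment.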

@github-actions

github-actions Bot commented Apr 28, 2026

🏆 Leaderboard preview

Built 21 models, top 10 by Overall:

| Rank | Model | Overall | Val. Acc. | JSON Pass | Perfect |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-5.4 | 0.870 | 0.798 | 0.993 | 0.469 |
| 2 | GLM-4.7 | 0.861 | 0.804 | 0.965 | 0.508 |
| 3 | Qwen3.5-35B | 0.861 | 0.801 | 0.969 | 0.500 |
| 4 | Gemini-2.5-Flash | 0.860 | 0.796 | 0.972 | 0.498 |
| 5 | Qwen3-235B | 0.857 | 0.786 | 0.978 | 0.463 |
| 6 | Interfaze-Beta | 0.855 | 0.795 | 0.967 | 0.480 |
| 7 | Claude-Sonnet-4.6 | 0.854 | 0.779 | 0.979 | 0.442 |
| 8 | GPT-4.1 | 0.850 | 0.783 | 0.969 | 0.454 |
| 9 | GPT-5 | 0.849 | 0.769 | 0.983 | 0.398 |
| 10 | Gemma-3-27B | 0.847 | 0.777 | 0.969 | 0.454 |

Generated at 2026-04-28T08:23:42+00:00 • full JSON in workflow artifacts

@Khurdhula-Harshavardhan Khurdhula-Harshavardhan merged commit 2df1dbf into main Apr 28, 2026
1 check passed

2 participants