Thanks for contributing.
This project is a research-style codebase, so reproducibility and clear experiment hygiene matter more than volume of code changes.
## Setup

```bash
make setup
make check
make test
```

If you work on GPU paths, install the GPU extras:

```bash
make setup-gpu
```

## Workflow

- Create a branch from `main`.
- Keep changes scoped to one concern (feature, fix, docs, or results).
- Run checks before opening a PR.
- Include rationale in commit messages and the PR description.
## Checks

```bash
make check
make test
```

For benchmark-related changes, also run at least one smoke benchmark:

```bash
make bench-toy OUT=/tmp/bench_toy.jsonl
.venv/bin/python scripts/validate_results_jsonl.py --path /tmp/bench_toy.jsonl --strict
```

## Data layout

- `datasets/` is for local data and run outputs (gitignored by default).
- `reports/` contains tracked summary artifacts intended for sharing.
- `logs/` is local runtime noise and is ignored.
- Prefer new output filenames for long runs to avoid mixing old and new records.
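If you want to understand what the validation step is checking before a long run, here is a minimal sketch of a JSONL validator. This is *not* the real `scripts/validate_results_jsonl.py` (whose record schema is not documented here); it only checks the part any results file must satisfy — that every non-empty line parses as a JSON object:

```python
import json
from pathlib import Path

def validate_results_jsonl(path, strict=True):
    """Check that every non-empty line of a JSONL file is a JSON object.

    Simplified stand-in for scripts/validate_results_jsonl.py; the real
    validator presumably also enforces a per-record schema.
    Returns a list of error strings; in strict mode, raises on any error.
    """
    errors = []
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {lineno}: expected a JSON object, "
                          f"got {type(record).__name__}")
    if strict and errors:
        raise ValueError("; ".join(errors))
    return errors

# Example: a well-formed results file produces no errors.
demo = Path("/tmp/demo_results.jsonl")
demo.write_text('{"run": 1, "acc": 0.91}\n{"run": 2, "acc": 0.88}\n')
print(validate_results_jsonl(demo))  # []
```

Running this on a fresh output file before archiving it is a cheap way to catch truncated writes from interrupted runs, which is also why the guidance above prefers new output filenames for long runs.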
## C grid

- Keep the paper-aligned C grid: `1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0`.
- Do not introduce `1e1` or `1e2` into preset defaults.
- If behavior changes, update the tests in `tests/test_autojudge.py`.
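One way to keep presets honest is to generate the grid programmatically and guard against out-of-range values. A sketch (the `check_preset_grid` helper is hypothetical, not part of this codebase):

```python
# Paper-aligned C grid: one value per decade from 1e-7 to 1e0.
C_GRID = [float(f"1e{k}") for k in range(-7, 1)]

def check_preset_grid(grid):
    """Reject preset grids containing values outside the paper-aligned set.

    Hypothetical guard; the real enforcement lives in
    tests/test_autojudge.py per the guidance above.
    """
    allowed = set(C_GRID)
    bad = [c for c in grid if c not in allowed]
    if bad:
        raise ValueError(f"values outside the paper-aligned C grid: {bad}")

check_preset_grid(C_GRID)               # passes
# check_preset_grid(C_GRID + [10.0])    # would raise: 1e1 is disallowed
```

Building the grid from exponents rather than hand-typing eight literals makes drift (an accidental `1e1` or a skipped decade) easy to catch in a test.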
## Style

- Follow the existing style in touched files.
- Keep interfaces explicit; avoid hidden behavior changes.
- Add comments only for non-obvious logic.
## PR checklist

- Scope is clear and focused
- `make check` passes
- `make test` passes
- Relevant docs/configs updated
- Tests added or updated for logic changes
- Benchmark outputs validated when applicable
## Bug reports

Please use the bug report template and include:

- the exact command used,
- the dataset/model preset,
- the stack trace or a log snippet,
- expected vs. actual behavior.