
Add SWE-bench Verified patch environment #1340

Open
poofeth wants to merge 2 commits into PrimeIntellect-ai:main from poofeth:bounty/swe-bench-verified-env

Conversation

@poofeth poofeth commented May 11, 2026

Summary

Adds a swe-bench-verified environment for the Prime SWE-bench Verified bounty.

This environment:

  • loads princeton-nlp/SWE-bench_Verified from Hugging Face
  • formats each instance as a repository repair prompt
  • asks the model to return a unified diff inside <patch>...</patch> tags
  • preserves SWE-bench metadata (test_patch, FAIL_TO_PASS, PASS_TO_PASS, base commit, difficulty, setup commit) in task["info"]
  • provides deterministic local rewards/metrics for gold-patch exact match, patch similarity, submitted patch length, and gold patch length
  • documents that this is a patch-generation/SFT/RL sanity-check environment and not a replacement for the official execution harness

/claim https://algora.io/PrimeIntellect-ai/bounties/znaiNF1eEZzF62SD
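One plausible way to compute the patch-similarity reward is a character-level ratio via `difflib.SequenceMatcher` (an illustrative sketch; the environment's actual scoring may differ):

```python
import difflib

def patch_similarity(submitted: str, gold: str) -> float:
    # Ratio in [0, 1]; 1.0 means the two diffs are textually identical.
    return difflib.SequenceMatcher(None, submitted, gold).ratio()

print(patch_similarity("-x = 1\n+x = 2\n", "-x = 1\n+x = 3\n"))
```

A deterministic, execution-free score like this is what makes the environment usable as an SFT/RL sanity check without the official harness.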

Validation

uv run pytest tests/test_swe_bench_verified_environment.py -q
# 6 passed

uv run python - <<'PY'
import sys
from pathlib import Path
sys.path.insert(0, str(Path('environments/swe_bench_verified').resolve()))
import swe_bench_verified as env
loaded = env.load_environment(train_limit=1, eval_limit=1)
print(type(loaded).__name__, loaded.taskset.taskset_id)
PY
# Env swe-bench/verified

uv run ruff check environments/swe_bench_verified tests/test_swe_bench_verified_environment.py
# All checks passed

uv run ruff format --check environments/swe_bench_verified tests/test_swe_bench_verified_environment.py
# 2 files already formatted

git diff --check
# no output

uv run --with ./environments/swe_bench_verified python - <<'PY'
from swe_bench_verified import load_environment
loaded = load_environment(train_limit=1, eval_limit=1)
print(loaded.taskset.taskset_id)
PY
# swe-bench/verified


@poofeth poofeth commented May 11, 2026

Follow-up commit b0ea80a strengthens the environment handoff and metrics:

  • added official_submission(task, patch) for the official SWE-bench JSONL row shape (instance_id, model_patch)
  • added changed-file overlap as a deterministic metric alongside patch similarity and line counts
  • added tests for diff path extraction, official submission generation, and the new metric
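The changed-file overlap metric could be sketched as a Jaccard score over the file paths in the `diff --git` headers (helper names here are hypothetical, assumed from the description above):

```python
import re

def changed_files(patch: str) -> set[str]:
    # Collect the b/ paths from "diff --git a/<path> b/<path>" headers.
    return set(re.findall(r"^diff --git a/\S+ b/(\S+)", patch, re.MULTILINE))

def file_overlap(submitted: str, gold: str) -> float:
    # Jaccard overlap between the file sets touched by the two patches.
    a, b = changed_files(submitted), changed_files(gold)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

gold = "diff --git a/pkg/core.py b/pkg/core.py\n@@ ... @@\n"
sub = (
    "diff --git a/pkg/core.py b/pkg/core.py\n@@ ... @@\n"
    "diff --git a/pkg/util.py b/pkg/util.py\n@@ ... @@\n"
)
print(file_overlap(sub, gold))  # → 0.5
```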

Fresh validation:

uv run pytest tests/test_swe_bench_verified_environment.py -q
# 8 passed

uv run python - <<'PY'
import sys
from pathlib import Path
sys.path.insert(0, str(Path('environments/swe_bench_verified').resolve()))
import swe_bench_verified as env
loaded = env.load_environment(train_limit=1, eval_limit=1)
print(type(loaded).__name__, loaded.taskset.taskset_id)
PY
# Env swe-bench/verified

uv run ruff check environments/swe_bench_verified tests/test_swe_bench_verified_environment.py
# All checks passed

uv run ruff format --check environments/swe_bench_verified tests/test_swe_bench_verified_environment.py
# 2 files already formatted

git diff --check
# no output

@poofeth poofeth commented May 11, 2026

An additional package-install smoke test passed, exercising the environment's package metadata:

uv run --with ./environments/swe_bench_verified python - <<'PY'
from swe_bench_verified import load_environment, official_submission
loaded = load_environment(train_limit=1, eval_limit=1)
row = list(loaded.taskset.source())[0]
print(loaded.taskset.taskset_id, official_submission(row, row['answer'])['instance_id'])
PY
# swe-bench/verified astropy__astropy-12907
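The `official_submission` rows map onto the predictions JSONL consumed by the official SWE-bench harness; a minimal serialization sketch, assuming only the `(instance_id, model_patch)` row shape described above:

```python
import json

def submission_line(instance_id: str, model_patch: str) -> str:
    # One JSON object per line: the (instance_id, model_patch) row shape above.
    return json.dumps({"instance_id": instance_id, "model_patch": model_patch})

line = submission_line("astropy__astropy-12907", "diff --git a/x b/x\n")
print(line)
```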

