
Add SWE-bench Verified patch environment #1340

Open
poofeth wants to merge 2 commits into PrimeIntellect-ai:main from poofeth:bounty/swe-bench-verified-env

Conversation

@poofeth poofeth commented May 11, 2026

Summary

Adds a swe-bench-verified environment for the Prime SWE-bench Verified bounty.

This environment:

  • loads princeton-nlp/SWE-bench_Verified from Hugging Face
  • formats each instance as a repository repair prompt
  • asks the model to return a unified diff inside <patch>...</patch> tags
  • preserves SWE-bench metadata (test_patch, FAIL_TO_PASS, PASS_TO_PASS, base commit, difficulty, setup commit) in task["info"]
  • provides deterministic local rewards/metrics for gold-patch exact match, patch similarity, submitted patch length, and gold patch length
  • documents that this is a patch-generation/SFT/RL sanity-check environment and not a replacement for the official execution harness

/claim https://algora.io/PrimeIntellect-ai/bounties/znaiNF1eEZzF62SD
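One plausible way to compute the patch-similarity reward is a character-level ratio via `difflib.SequenceMatcher` (an illustrative sketch; the environment's actual scoring may differ):

```python
import difflib

def patch_similarity(submitted: str, gold: str) -> float:
    # Ratio in [0, 1]; 1.0 means the two diffs are textually identical.
    return difflib.SequenceMatcher(None, submitted, gold).ratio()

print(patch_similarity("-x = 1\n+x = 2\n", "-x = 1\n+x = 3\n"))
```

A deterministic, execution-free score like this is what makes the environment usable as an SFT/RL sanity check without the official harness.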

Validation

uv run pytest tests/test_swe_bench_verified_environment.py -q
# 6 passed

uv run python - <<'PY'
import sys
from pathlib import Path
sys.path.insert(0, str(Path('environments/swe_bench_verified').resolve()))
import swe_bench_verified as env
loaded = env.load_environment(train_limit=1, eval_limit=1)
print(type(loaded).__name__, loaded.taskset.taskset_id)
PY
# Env swe-bench/verified

uv run ruff check environments/swe_bench_verified tests/test_swe_bench_verified_environment.py
# All checks passed

uv run ruff format --check environments/swe_bench_verified tests/test_swe_bench_verified_environment.py
# 2 files already formatted

git diff --check
# no output

uv run --with ./environments/swe_bench_verified python - <<'PY'
from swe_bench_verified import load_environment
loaded = load_environment(train_limit=1, eval_limit=1)
print(loaded.taskset.taskset_id)
PY
# swe-bench/verified


@poofeth poofeth commented May 11, 2026

Follow-up commit b0ea80a strengthens the environment handoff and metrics:

  • added official_submission(task, patch) for the official SWE-bench JSONL row shape (instance_id, model_patch)
  • added changed-file overlap as a deterministic metric alongside patch similarity and line counts
  • added tests for diff path extraction, official submission generation, and the new metric
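The changed-file overlap metric could be sketched as a Jaccard score over the file paths in the `diff --git` headers (helper names here are hypothetical, assumed from the description above):

```python
import re

def changed_files(patch: str) -> set[str]:
    # Collect the b/ paths from "diff --git a/<path> b/<path>" headers.
    return set(re.findall(r"^diff --git a/\S+ b/(\S+)", patch, re.MULTILINE))

def file_overlap(submitted: str, gold: str) -> float:
    # Jaccard overlap between the file sets touched by the two patches.
    a, b = changed_files(submitted), changed_files(gold)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

gold = "diff --git a/pkg/core.py b/pkg/core.py\n@@ ... @@\n"
sub = (
    "diff --git a/pkg/core.py b/pkg/core.py\n@@ ... @@\n"
    "diff --git a/pkg/util.py b/pkg/util.py\n@@ ... @@\n"
)
print(file_overlap(sub, gold))  # → 0.5
```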

Fresh validation:

uv run pytest tests/test_swe_bench_verified_environment.py -q
# 8 passed

uv run python - <<'PY'
import sys
from pathlib import Path
sys.path.insert(0, str(Path('environments/swe_bench_verified').resolve()))
import swe_bench_verified as env
loaded = env.load_environment(train_limit=1, eval_limit=1)
print(type(loaded).__name__, loaded.taskset.taskset_id)
PY
# Env swe-bench/verified

uv run ruff check environments/swe_bench_verified tests/test_swe_bench_verified_environment.py
# All checks passed

uv run ruff format --check environments/swe_bench_verified tests/test_swe_bench_verified_environment.py
# 2 files already formatted

git diff --check
# no output

@poofeth poofeth commented May 11, 2026

An additional package-install smoke test passed, exercising the environment's package metadata:

uv run --with ./environments/swe_bench_verified python - <<'PY'
from swe_bench_verified import load_environment, official_submission
loaded = load_environment(train_limit=1, eval_limit=1)
row = list(loaded.taskset.source())[0]
print(loaded.taskset.taskset_id, official_submission(row, row['answer'])['instance_id'])
PY
# swe-bench/verified astropy__astropy-12907
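The `official_submission` rows map onto the predictions JSONL consumed by the official SWE-bench harness; a minimal serialization sketch, assuming only the `(instance_id, model_patch)` row shape described above:

```python
import json

def submission_line(instance_id: str, model_patch: str) -> str:
    # One JSON object per line: the (instance_id, model_patch) row shape above.
    return json.dumps({"instance_id": instance_id, "model_patch": model_patch})

line = submission_line("astropy__astropy-12907", "diff --git a/x b/x\n")
print(line)
```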

