
Add pandas-debugger RL environment for data pipeline debugging#1344

Open
Open

l69d wants to merge 1 commit into PrimeIntellect-ai:main from l69d:add-pandas-debugger-env


Conversation


@l69d l69d commented May 11, 2026

Summary

Adds a new pandas-debugger RL environment for debugging broken pandas/NumPy data pipelines.

What it does

The environment presents the model with a broken pandas/NumPy code snippet containing one of 10 bug categories and asks it to identify and fix the bug.

Bug categories (10)

  • off_by_one — index/slice off-by-one errors
  • dtype_cast — implicit type coercions producing NaN/wrong output
  • merge_key — wrong join key or join type
  • agg_axis — axis confusion in aggregations
  • fillna_method — wrong fill strategy
  • groupby_reset — missing reset_index() after groupby
  • str_strip — string whitespace issues corrupting matches
  • sort_ascending — wrong sort direction
  • inplace_return — inplace=True + assignment = None bug
  • copy_alias — view vs copy confusion causing silent no-ops
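
The inplace_return category can be illustrated with a minimal pandas snippet (illustrative only, not a task from the PR's actual task bank):

```python
import pandas as pd

# Buggy pattern: sort_values(..., inplace=True) returns None, so
# assigning the result back clobbers the DataFrame.
df = pd.DataFrame({"a": [3, 1, 2]})
result = df.sort_values("a", inplace=True)
print(result)  # None

# The fix: drop inplace=True and keep the returned sorted copy.
df = pd.DataFrame({"a": [3, 1, 2]})
df = df.sort_values("a")
print(df["a"].tolist())  # [1, 2, 3]
```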

Why this fills a gap

None of the 37 existing environments cover data wrangling/debugging. Data engineering is consistently cited as a top pain point for ML practitioners — an RL model that can reliably debug pandas pipelines has high practical value.

Reward signal

Graded: 0 / 0.25 / 0.5 / 1.0 — a richer training signal for RL than binary pass/fail.
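
A minimal sketch of such a graded ladder (the function name and arguments are assumptions, not the PR's actual code): 0 for an unparseable response, 0.25 for code that crashes, 0.5 for code that runs but fails the check, 1.0 for a passing fix.

```python
def graded_reward(parsed: bool, runs: bool, check_passed: bool) -> float:
    if not parsed:
        return 0.0      # response could not be parsed into code
    if not runs:
        return 0.25     # syntactically valid but raises at runtime
    if not check_passed:
        return 0.5      # executes cleanly but produces wrong output
    return 1.0          # passes the correctness check
```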

Tests

39 unit tests across all categories and edge cases. No external services required — uses subprocess with timeout isolation.

Interested in the Application-Only tier if this qualifies. Happy to add more bug categories or increase task complexity based on feedback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


Note

Medium Risk
Introduces a new environment that executes model-produced Python in a subprocess for scoring; while isolated with timeouts, any code-execution harness changes carry moderate reliability/sandboxing risk.

Overview
Adds a new pandas-debugger environment that trains/evaluates models on fixing single-bug pandas/NumPy data-wrangling snippets, scoring via XML response parsing plus execution-based assertions.

Implements an embedded task bank (buggy vs fixed code + check_expr), a subprocess-based safe runner with timeouts, and a rubric combining correctness_reward, format_reward, and a lightweight reasoning keyword bonus.
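
A subprocess-based runner with timeout isolation could be sketched as follows (the name `run_code_safe` and the `check_expr` convention are assumptions about the PR's implementation, not its actual code):

```python
import subprocess
import sys

def run_code_safe(code: str, check_expr: str, timeout: float = 10.0):
    # Run the candidate code plus an assertion on check_expr in a
    # fresh interpreter; kill it if it exceeds the timeout.
    script = f"{code}\nassert {check_expr}\n"
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0, proc.stderr
    except subprocess.TimeoutExpired:
        return False, "timeout"
```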

Includes packaging metadata (pyproject.toml), documentation (README.md), and a comprehensive pytest suite validating task integrity, reward behavior, and end-to-end environment loading.

Reviewed by Cursor Bugbot for commit 899e4b8. Bugbot is set up for automated code reviews on this repo.

New environment covering 10 bug categories in pandas/numpy code:
off_by_one, dtype_cast, merge_key, agg_axis, fillna_method,
groupby_reset, str_strip, sort_ascending, inplace_return, copy_alias.

Graded reward signal (0/0.25/0.5/1.0) for richer RL training signal.
No sandbox required — subprocess with timeout isolation.
39 tests across all bug categories and edge cases.

@cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 4 potential issues.


# our check_expr itself is broken; give benefit of the doubt
return 0.5

return score


Reward ladder 0.5 level never triggers for intended case

High Severity

The documented reward ladder says 0.5 = "runs but wrong," but the implementation never returns 0.5 for that case. When the model's code runs without errors but fails the check, it returns 0.25 (same as "syntactically valid but crashes"). The 0.5 path only triggers when the ground-truth check_expr is itself broken, which is a degenerate sanity-check scenario. The advertised 4-level graded signal (0/0.25/0.5/1.0) is effectively 3-level (0/0.25/1.0), reducing the RL training signal richness that the PR specifically highlights as a design goal.


@@ -0,0 +1,645 @@
"""


Missing environments/README.md update for new environment

Low Severity

This PR adds a new pandas_debugger environment to the environments/ folder but does not update environments/README.md to list it. The project rules require that any PR adding or removing an environment must update that README with the new environment listed under the appropriate category/pattern section.


Triggered by project rule: BugBot Instructions


}
patterns = keywords.get(bug_type, [])
text_lower = text.lower()
return any(re.search(p, text_lower) for p in patterns)


Regex patterns with uppercase never match lowered text

Low Severity

The patterns "None" and "SettingWithCopy" contain uppercase characters but are matched against text.lower() via re.search, which is case-sensitive by default. These patterns can never match, making them dead code. The function still works in practice because sibling patterns like "inplace" / "return" and "copy" / "view" compensate, but the intended matching for those specific terms is silently broken.
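
The dead-pattern behavior and the obvious fix can be demonstrated directly (sample pattern list from the finding; the surrounding text is made up for illustration):

```python
import re

patterns = ["None", "SettingWithCopy"]
text_lower = "assigning the settingwithcopy result returns none".lower()

# Case-sensitive search against lowered text never matches:
dead = any(re.search(p, text_lower) for p in patterns)

# re.IGNORECASE restores the intended matching:
fixed = all(re.search(p, text_lower, re.IGNORECASE) for p in patterns)
print(dead, fixed)  # False True
```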



return 1.0

# Partial: also check if the GROUND TRUTH passes (sanity; should always be True)
gt_passed, _ = _run_code_safe(fixed_code_gt, check_expr)


Blocking subprocess.run in async reward function serializes rollouts

Medium Severity

The async correctness_reward function calls _run_code_safe, which uses synchronous subprocess.run with a 10-second timeout — potentially twice per rollout (model code + ground-truth sanity check). The project docs explicitly state that sync operations in reward functions block all concurrent rollouts and "should be avoided at all costs." Using asyncio.create_subprocess_exec or wrapping with asyncio.to_thread would prevent event-loop starvation.
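
The suggested `asyncio.to_thread` fix could be sketched like this (function names are assumptions, not the PR's code): the blocking call moves to a worker thread so one slow rollout cannot starve the event loop.

```python
import asyncio
import subprocess
import sys

async def run_code_async(code: str, timeout: float = 10.0) -> bool:
    def _blocking() -> bool:
        # Same synchronous subprocess call, but executed off the
        # event loop in a worker thread.
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True, timeout=timeout,
            )
            return proc.returncode == 0
        except subprocess.TimeoutExpired:
            return False

    return await asyncio.to_thread(_blocking)
```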


