
Add EQ-Bench3 synthetic training environment #1335

Open: poofeth wants to merge 2 commits into PrimeIntellect-ai:main from poofeth:bounty/eq-bench3-train-set


poofeth commented May 11, 2026

Summary

  • add an eq_bench3 environment for EQ-Bench-style emotional intensity prediction
  • include a deterministic generator for uncontaminated synthetic dialogue/emotion prompts
  • commit a 64-row HF Dataset-compatible JSONL sample with question, answer, and info fields
  • add deterministic scoring for JSON emotion-intensity predictions
  • add focused tests covering generator determinism, dataset loading, reward scoring, and environment construction without external API calls
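As a rough illustration of the row shape described above, here is a minimal sketch of one JSONL record and a round trip through the standard library. Only the three top-level field names (question, answer, info) come from this PR; the field contents, the placement of the source tag inside info, and the helper names `dumps_jsonl` and `loads_jsonl` are assumptions made for the example.

```python
import json

# Hypothetical row: only the keys "question", "answer", and "info" are from the PR.
row = {
    "question": (
        "A: I finally told her the truth. B: How did she take it? "
        "Rate the intensity of each listed emotion for speaker A."
    ),
    # Assumed encoding: the target intensities stored as a JSON string.
    "answer": json.dumps({"relief": 7, "guilt": 3}),
    # Assumed location for the source tag mentioned in the PR discussion.
    "info": {"source": "synthetic-uncontaminated-v1"},
}


def dumps_jsonl(rows):
    """Serialize rows as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in rows) + "\n"


def loads_jsonl(text):
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


text = dumps_jsonl([row])
rows = loads_jsonl(text)
```

Sorting keys on write keeps the serialized file byte-stable across runs, which is one simple way to make a committed sample reproducible.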

Bounty

Validation

  • uv run pytest tests/test_eq_bench3_environment.py -q
  • uv run ruff check environments/eq_bench3 tests/test_eq_bench3_environment.py
  • uv run ruff format --check environments/eq_bench3 tests/test_eq_bench3_environment.py
  • git diff --check

Note

Low Risk
Low risk: changes are additive (new eq_bench3 environment, sample data, and tests) with no modifications to core framework or security-sensitive code paths.

Overview
Adds a new environments/eq_bench3 SingleTurn environment for EQ-Bench-style emotion-intensity prediction, including JSON parsing + a continuous reward (emotion_score_reward) that scores per-emotion intensity error.
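To make the idea of a continuous per-emotion reward concrete, the sketch below scores a completion by mean absolute intensity error. It is not the PR's `emotion_score_reward`; the function name `sketch_emotion_reward`, the 0 to 10 scale, the max-error penalty for missing or non-numeric emotions, and the zero reward for unparseable output are all assumptions.

```python
import json
import math

MAX_INTENSITY = 10.0  # assumed scale; the PR does not state the range


def _as_intensity(value):
    """Return value clamped to [0, MAX_INTENSITY], or None if it is not a finite number."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    try:
        number = float(value)
    except OverflowError:
        return None
    if not math.isfinite(number):
        return None
    return min(max(number, 0.0), MAX_INTENSITY)


def sketch_emotion_reward(completion: str, answer: str) -> float:
    """Hypothetical reward: 1 minus the normalized mean absolute intensity error."""
    target = json.loads(answer)
    try:
        # Take the outermost {...} span so surrounding prose is tolerated.
        start, end = completion.index("{"), completion.rindex("}") + 1
        pred = json.loads(completion[start:end])
    except ValueError:  # covers a missing brace and json.JSONDecodeError
        return 0.0
    if not isinstance(pred, dict) or not target:
        return 0.0
    total = 0.0
    for emotion, want in target.items():
        got = _as_intensity(pred.get(emotion))
        # Missing or invalid predictions are charged the maximum error.
        total += MAX_INTENSITY if got is None else abs(got - float(want))
    return 1.0 - total / (MAX_INTENSITY * len(target))
```

With targets inside the assumed range, the result stays in [0, 1], and partial credit degrades linearly with the size of the error rather than dropping to zero on any mismatch.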

Includes a deterministic prompt generator script and commits a 64-row synthetic JSONL sample dataset, plus packaging metadata (pyproject.toml) and focused tests for generator determinism, dataset loading, reward correctness, and load_environment construction.

Updates environments/README.md to list eq_bench3 among available SingleTurn examples.

Reviewed by Cursor Bugbot for commit 32cf899.


poofeth commented May 11, 2026

Validation evidence for the EQ-Bench3 bounty PR:

  • uv run pytest tests/test_eq_bench3_environment.py -q passed: 4 tests.
  • uv run ruff check environments/eq_bench3 tests/test_eq_bench3_environment.py passed.
  • uv run ruff format --check environments/eq_bench3 tests/test_eq_bench3_environment.py passed.
  • git diff --check passed.

The sample dataset is generated from local deterministic templates (source=synthetic-uncontaminated-v1) rather than copied from the upstream EQ-Bench question set.


cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 1698173.

Comment thread environments/eq_bench3/README.md
Comment thread environments/eq_bench3/generate_eq_bench3_prompts.py

poofeth commented May 11, 2026

Addressed the Bugbot review in commit 32cf899:

  • added eq_bench3 to environments/README.md
  • added an upper-bound guard for synthetic generation (num_examples <= 96) to avoid duplicate-exhaustion loops
  • added regression coverage for the generator bound
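For readers unfamiliar with the failure mode, the sketch below shows why a generator that rejects duplicate samples needs an upper bound: once every unique combination has been emitted, the rejection loop can never find a new one. The template pools, their sizes (chosen here so the product is 96 to match the bound above), and the `generate` signature are hypothetical; the PR's actual generator may be structured differently.

```python
import random

# Hypothetical template pools; 8 * 4 * 3 = 96 unique combinations.
SCENARIOS = [
    "a missed deadline", "a cancelled trip", "a surprise promotion",
    "a forgotten birthday", "a returned favor", "a broken promise",
    "a late apology", "an unexpected visit",
]
TONES = ["calm", "defensive", "hesitant", "blunt"]
EMOTION_SETS = [("relief", "guilt"), ("anger", "sadness"), ("pride", "anxiety")]
MAX_UNIQUE = len(SCENARIOS) * len(TONES) * len(EMOTION_SETS)


def generate(num_examples, seed=0):
    """Return num_examples unique (scenario, tone, emotions) combos, deterministically."""
    # Guard: without it, asking for more than MAX_UNIQUE rows would spin forever
    # in the rejection loop below, because every draw would be a duplicate.
    if not 0 <= num_examples <= MAX_UNIQUE:
        raise ValueError(f"num_examples must be between 0 and {MAX_UNIQUE}")
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    seen = set()
    rows = []
    while len(rows) < num_examples:
        combo = (rng.choice(SCENARIOS), rng.choice(TONES), rng.choice(EMOTION_SETS))
        if combo in seen:
            continue  # duplicate draw; try again
        seen.add(combo)
        rows.append(combo)
    return rows
```

An alternative design would enumerate all combinations with `itertools.product` and shuffle them with the seeded RNG, which avoids rejection sampling entirely at the cost of materializing the full set.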

Validation after the fixes:

$ uv run pytest tests/test_eq_bench3_environment.py -q
.....                                                                    [100%]

$ uv run ruff check environments/eq_bench3 tests/test_eq_bench3_environment.py
All checks passed!

$ uv run ruff format --check environments/eq_bench3 tests/test_eq_bench3_environment.py
3 files already formatted

$ git diff --check
# no output


poofeth commented May 11, 2026
