
Simplify v1 taskset and harness TOML config #1314

Open

xeophon wants to merge 4 commits into main from v1-config-aliases

Conversation


@xeophon xeophon commented May 8, 2026

Summary

  • allow v1 TOML to use top-level [harness] defaults plus per-env/per-eval taskset and harness tables
  • keep v1 taskset/harness config first-class in eval config instead of normalizing it into env_args
  • bridge v1 config into config={...} only at environment load time, preserving existing v1 loaders
  • update docs and tests for the new TOML shape

Tests

  • uv run pytest tests/test_eval_cli.py tests/test_v1_config_aliases.py tests/test_eval_display.py tests/test_eval_utils.py -q
  • pre-commit hooks on commit
  • pre-push hooks on push

Note

Medium Risk
Touches TOML normalization and environment loading for both eval and RL training; mis-merges could change harness/taskset behavior or break existing configs, though changes are localized and covered by new tests.

Overview
Simplifies v1 TOML config by making taskset/harness first-class sections on eval/env entries while allowing a shared top-level [harness] table to act as defaults across all [[eval]]/[[env]] blocks.

Updates config normalization to merge legacy env_args.config + per-entry overrides + global harness defaults, and delays bridging into config={taskset,harness} until vf.load_environment(...) is called (eval runner and vf-train). Docs and tests are updated to reflect the new TOML shape and precedence.
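As a sketch of the new TOML shape described above (the environment id and the specific keys are illustrative, not taken from the PR): a top-level [harness] table supplies defaults, and per-entry [eval.taskset] / [eval.harness] tables are first-class and override them.

```toml
# Top-level defaults shared by every [[eval]] / [[env]] block.
[harness]
max_turns = 10            # illustrative key

[[eval]]
id = "example-env"        # illustrative environment id

# First-class per-eval tables, no longer nested under env_args.config.
[eval.taskset]
name = "dev-split"        # illustrative

[eval.harness]
max_turns = 4             # overrides the top-level [harness] default
```

At environment load time these layers are merged and bridged into config={taskset, harness} when vf.load_environment(...) is called.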

Reviewed by Cursor Bugbot for commit ed2fa56.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e47bf9736

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread verifiers/utils/eval_utils.py Outdated
Comment on lines +789 to +790
if v1_config:
env_kwargs["config"] = v1_config


P2: Preserve existing env_args.config when applying aliases

When a config mixes the previously documented [eval.env_args.config...] shape with the new top-level/per-eval harness or taskset aliases, this assignment replaces the entire existing config object instead of merging into it. For example, adding a top-level [harness] default to an eval that still has [eval.env_args.config.taskset] silently drops the taskset before vf.load_environment(...), so v1 loaders receive incomplete configuration.

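A minimal sketch of the fix this comment suggests, assuming env_kwargs may already carry a config mapping from env_args (the function name is illustrative; variable names mirror the quoted snippet):

```python
def bridge_v1_config(env_kwargs: dict, v1_config: dict) -> dict:
    """Merge alias-derived taskset/harness settings into any existing
    config mapping instead of replacing it wholesale."""
    if v1_config:
        existing = env_kwargs.get("config") or {}
        # Alias-derived keys win over legacy env_args.config keys.
        env_kwargs["config"] = {**existing, **v1_config}
    return env_kwargs
```

With this shape, a config that still uses [eval.env_args.config.taskset] keeps its taskset when a top-level [harness] default is added.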

Comment on lines +39 to +40
if v1_config:
env_kwargs["config"] = v1_config


P2: Merge aliases with env args config in vf-train

For vf-train configs that still use the old [env.args.config...] shape, adding a new top-level [harness] or [env.taskset] alias causes this assignment to overwrite args.config entirely. That makes mixed/migrating v1 training configs lose any existing taskset/harness settings before vf.load_environment(...) is called.

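Both review comments point at the same merge problem from different call sites. One hedged way to express the intended precedence (global [harness] defaults, then legacy env_args.config, then per-entry tables; the ordering is assumed from the PR description, and the helper name is illustrative):

```python
def merge_v1_layers(global_harness: dict, legacy_config: dict, per_entry: dict) -> dict:
    """Build the config passed to vf.load_environment(...): later layers
    override earlier ones, merging shallowly within each section."""
    merged: dict = {}
    layers = (
        {"harness": global_harness} if global_harness else {},
        legacy_config,
        per_entry,
    )
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = {**merged[key], **value}  # section-level merge
            else:
                merged[key] = value
    return merged
```

A single helper like this, shared by the eval runner and vf-train, would keep the two code paths from diverging.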

Comment thread verifiers/utils/eval_utils.py Outdated

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit b258188.

Comment thread verifiers/utils/eval_utils.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed2fa56e13


)
env_args["config"] = {**existing_config, **child_config}

legacy_config = config_table(env_args.pop("config", {}), "env_args.config")


P2: Preserve non-v1 config kwargs

When a TOML config uses env_args.config as an ordinary load_environment(config=...) kwarg with keys other than taskset or harness, this pop removes that kwarg, and the remaining legacy_config entries are never put back into either env_args or the first-class config. For example, [eval.env_args.config] foo = "bar" previously reached the environment as config={"foo": "bar"}, but now it is silently dropped before run_evaluation calls vf.load_environment, breaking non-v1 loaders and custom config fields that rely on the generic env_args path.

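A hedged sketch of a fix for this comment: lift only taskset/harness out of the legacy table and put any remaining generic keys back into env_args (the function name and variables are illustrative):

```python
def split_legacy_config(env_args: dict) -> dict:
    """Extract the v1 taskset/harness sections from env_args['config']
    while keeping any other keys available as the generic config kwarg."""
    legacy = dict(env_args.pop("config", {}) or {})
    v1_part = {k: legacy.pop(k) for k in ("taskset", "harness") if k in legacy}
    if legacy:
        # Non-v1 keys (e.g. foo = "bar") still reach load_environment.
        env_args["config"] = legacy
    return v1_part
```

This keeps the generic env_args path working for non-v1 loaders while the extracted v1 sections are bridged at load time as before.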
