Skip to content

feat: add --task-dir support for milestone-based rewards in standalone GRPO trainer#60

Merged
abrichr merged 2 commits intomainfrom
feat/task-dir-standalone-trainer
Mar 22, 2026
Merged

feat: add --task-dir support for milestone-based rewards in standalone GRPO trainer#60
abrichr merged 2 commits intomainfrom
feat/task-dir-standalone-trainer

Conversation

@abrichr
Copy link
Copy Markdown
Member

@abrichr abrichr commented Mar 22, 2026

Loads TaskConfig YAMLs, computes milestone rewards client-side via VLM screenshot judge. No evaluate endpoint needed.

abrichr and others added 2 commits March 21, 2026 23:12
Qwen2.5-VL requires <|image_pad|> tokens in the input. These are
inserted by apply_chat_template only when messages include
{"type": "image"} content blocks.

Fixed both agent_fn and _compute_rollout_loss.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e GRPO trainer

- GRPOConfig: add task_dir field
- reward.py: evaluate_milestones_screenshot() for client-side reward
- trainer.py: load TaskConfigs, auto-populate task_ids, override rewards
- rollout_collector.py: pass task_configs to env
- No WAA evaluate endpoint needed — rewards computed via VLM judge

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abrichr abrichr merged commit 7d095da into main Mar 22, 2026
0 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant