Add partner_blindness_prob (blind-agent robustness knob) #414
Open
eugenevinitsky wants to merge 1 commit into puffer-4 from
Ports the blind-agent feature from vcha/turbostream. Per-episode probability that an agent sees zeroed partner observations for the whole episode, making it an unpredictable hazard for the rest of traffic. Blind agents are masked out of the PPO rollout buffer (GIGAFLOW Appendix B.4) so they don't pollute the gradient. Default 0.0 (off).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Adds an opt-in “blind partner observations” robustness feature to the drive simulator, allowing a per-episode fraction of agents to have their partner (other-agent) observations zeroed while excluding their transitions from PPO rollouts to avoid gradient contamination.
Changes:
- Introduces `partner_blindness_prob` as a new `[env]` configuration knob (default `0.0`).
- Wires the new kwarg through `ENV_FIELDS` into the `Drive` struct.
- Implements per-episode sampling of `Agent.is_blind_partner`, zeros partner observations for blind agents, and sets `masks[i]=0` for blind agents in `c_step`.
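The three behaviors listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `sim/drive.h`: the struct layouts, `PARTNER_OBS_DIM`, the xorshift generator, and the function names `sample_blindness` and `write_masks` are assumptions made for the example. Only `partner_blindness_prob`, `is_blind_partner`, `write_partner_obs`, and the `masks` buffer are named in this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PARTNER_OBS_DIM 8  /* hypothetical size of one agent's partner-obs slice */

/* Hypothetical stand-ins; the real Agent and Drive structs carry more fields. */
typedef struct {
    int is_blind_partner;          /* episode-level flag named in the PR */
} Agent;

typedef struct {
    Agent *agents;
    int num_agents;
    float partner_blindness_prob;  /* knob named in the PR, default 0.0 */
    float *partner_obs;            /* num_agents * PARTNER_OBS_DIM floats */
    float *masks;                  /* 1.0 = keep transition, 0.0 = drop it */
    uint32_t rng_state;            /* assumed RNG state; must be nonzero */
} Drive;

/* xorshift32: small deterministic generator returning a float in [0, 1). */
float uniform01(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return (float)(x >> 8) / 16777216.0f;  /* top 24 bits divided by 2^24 */
}

/* Reset time: draw blindness once per agent; the flag holds all episode. */
void sample_blindness(Drive *env) {
    assert(env->partner_blindness_prob >= 0.0f);
    assert(env->partner_blindness_prob <= 1.0f);
    for (int i = 0; i < env->num_agents; i++) {
        env->agents[i].is_blind_partner =
            uniform01(&env->rng_state) < env->partner_blindness_prob;
    }
}

/* Observation write: a blind agent keeps an all-zero partner slice. */
void write_partner_obs(Drive *env, int i, const float *src) {
    float *dst = env->partner_obs + (size_t)i * PARTNER_OBS_DIM;
    memset(dst, 0, PARTNER_OBS_DIM * sizeof(float));
    if (env->agents[i].is_blind_partner) return;  /* early return for blind egos */
    memcpy(dst, src, PARTNER_OBS_DIM * sizeof(float));
}

/* Step time: mask blind agents so the learner can skip their transitions. */
void write_masks(Drive *env) {
    for (int i = 0; i < env->num_agents; i++) {
        env->masks[i] = env->agents[i].is_blind_partner ? 0.0f : 1.0f;
    }
}
```

With `partner_blindness_prob = 0.0` the comparison in `sample_blindness` is never true, so no agent is flagged and every mask stays at 1, which matches the "off by default" behavior described here.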
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| config/drive.ini | Adds the new robustness configuration knob and documentation comments. |
| sim/env_fields.h | Exposes partner_blindness_prob via the centralized env-kwarg field list. |
| sim/datatypes.h | Adds an episode-level Agent.is_blind_partner flag. |
| sim/drive.h | Samples blindness per episode, skips writing partner observations for blind agents, and masks blind agents out of PPO rollouts. |
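For readers unfamiliar with centralized field lists in C, one common pattern is an X-macro. The sketch below shows how a single list entry can produce a struct member, a default, and a name-based setter. Whether `ENV_FIELDS` in `sim/env_fields.h` is built this way is not shown in this PR; `ENV_FIELDS_SKETCH`, `DriveConfigSketch`, `config_set_defaults`, `config_set`, and the `num_agents` entry are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical field list: one X(type, name, default) entry per env kwarg. */
#define ENV_FIELDS_SKETCH(X)               \
    X(int,   num_agents,             32)   \
    X(float, partner_blindness_prob, 0.0f)

/* Expand the list into struct members. */
typedef struct {
#define X(type, name, dflt) type name;
    ENV_FIELDS_SKETCH(X)
#undef X
} DriveConfigSketch;

/* Expand the list into default assignments. */
void config_set_defaults(DriveConfigSketch *cfg) {
    assert(cfg != NULL);
#define X(type, name, dflt) cfg->name = dflt;
    ENV_FIELDS_SKETCH(X)
#undef X
}

/* Expand the list into a name lookup; returns 1 if the key matched, else 0. */
int config_set(DriveConfigSketch *cfg, const char *key, double value) {
    assert(cfg != NULL && key != NULL);
#define X(type, name, dflt) \
    if (strcmp(key, #name) == 0) { cfg->name = (type)value; return 1; }
    ENV_FIELDS_SKETCH(X)
#undef X
    return 0;
}
```

Under this pattern, adding a knob is a one-line change to the list, which would be consistent with the small diff to `sim/env_fields.h` described in the table.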
Summary
- Ports the blind-agent feature from `vcha/turbostream`. Each agent has a per-episode probability `partner_blindness_prob` of being "blind": its partner observations are zeroed for the whole episode, making it an unpredictable hazard for surrounding traffic.
- Default `0.0` (off), so behavior is unchanged unless you opt in.

Changes
- `config/drive.ini`: new `partner_blindness_prob = 0.0` knob under a `[Robustness features]` block.
- `sim/env_fields.h`: wires the kwarg through `ENV_FIELDS`.
- `sim/drive.h`: adds `partner_blindness_prob` to the `Drive` struct; per-episode sampling in `c_reset`; early return in `write_partner_obs` for blind egos; masks out blind agents in `c_step`.
- `sim/datatypes.h`: adds `is_blind_partner` flag to `Agent`.

Test plan
- `./build.sh --fast` builds clean
- `./build.sh` builds clean (torch backend)
- `partner_blindness_prob = 0.0`: training metrics unchanged vs. base
- `partner_blindness_prob = 0.05`: ~5% of agents per episode see zeroed partner obs and contribute mask=0 entries; check `masks` distribution in a short rollout

🤖 Generated with Claude Code
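The `masks` distribution check in the test plan could be scripted along these lines. This is a sketch only: `zero_mask_fraction` and `within_binomial_tolerance` are hypothetical helpers, and the tolerance rule (a z-score bound on a binomial proportion) is an assumption about what "~5%" should mean, not something the PR specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of entries in a masks buffer that are exactly 0. */
double zero_mask_fraction(const float *masks, size_t n) {
    assert(masks != NULL && n > 0);
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++) {
        if (masks[i] == 0.0f) zeros++;
    }
    return (double)zeros / (double)n;
}

/*
 * Returns 1 if the observed fraction is within z standard deviations of p
 * for n independent Bernoulli(p) draws, else 0.
 * Compares squared quantities so no sqrt (and no libm) is needed:
 *   (frac - p)^2 * n <= z^2 * p * (1 - p)
 */
int within_binomial_tolerance(double frac, double p, size_t n, double z) {
    assert(p >= 0.0 && p <= 1.0 && n > 0 && z >= 0.0);
    double diff = frac - p;
    return diff * diff * (double)n <= z * z * p * (1.0 - p);
}
```

Because blindness is drawn once per agent per episode and then held fixed, mask entries from different timesteps of the same episode are not independent. The `n` passed to `within_binomial_tolerance` should therefore count agent-episodes (for example, one mask entry per agent per reset), not raw timesteps.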