Add monkey-patching integration for Braintrust tracing #1326
Open
cdreetz wants to merge 2 commits into cdreetz/braintrust-tracing from
Adds setup_verifiers_tracing() function that patches core Environment,
MultiTurnEnv, and ToolEnv classes at runtime to inject Braintrust tracing.
Usage:
from verifiers.envs.experimental.braintrust_tracing.integration import setup_verifiers_tracing
setup_verifiers_tracing()
This is an alternative to importing the experimental drop-in replacement
classes. The patching is idempotent and zero-overhead when BRAINTRUST_API_KEY
is not set.
Co-Authored-By: Christian Reetz <cdreetz@gmail.com>
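The behavior described in this commit message (wrap a method, guard against double patching, skip tracing when `BRAINTRUST_API_KEY` is unset) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: the `Environment` stand-in, the synchronous `get_model_response(prompt)` signature, `SPANS`, and `patch_get_model_response` are all hypothetical. Only the marker name and the environment variable come from the PR.

```python
import functools
import os

_PATCH_MARKER = "__verifiers_bt_patched__"  # marker name taken from the PR description
SPANS = []  # hypothetical stand-in for a tracing backend


class Environment:
    """Hypothetical stand-in for the core class; the real signature may differ."""

    def get_model_response(self, prompt):
        return f"response to {prompt}"


def patch_get_model_response(cls):
    """Wrap cls.get_model_response once; repeated calls are no-ops."""
    if cls.__dict__.get(_PATCH_MARKER, False):
        return  # already patched, so the patch stays idempotent
    original = cls.get_model_response

    @functools.wraps(original)
    def traced(self, prompt):
        # Without an API key, go straight to the original method.
        if not os.environ.get("BRAINTRUST_API_KEY"):
            return original(self, prompt)
        SPANS.append(("model_request", prompt))  # record a span, then delegate
        return original(self, prompt)

    cls.get_model_response = traced
    setattr(cls, _PATCH_MARKER, True)
```

Calling `patch_get_model_response(Environment)` twice installs a single wrapper, and with the variable unset the wrapper only delegates to the original method.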
Use cls.__dict__.get() instead of getattr() to check patch markers, preventing subclass inheritance from skipping MultiTurnEnv and ToolEnv patches.
Also remove unused _orig_run_rollout_state, _orig_run_group_states, and _orig_env_response references.
Co-Authored-By: Christian Reetz <cdreetz@gmail.com>
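The difference between the two marker checks can be reproduced in a few lines. The sketch below uses bare stand-in classes (not the real verifiers classes) and hypothetical helper names `is_patched_inherited` and `is_patched_own`. It assumes only that `MultiTurnEnv` subclasses `Environment`, as the fix implies.

```python
_PATCH_MARKER = "__verifiers_bt_patched__"


class Environment:  # stand-in for the core base class
    pass


class MultiTurnEnv(Environment):  # stand-in subclass
    pass


def is_patched_inherited(cls):
    # getattr follows the MRO, so a marker set on a base class is visible here.
    return getattr(cls, _PATCH_MARKER, False)


def is_patched_own(cls):
    # cls.__dict__ holds only attributes defined directly on cls.
    return cls.__dict__.get(_PATCH_MARKER, False)


# Mark only the base class, as happens after patching Environment.
setattr(Environment, _PATCH_MARKER, True)

print(is_patched_inherited(MultiTurnEnv))  # True: the subclass looks patched
print(is_patched_own(MultiTurnEnv))        # False: the subclass still needs patching
```

With the `getattr` check, a patch routine for `MultiTurnEnv` would return early even though none of its methods had been replaced.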
Summary
Adds a `setup_verifiers_tracing()` function that monkey-patches the core `Environment`, `MultiTurnEnv`, and `ToolEnv` classes at runtime to inject Braintrust tracing. This is an alternative to the drop-in replacement classes from #1325: users call one function at the top of their environment file, and all subclasses automatically produce traces.

This PR includes #1325's `verifiers/envs/experimental/braintrust_tracing/` content (drop-in classes + `braintrust_tracing.py` helpers) plus the monkey-patching layer on top. Opening as an independent PR (base = `main`) so the team can compare both approaches side by side.

Usage: import `setup_verifiers_tracing` from `verifiers.envs.experimental.braintrust_tracing.integration` and call it once at the top of the environment file.
What gets patched:
- `Environment.get_model_response`: wrapped to add model_request spans (calls original)
- `Environment._run_rollout_state`: replaced to add rollout + scoring spans
- `Environment._run_group_states`: replaced to add group + per-rollout child spans
- `MultiTurnEnv.rollout`: replaced to add setup/turn/timeout spans
- `ToolEnv.call_tool`: wrapped to add tool_call spans (calls original)
- `ToolEnv.env_response`: replaced to pass `state=` to `call_tool` for span context

Patching is idempotent (guarded by the `__verifiers_bt_patched__` class marker). Zero overhead when `BRAINTRUST_API_KEY` is not set.

Updates since last revision
- `_is_patched`: Changed from `getattr(cls, _PATCH_MARKER, False)` to `cls.__dict__.get(_PATCH_MARKER, False)`. The previous implementation followed Python's MRO, so after patching `Environment`, the inherited marker caused `_patch_multiturn_env()` and `_patch_tool_env()` to silently skip, meaning turn-level, tool-call, and setup spans were never installed.
- Removed the unused `_orig_run_rollout_state`, `_orig_run_group_states`, and `_orig_env_response` captures, which were saved but never called.

Review & Testing Checklist
- `_run_rollout_state`, `_run_group_states`, `MultiTurnEnv.rollout`, and `ToolEnv.env_response` are full method replacements, not wrappers. Any future changes to the core methods won't be picked up by the patched versions. Verify the reimplemented logic matches the current core methods exactly (e.g. compare `_traced_run_rollout_state` against `Environment._run_rollout_state` in `environment.py`).
- Test by calling `setup_verifiers_tracing()` instead of using the drop-in classes and verifying that traces appear in Braintrust with the expected span hierarchy.
- `_bt._INSTANCE = None` directly mutates a private module-level singleton to force re-init when `api_key`/`project` are passed. Verify this doesn't race with concurrent access.
- Replace the experimental import (`from verifiers.envs.experimental.braintrust_tracing.stateful_tool_env import StatefulToolEnv`) with `setup_verifiers_tracing()` + the normal `from verifiers.envs.stateful_tool_env import StatefulToolEnv`, run an eval, and check Braintrust for correct nested spans (rollout → turn → model_request / tool_call).

Notes
- Before the `_is_patched` fix, `Environment`-level patches (rollout/group spans, model_request wrapping) would have been applied, while `MultiTurnEnv` and `ToolEnv` patches (turn spans, tool_call spans, setup spans) would have been silently skipped.

Note
Medium Risk
Introduces runtime monkey-patching of core environment methods when enabled, which can subtly change rollout/tool-call behavior and may drift from upstream logic; impact is limited to users who opt in via `setup_verifiers_tracing()` and configure `BRAINTRUST_API_KEY`.

Overview
Adds an experimental `braintrust_tracing` package that records nested Braintrust spans for `generate()` runs, rollouts, turns, model requests, tool calls, and scoring, with safe no-op behavior when `BRAINTRUST_API_KEY` is unset.

Provides `setup_verifiers_tracing()` to monkey-patch the core `Environment`, `MultiTurnEnv`, and `ToolEnv` methods at runtime so existing user environments can emit traces without changing imports, plus optional dependency wiring via a new `braintrust` extra (and updated `uv.lock`/type-check overrides).

Reviewed by Cursor Bugbot for commit 9f219a7.
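The drift risk raised for full method replacements can be illustrated with a toy example. Everything here is hypothetical: `make_environment`, the `score` method, and both patch helpers are invented for illustration. A "version 2" class simulates an upstream change to a core method after the replacement was written.

```python
def make_environment(version):
    """Build a hypothetical core class; version 2 simulates an upstream change."""

    class Environment:
        def score(self, completion):
            if version == 1:
                return len(completion)
            return 2 * len(completion)  # upstream logic changed in version 2

    return Environment


def patch_as_wrapper(cls, log):
    """Wrapper style: add a span, then call whatever the core method currently is."""
    original = cls.score

    def traced(self, completion):
        log.append("span")
        return original(self, completion)

    cls.score = traced


def patch_as_replacement(cls, log):
    """Replacement style: reimplements the version 1 body and never calls the original."""

    def traced(self, completion):
        log.append("span")
        return len(completion)  # copy of the version 1 logic

    cls.score = traced


wrapped, replaced = make_environment(2), make_environment(2)
patch_as_wrapper(wrapped, [])
patch_as_replacement(replaced, [])
print(wrapped().score("abc"))   # 6: follows the upstream change
print(replaced().score("abc"))  # 3: still runs the stale copy
```

This is why the checklist asks reviewers to compare each replaced method against its current core counterpart.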