Add monkey-patching integration for Braintrust tracing #1326

Open
cdreetz wants to merge 2 commits into cdreetz/braintrust-tracing from cdreetz/patched-tracing

cdreetz commented May 9, 2026

Summary

Adds a setup_verifiers_tracing() function that monkey-patches the core Environment, MultiTurnEnv, and ToolEnv classes at runtime to inject Braintrust tracing. This is an alternative to the drop-in replacement classes from #1325 — users just call one function at the top of their environment file and all subclasses automatically produce traces.

This PR includes #1325's verifiers/envs/experimental/braintrust_tracing/ content (drop-in classes + braintrust_tracing.py helpers) plus the monkey-patching layer on top. Opening as an independent PR (base = main) so the team can compare both approaches side-by-side.

Usage:

from verifiers.envs.experimental.braintrust_tracing.integration import setup_verifiers_tracing
setup_verifiers_tracing()

What gets patched:

  • Environment.get_model_response — wrapped to add model_request spans (calls original)
  • Environment._run_rollout_state — replaced to add rollout + scoring spans
  • Environment._run_group_states — replaced to add group + per-rollout child spans
  • MultiTurnEnv.rollout — replaced to add setup/turn/timeout spans
  • ToolEnv.call_tool — wrapped to add tool_call spans (calls original)
  • ToolEnv.env_response — replaced to pass state= to call_tool for span context

Patching is idempotent (guarded by __verifiers_bt_patched__ class marker). Zero overhead when BRAINTRUST_API_KEY is not set.
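
A minimal sketch of the wrap-and-mark pattern described above. It assumes a synchronous stand-in `Environment` class and a hypothetical `on_span` callback in place of a real Braintrust span; the actual methods in this PR are likely async and the real span API may differ.

```python
import functools

_PATCH_MARKER = "__verifiers_bt_patched__"  # marker name taken from the PR text


class Environment:
    """Stand-in for the real class; only here to make the sketch runnable."""

    def get_model_response(self, prompt):
        return f"response to {prompt}"


def _is_patched(cls):
    # Look only at the class's own namespace, not at inherited attributes.
    return cls.__dict__.get(_PATCH_MARKER, False)


def patch_environment(cls, on_span):
    """Wrap cls.get_model_response once; repeated calls change nothing."""
    if _is_patched(cls):
        return False
    original = cls.get_model_response

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        on_span("model_request")  # hypothetical hook standing in for a span
        return original(self, *args, **kwargs)  # wrapper still calls the original

    cls.get_model_response = traced
    setattr(cls, _PATCH_MARKER, True)
    return True


spans = []
first = patch_environment(Environment, spans.append)
second = patch_environment(Environment, spans.append)  # skipped: marker already set
result = Environment().get_model_response("hi")
```

Because the second call returns early, the method is wrapped only once and a single call produces a single `model_request` entry.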

Updates since last revision

  • Fixed critical MRO inheritance bug in _is_patched: Changed from getattr(cls, _PATCH_MARKER, False) to cls.__dict__.get(_PATCH_MARKER, False). The previous implementation followed Python's MRO, so after patching Environment, the inherited marker caused _patch_multiturn_env() and _patch_tool_env() to silently skip — meaning turn-level, tool-call, and setup spans were never installed.
  • Removed dead code: Removed unused _orig_run_rollout_state, _orig_run_group_states, and _orig_env_response captures that were saved but never called.
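
The difference between the two marker checks can be reproduced in isolation. The sketch below uses empty stand-in classes and assumes the hierarchy `Environment` → `MultiTurnEnv` → `ToolEnv`; the helper names `is_patched_inherited` and `is_patched_own` are illustrative, not the PR's.

```python
_PATCH_MARKER = "__verifiers_bt_patched__"


class Environment:
    pass


class MultiTurnEnv(Environment):
    pass


class ToolEnv(MultiTurnEnv):
    pass


def is_patched_inherited(cls):
    # Old check: getattr walks the MRO, so subclasses see the parent's marker.
    return getattr(cls, _PATCH_MARKER, False)


def is_patched_own(cls):
    # Fixed check: only the class's own __dict__ is consulted.
    return cls.__dict__.get(_PATCH_MARKER, False)


# Simulate the state right after Environment has been patched.
setattr(Environment, _PATCH_MARKER, True)

classes = (Environment, MultiTurnEnv, ToolEnv)
inherited_view = [is_patched_inherited(c) for c in classes]
own_view = [is_patched_own(c) for c in classes]
```

With the old check every class appears patched once `Environment` is marked, which is why the subclass patches were skipped; with the fixed check only `Environment` does.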

Review & Testing Checklist

  • _run_rollout_state, _run_group_states, MultiTurnEnv.rollout, and ToolEnv.env_response are full method replacements, not wrappers. Any future changes to the core methods won't be picked up by the patched versions. Verify the reimplemented logic matches the current core methods exactly (e.g. compare _traced_run_rollout_state against Environment._run_rollout_state in environment.py).
  • No automated tests for the integration. Consider running an actual eval with setup_verifiers_tracing() instead of the drop-in classes and verifying traces appear in Braintrust with the expected span hierarchy.
  • _bt._INSTANCE = None directly mutates a private module-level singleton to force re-init when api_key/project are passed. Verify this doesn't race with concurrent access.
  • Recommended test plan: In mini-browse-env, replace the experimental import (from verifiers.envs.experimental.braintrust_tracing.stateful_tool_env import StatefulToolEnv) with setup_verifiers_tracing() + the normal from verifiers.envs.stateful_tool_env import StatefulToolEnv, run an eval, and check Braintrust for correct nested spans (rollout → turn → model_request / tool_call).
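
One way an automated test could assert the expected span hierarchy without a Braintrust account is to record parent/child pairs with a test double. The sketch below is an assumption about how such a test might look: `SpanRecorder` and `fake_rollout` are hypothetical and do not use the Braintrust API or the PR's helpers.

```python
from contextlib import contextmanager


class SpanRecorder:
    """Test double that records (parent, child) pairs for nested spans."""

    def __init__(self):
        self._stack = []
        self.edges = []

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        self.edges.append((parent, name))
        self._stack.append(name)
        try:
            yield
        finally:
            self._stack.pop()


def fake_rollout(rec):
    # Mimics the nesting named in the test plan: rollout -> turn -> leaf spans.
    with rec.span("rollout"):
        with rec.span("turn"):
            with rec.span("model_request"):
                pass
            with rec.span("tool_call"):
                pass


recorder = SpanRecorder()
fake_rollout(recorder)
```

A real test would swap `fake_rollout` for a patched environment run and compare the recorded edges against the same expected list.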

Notes

  • Builds on #1325, "Add Braintrust tracing as experimental drop-in environment variants" (drop-in experimental classes). This PR adds the monkey-patching layer as a second access path; both can coexist.
  • The MRO bug fix was critical — without it, only Environment-level patches (rollout/group spans, model_request wrapping) would have been applied, while MultiTurnEnv and ToolEnv patches (turn spans, tool_call spans, setup spans) would have been silently skipped.

Note

Medium Risk
Introduces runtime monkey-patching of core environment methods when enabled, which can subtly change rollout/tool-call behavior and may drift from upstream logic; impact is limited to users who opt in via setup_verifiers_tracing() and configure BRAINTRUST_API_KEY.

Overview
Adds an experimental braintrust_tracing package that records nested Braintrust spans for generate() runs, rollouts, turns, model requests, tool calls, and scoring, with safe no-op behavior when BRAINTRUST_API_KEY is unset.

Provides setup_verifiers_tracing() to monkey-patch the core Environment, MultiTurnEnv, and ToolEnv methods at runtime so existing user environments can emit traces without changing imports, plus optional dependency wiring via a new braintrust extra (and updated uv.lock/type-check overrides).

Reviewed by Cursor Bugbot for commit 9f219a7.

devin-ai-integration Bot and others added 2 commits May 9, 2026 23:01
Adds setup_verifiers_tracing() function that patches core Environment,
MultiTurnEnv, and ToolEnv classes at runtime to inject Braintrust tracing.

Usage:
    from verifiers.envs.experimental.braintrust_tracing.integration import setup_verifiers_tracing
    setup_verifiers_tracing()

This is an alternative to importing the experimental drop-in replacement
classes. The patching is idempotent and zero-overhead when BRAINTRUST_API_KEY
is not set.

Co-Authored-By: Christian Reetz <cdreetz@gmail.com>

Use cls.__dict__.get() instead of getattr() to check patch markers,
preventing subclass inheritance from skipping MultiTurnEnv and ToolEnv
patches. Also remove unused _orig_run_rollout_state, _orig_run_group_states,
and _orig_env_response references.

Co-Authored-By: Christian Reetz <cdreetz@gmail.com>
cdreetz changed the base branch from main to cdreetz/braintrust-tracing on May 9, 2026 23:57