Add monkey-patching integration for Braintrust tracing by cdreetz · Pull Request #1326 · PrimeIntellect-ai/verifiers

cdreetz · 2026-05-09T23:55:55Z

Summary

Adds a setup_verifiers_tracing() function that monkey-patches the core Environment, MultiTurnEnv, and ToolEnv classes at runtime to inject Braintrust tracing. This is an alternative to the drop-in replacement classes from #1325 — users just call one function at the top of their environment file and all subclasses automatically produce traces.

This PR includes #1325's verifiers/envs/experimental/braintrust_tracing/ content (drop-in classes + braintrust_tracing.py helpers) plus the monkey-patching layer on top. Opening as an independent PR (base = main) so the team can compare both approaches side-by-side.

Usage:

from verifiers.envs.experimental.braintrust_tracing.integration import setup_verifiers_tracing
setup_verifiers_tracing()

What gets patched:

Environment.get_model_response — wrapped to add model_request spans (calls original)
Environment._run_rollout_state — replaced to add rollout + scoring spans
Environment._run_group_states — replaced to add group + per-rollout child spans
MultiTurnEnv.rollout — replaced to add setup/turn/timeout spans
ToolEnv.call_tool — wrapped to add tool_call spans (calls original)
ToolEnv.env_response — replaced to pass state= to call_tool for span context

Patching is idempotent (guarded by __verifiers_bt_patched__ class marker). Zero overhead when BRAINTRUST_API_KEY is not set.

Updates since last revision

Fixed critical MRO inheritance bug in _is_patched: Changed from getattr(cls, _PATCH_MARKER, False) to cls.__dict__.get(_PATCH_MARKER, False). The previous implementation followed Python's MRO, so after patching Environment, the inherited marker caused _patch_multiturn_env() and _patch_tool_env() to silently skip — meaning turn-level, tool-call, and setup spans were never installed.
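The difference between the two checks can be shown in isolation. This sketch uses empty stand-in classes that assume the chain Environment → MultiTurnEnv → ToolEnv. `is_patched_old` and `is_patched_new` are hypothetical names for the before and after versions of `_is_patched`.

```python
_PATCH_MARKER = "__verifiers_bt_patched__"


# Hypothetical empty stand-ins mirroring the assumed inheritance chain
# Environment -> MultiTurnEnv -> ToolEnv.
class Environment:
    pass


class MultiTurnEnv(Environment):
    pass


class ToolEnv(MultiTurnEnv):
    pass


def is_patched_old(cls: type) -> bool:
    # getattr follows the MRO, so it also sees a marker set on a base class.
    return getattr(cls, _PATCH_MARKER, False)


def is_patched_new(cls: type) -> bool:
    # cls.__dict__ holds only attributes defined directly on cls.
    return cls.__dict__.get(_PATCH_MARKER, False)


# Patch the base class first, as described above.
setattr(Environment, _PATCH_MARKER, True)

print(is_patched_old(MultiTurnEnv), is_patched_old(ToolEnv))  # True True: both patches skipped
print(is_patched_new(MultiTurnEnv), is_patched_new(ToolEnv))  # False False: both patches applied
```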
Removed dead code: Removed unused _orig_run_rollout_state, _orig_run_group_states, and _orig_env_response captures that were saved but never called.

Review & Testing Checklist

_run_rollout_state, _run_group_states, MultiTurnEnv.rollout, and ToolEnv.env_response are full method replacements, not wrappers. Any future changes to the core methods won't be picked up by the patched versions. Verify the reimplemented logic matches the current core methods exactly (e.g. compare _traced_run_rollout_state against Environment._run_rollout_state in environment.py).
No automated tests for the integration. Consider running an actual eval with setup_verifiers_tracing() instead of the drop-in classes and verifying traces appear in Braintrust with the expected span hierarchy.
_bt._INSTANCE = None directly mutates a private module-level singleton to force re-init when api_key/project are passed. Verify this doesn't race with concurrent access.
Recommended test plan: In mini-browse-env, replace the experimental import (from verifiers.envs.experimental.braintrust_tracing.stateful_tool_env import StatefulToolEnv) with setup_verifiers_tracing() + the normal from verifiers.envs.stateful_tool_env import StatefulToolEnv, run an eval, and check Braintrust for correct nested spans (rollout → turn → model_request / tool_call).
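For the `_bt._INSTANCE = None` item, one option is to route both lazy creation and forced re-creation through a single lock, so the check and the assignment cannot interleave across threads. This is a minimal sketch that assumes the singleton is created lazily by a getter. `_Tracer`, `get_tracer`, `_LOCK`, and the `reinit` flag are hypothetical names, not the module's actual API.

```python
import threading
from typing import Optional


class _Tracer:
    """Hypothetical stand-in for the object held in _bt._INSTANCE."""

    def __init__(self, api_key: str, project: str) -> None:
        self.api_key = api_key
        self.project = project


_INSTANCE: Optional[_Tracer] = None
_LOCK = threading.Lock()


def get_tracer(api_key: str = "", project: str = "default", reinit: bool = False) -> _Tracer:
    """Return the shared tracer; create it, or replace it when reinit=True, under a lock."""
    global _INSTANCE
    with _LOCK:
        # Re-creation assigns the new instance directly instead of first
        # setting None, so callers using get_tracer never see a None singleton.
        if reinit or _INSTANCE is None:
            _INSTANCE = _Tracer(api_key, project)
        return _INSTANCE
```

The lock only serializes access through `get_tracer`. Code that already holds a reference to the old instance will keep using it after a re-init.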

Notes

Builds on #1325 ("Add Braintrust tracing as experimental drop-in environment variants"), which adds the drop-in experimental classes. This PR adds the monkey-patching layer as a second access path; both can coexist.
The MRO bug fix was critical — without it, only Environment-level patches (rollout/group spans, model_request wrapping) would have been applied, while MultiTurnEnv and ToolEnv patches (turn spans, tool_call spans, setup spans) would have been silently skipped.

Note

Medium Risk
Introduces runtime monkey-patching of core environment methods when enabled, which can subtly change rollout/tool-call behavior and may drift from upstream logic; impact is limited to users who opt in via setup_verifiers_tracing() and configure BRAINTRUST_API_KEY.

Overview
Adds an experimental braintrust_tracing package that records nested Braintrust spans for generate() runs, rollouts, turns, model requests, tool calls, and scoring, with safe no-op behavior when BRAINTRUST_API_KEY is unset.
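A nested hierarchy like this (a rollout containing turns, each containing model_request and tool_call spans) requires each child span to find its parent across `await` boundaries. One common Python approach is a `contextvars.ContextVar` that holds the current span. The sketch below assumes that approach and is not taken from the PR. The `span()` helper, `RECORDED`, and the stub coroutines are hypothetical, and the helper records (name, parent) pairs instead of calling Braintrust.

```python
import asyncio
import contextvars
from contextlib import contextmanager
from typing import Optional

# Hypothetical span log of (span_name, parent_span_name) pairs; not the Braintrust API.
RECORDED: list[tuple[str, Optional[str]]] = []
_current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def span(name: str):
    RECORDED.append((name, _current_span.get()))  # parent = span open when this one starts
    token = _current_span.set(name)
    try:
        yield
    finally:
        _current_span.reset(token)  # restore the parent span on exit


async def model_request() -> None:
    with span("model_request"):
        await asyncio.sleep(0)


async def tool_call() -> None:
    with span("tool_call"):
        await asyncio.sleep(0)


async def rollout(num_turns: int) -> None:
    with span("rollout"):
        for _ in range(num_turns):
            with span("turn"):
                await model_request()
                await tool_call()
```

Running `asyncio.run(rollout(2))` records `rollout` with no parent, each `turn` under `rollout`, and each `model_request` and `tool_call` under its `turn`. Because `asyncio.gather` wraps coroutines in tasks that copy the current context, rollouts started inside a group span would also see that group as their parent.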

Provides setup_verifiers_tracing() to monkey-patch the core Environment, MultiTurnEnv, and ToolEnv methods at runtime so existing user environments can emit traces without changing imports, plus optional dependency wiring via a new braintrust extra (and updated uv.lock/type-check overrides).

Reviewed by Cursor Bugbot for commit 9f219a7. Bugbot is set up for automated code reviews on this repo.

Adds setup_verifiers_tracing() function that patches core Environment, MultiTurnEnv, and ToolEnv classes at runtime to inject Braintrust tracing.

Usage:

from verifiers.envs.experimental.braintrust_tracing.integration import setup_verifiers_tracing
setup_verifiers_tracing()

This is an alternative to importing the experimental drop-in replacement classes. The patching is idempotent and zero-overhead when BRAINTRUST_API_KEY is not set.

Co-Authored-By: Christian Reetz <cdreetz@gmail.com>

Use cls.__dict__.get() instead of getattr() to check patch markers, preventing subclass inheritance from skipping MultiTurnEnv and ToolEnv patches. Also remove unused _orig_run_rollout_state, _orig_run_group_states, and _orig_env_response references.

Co-Authored-By: Christian Reetz <cdreetz@gmail.com>

devin-ai-integration Bot and others added 2 commits May 9, 2026 23:01

cdreetz changed the base branch from main to cdreetz/braintrust-tracing May 9, 2026 23:57

cdreetz wants to merge 2 commits into cdreetz/braintrust-tracing from cdreetz/patched-tracing