Commit 90e9cce

feat: Introduce new integrations API and convert anthropic
1 parent 8439874 commit 90e9cce

49 files changed

Lines changed: 1975 additions & 1060 deletions

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
---
name: sdk-integrations
description: Create or update a Braintrust Python SDK integration using the integrations API. Use when asked to add an integration, update an existing integration, add or update patchers, update auto_instrument, add integration tests, or work in py/src/braintrust/integrations/.
---

# SDK Integrations

SDK integrations define how Braintrust discovers a provider, patches it safely, and keeps provider-specific tracing local to that integration. Read the existing integration closest to your task before writing a new one. If there is no closer example, `py/src/braintrust/integrations/anthropic/` is a useful reference implementation.

## Workflow

1. Read the shared integration primitives and the closest provider example.
2. Choose the task shape: new provider, existing provider update, or `auto_instrument()` update.
3. Implement the smallest integration, patcher, tracing, and export changes needed.
4. Add or update VCR-backed integration tests and only re-record cassettes when behavior changed intentionally.
5. Run the narrowest provider session first, then expand to shared validation only if the change touched shared code.

## Commands

```bash
cd py && nox -s "test_<provider>(latest)"
cd py && nox -s "test_<provider>(latest)" -- -k "test_name"
cd py && nox -s "test_<provider>(latest)" -- --vcr-record=all -k "test_name"
cd py && make test-core
cd py && make lint
```

## Creating or Updating an Integration

### 1. Read the nearest existing implementation

Always inspect these first:

- `py/src/braintrust/integrations/base.py`
- `py/src/braintrust/integrations/runtime.py`
- `py/src/braintrust/integrations/versioning.py`
- `py/src/braintrust/integrations/config.py`

Relevant example implementation:

- `py/src/braintrust/integrations/anthropic/`

Read these additional files only when the task needs them:

- changing `auto_instrument()`: `py/src/braintrust/auto.py` and `py/src/braintrust/auto_test_scripts/test_auto_anthropic_patch_config.py`
- adding or updating VCR tests: `py/src/braintrust/conftest.py` and `py/src/braintrust/integrations/anthropic/test_anthropic.py`

Then choose the path that matches the task:

- new provider: create `py/src/braintrust/integrations/<provider>/`
- existing provider: read the provider package first and change only the affected patchers, tracing, tests, or exports
- `auto_instrument()` only: keep the integration package unchanged unless the option shape or patcher surface also changed

### 2. Create or extend the integration module

For a new provider, create a package under `py/src/braintrust/integrations/<provider>/`.

For an existing provider, keep the module layout unless the current structure is actively causing problems.

Typical files:

- `__init__.py`: public exports for the integration type and any public helpers
- `integration.py`: the `BaseIntegration` subclass, patcher registration, and high-level orchestration
- `patchers.py`: one patcher per patch target, with version gating and existence checks close to the patch
- `tracing.py`: provider-specific span creation, metadata extraction, stream handling, and output normalization
- `test_<provider>.py`: integration tests for `wrap(...)`, `setup()`, sync/async behavior, streaming, and error handling
- `cassettes/`: recorded provider traffic for VCR-backed integration tests when the provider uses HTTP

### 3. Define the integration class

Implement a `BaseIntegration` subclass in `integration.py`.

Set:

- `name`
- `import_names`
- `min_version` and `max_version` only when needed
- `patchers`

Keep the class focused on orchestration. Provider-specific tracing logic should stay in `tracing.py`.
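
As a rough illustration of the attributes listed above, the sketch below declares an integration against a stand-in `BaseIntegration`. The real base class in `base.py` is not shown in this commit view, so the attribute types, the `ExampleProviderIntegration` name, and the patcher ids are all assumptions made for the example.

```python
from typing import Optional, Tuple


class BaseIntegration:
    """Stand-in for the real base class (assumed shape, not the SDK's code)."""

    name: str = ""
    import_names: Tuple[str, ...] = ()
    min_version: Optional[str] = None  # None means "no lower bound"
    max_version: Optional[str] = None  # None means "no upper bound"
    patchers: Tuple[str, ...] = ()


class ExampleProviderIntegration(BaseIntegration):
    """Hypothetical provider: only declares what to discover and patch."""

    name = "example_provider"
    import_names = ("example_provider",)
    min_version = "1.0.0"  # set only because this example needs a lower bound
    patchers = ("example_provider.init.sync", "example_provider.init.async")
```

Nothing in the class touches spans or provider responses; under this layout that logic would live in `tracing.py`.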

### 4. Add one patcher per coherent patch target

Put patchers in `patchers.py`.

Use `FunctionWrapperPatcher` when patching a single import path with `wrapt.wrap_function_wrapper`. Good examples:

- constructor patchers like `ProviderClient.__init__`
- single API surfaces like `client.responses.create`
- one sync and one async constructor patcher instead of one patcher doing both

Keep patchers narrow. If you need to patch multiple unrelated targets, create multiple patchers rather than one large patcher.

Patchers are responsible for:

- stable patcher ids via `name`
- optional version gating
- existence checks
- idempotence through the base patcher marker
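
The responsibilities above can be sketched with the standard library alone. The `FunctionPatcher` class below is a hypothetical stand-in, not the SDK's `FunctionWrapperPatcher`, and it uses `functools.wraps` rather than `wrapt`; the marker attribute name and the `provider` object are invented for the example.

```python
import functools
import types


class FunctionPatcher:
    """Hypothetical single-target patcher (not the SDK's FunctionWrapperPatcher)."""

    _MARKER = "__example_patched__"

    def __init__(self, name, target, attribute, wrapper):
        self.name = name            # stable patcher id
        self.target = target        # object that owns the attribute
        self.attribute = attribute  # attribute name to wrap
        self.wrapper = wrapper      # callable(original, *args, **kwargs)

    def applies(self):
        # Existence check: skip when the provider lacks this surface.
        return callable(getattr(self.target, self.attribute, None))

    def patch(self):
        if not self.applies():
            return False
        original = getattr(self.target, self.attribute)
        if getattr(original, self._MARKER, False):
            return True  # already patched, so do not wrap a second time

        @functools.wraps(original)
        def patched(*args, **kwargs):
            return self.wrapper(original, *args, **kwargs)

        setattr(patched, self._MARKER, True)
        setattr(self.target, self.attribute, patched)
        return True


calls = []


def record_call(original, *args, **kwargs):
    calls.append(args)
    return original(*args, **kwargs)


provider = types.SimpleNamespace(create=lambda prompt: prompt.upper())
patcher = FunctionPatcher("example.create", provider, "create", record_call)
patcher.patch()
patcher.patch()  # second call finds the marker and leaves the wrapper alone
result = provider.create("hi")
```

Because the marker lives on the wrapper itself, calling `patch()` twice records each provider call once rather than twice.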

### 5. Keep tracing provider-local

Put span creation, metadata extraction, stream aggregation, error logging, and output normalization in `tracing.py`.

This layer should:

- preserve provider behavior
- support sync, async, and streaming paths as needed
- avoid raising from tracing-only code when that would break the provider call

If the provider has complex streaming internals, keep that logic local instead of forcing it into shared abstractions.
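
One way to keep tracing failures from breaking the provider call is sketched below. The `traced_call` and `_safe_log` helpers and the `log_span` callback are hypothetical names for this example; the SDK's actual span API is not shown here.

```python
import logging

logger = logging.getLogger("example.tracing")


def _safe_log(log_span, payload):
    """Log a span payload, swallowing any failure in the tracing backend."""
    try:
        log_span(payload)
    except Exception:
        logger.debug("tracing failed; ignoring", exc_info=True)


def traced_call(original, log_span, *args, **kwargs):
    """Call the provider and log the outcome without altering its behavior."""
    try:
        result = original(*args, **kwargs)
    except Exception as exc:
        _safe_log(log_span, {"error": repr(exc)})
        raise  # provider errors still propagate unchanged
    _safe_log(log_span, {"output": result})
    return result


def broken_logger(payload):
    raise RuntimeError("span backend unavailable")


# The tracing backend fails, but the caller still gets the provider's result.
answer = traced_call(lambda x: x + 1, broken_logger, 41)
```

Provider exceptions are re-raised after logging, so callers see the same errors they would see without instrumentation.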

### 6. Wire public exports

Update public exports only as needed:

- `py/src/braintrust/integrations/__init__.py`
- `py/src/braintrust/__init__.py`

### 7. Update auto_instrument only if this integration should be auto-patched

If the provider belongs in `braintrust.auto.auto_instrument()`, add a branch in `py/src/braintrust/auto.py`.

Match the current pattern:

- plain `bool` options for simple on/off integrations
- `IntegrationPatchConfig` only when users need patcher-level selection
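
The two option shapes can be sketched in isolation. `IntegrationPatchConfig` below is a stand-in dataclass: the `enabled_patchers` and `disabled_patchers` field names follow the attributes that `auto.py` reads in this commit, while the field types, defaults, and the patcher id are assumptions. `normalize_option` is a hypothetical helper that mirrors the normalization logic for the `anthropic` option.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class IntegrationPatchConfig:
    """Stand-in config; field names follow auto.py, types are assumed."""

    enabled_patchers: Optional[Tuple[str, ...]] = None
    disabled_patchers: Optional[Tuple[str, ...]] = None


InstrumentOption = Union[bool, IntegrationPatchConfig]


def normalize_option(name: str, option: InstrumentOption):
    """Return (enabled, config); a config object implies enabled."""
    if isinstance(option, bool):
        return option, None
    if isinstance(option, IntegrationPatchConfig):
        return True, option
    raise TypeError(
        f"auto_instrument option {name!r} must be a bool or "
        f"IntegrationPatchConfig, got {type(option).__name__}"
    )


plain = normalize_option("anthropic", False)
selective = normalize_option(
    "anthropic",
    IntegrationPatchConfig(disabled_patchers=("anthropic.init.async",)),
)
```

Rejecting anything other than the two accepted shapes surfaces a typo such as `anthropic="yes"` as a `TypeError` instead of silently treating it as truthy.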

## Tests

Keep integration tests with the integration package.

Provider behavior tests should use `@pytest.mark.vcr` whenever the provider uses network calls. Avoid mocks and fakes.

Cover:

- direct `wrap(...)` behavior
- `setup()` patching new clients
- sync behavior
- async behavior
- streaming behavior
- idempotence
- failure/error logging
- patcher selection if using `IntegrationPatchConfig`

Preferred locations:

- provider behavior tests: `py/src/braintrust/integrations/<provider>/test_<provider>.py`
- version helper tests: `py/src/braintrust/integrations/test_versioning.py`
- auto-instrument subprocess tests: `py/src/braintrust/auto_test_scripts/`

If the provider uses VCR, keep cassettes next to the integration test file under `py/src/braintrust/integrations/<provider>/cassettes/`.

Only re-record cassettes when the behavior change is intentional.

Use mocks or fakes only for cases that are hard to drive through recorded provider traffic, such as narrowly scoped error injection, local version-routing logic, or patcher existence checks.

## Patterns

### Constructor patching

If instrumenting future clients created by the SDK is the goal, patch constructors and attach traced surfaces after the real constructor runs. Anthropic is an example of this pattern.
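
A minimal sketch of this pattern follows. `ProviderClient`, `Messages`, `TracedMessages`, and `patch_constructor` are hypothetical names for the example and are not taken from the Anthropic integration.

```python
import functools


class Messages:
    """Hypothetical provider API surface."""

    def create(self, prompt):
        return f"echo:{prompt}"


class ProviderClient:
    """Hypothetical provider client that owns one API surface."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.messages = Messages()


class TracedMessages:
    """Proxy that records each call, then returns the real result."""

    def __init__(self, inner, log):
        self._inner = inner
        self._log = log

    def create(self, prompt):
        output = self._inner.create(prompt)
        self._log.append({"input": prompt, "output": output})
        return output


def patch_constructor(cls, log):
    original_init = cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)  # real constructor runs first
        self.messages = TracedMessages(self.messages, log)  # then attach tracing

    cls.__init__ = patched_init


spans = []
patch_constructor(ProviderClient, spans)
client = ProviderClient(api_key="test")  # created after patching, so it is traced
reply = client.messages.create("hello")
```

Clients constructed before `patch_constructor` runs are unaffected, which is why this approach targets future clients only.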

### Patcher selection

Use `IntegrationPatchConfig` only when users benefit from enabling or disabling specific patchers. Validate unknown patcher ids through `BaseIntegration.resolve_patchers()` instead of silently ignoring them.
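
The validation idea can be sketched as a free function. This is a hypothetical stand-in: the real `BaseIntegration.resolve_patchers()` signature and error type are not shown in this commit view, and the patcher ids are invented.

```python
def resolve_patchers(available, enabled=None, disabled=None):
    """Return the patcher ids to apply, rejecting ids that do not exist.

    Hypothetical stand-in; the real resolve_patchers() may differ.
    """
    known = list(available)
    requested = set(enabled or ()) | set(disabled or ())
    unknown = sorted(requested - set(known))
    if unknown:
        raise ValueError(f"unknown patcher ids: {', '.join(unknown)}")
    # enabled=None means "everything"; disabled entries are then removed.
    selected = [p for p in known if enabled is None or p in enabled]
    return [p for p in selected if p not in (disabled or ())]


AVAILABLE = ["example.init.sync", "example.init.async"]
only_sync = resolve_patchers(AVAILABLE, disabled=["example.init.async"])
```

Raising on an unknown id means a misspelled patcher name fails loudly instead of leaving a surface unpatched without warning.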

### Versioning

Prefer feature detection first and version checks second.

Use:

- `detect_module_version(...)`
- `version_in_range(...)`
- `version_matches_spec(...)`

Do not add `packaging` just for integration routing.
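
To show how version routing can work without `packaging`, here is a standard-library sketch. The function names echo the helpers listed above, but their signatures and behavior here are assumptions, and `parse_version` is a hypothetical helper that deliberately ignores pre-release ordering.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '1.2.3' or '0.39.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad so '0.40' compares like '0.40.0'
    return tuple(parts)


def version_in_range(version, min_version=None, max_version=None):
    """Inclusive range check; None means unbounded on that side."""
    current = parse_version(version)
    if min_version is not None and current < parse_version(min_version):
        return False
    if max_version is not None and current > parse_version(max_version):
        return False
    return True


def detect_module_version(distribution_name):
    """Return the installed version string, or None when not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None
```

Comparing integer tuples avoids the string-comparison trap where `"1.10.0"` sorts before `"1.9.0"`. A check such as `hasattr(client, "responses")` would still come first under the feature-detection rule, with the version range as a fallback.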
179+
180+
## Validation
181+
182+
- Run the narrowest provider session first.
183+
- Run `cd py && make test-core` if you changed shared integration code.
184+
- Run `cd py && make lint` before handing off broader integration changes.
185+
- If you changed `auto_instrument()`, run the relevant subprocess auto-instrument tests.
186+
187+
## Done When
188+
189+
- the provider package contains only the integration, patcher, tracing, export, and test changes required by the task
190+
- provider behavior tests use VCR unless recorded traffic cannot cover the behavior
191+
- cassette changes are present only when provider behavior changed intentionally
192+
- the narrowest affected provider session passes
193+
- `cd py && make test-core` has been run if shared integration code changed
194+
- `cd py && make lint` has been run before handoff
195+
196+
## Common Pitfalls
197+
198+
- Leaving provider behavior in `BaseIntegration` instead of the provider package.
199+
- Combining multiple unrelated patch targets into one patcher.
200+
- Forgetting async or streaming coverage.
201+
- Defaulting to mocks or fakes when the provider flow can be covered with VCR.
202+
- Moving tests but not moving their cassettes.
203+
- Adding patcher selection without tests for enabled and disabled cases.
204+
- Editing `auto_instrument()` in a way that implies a registry exists when it does not.

py/noxfile.py

Lines changed: 9 additions & 1 deletion
@@ -40,6 +40,9 @@ def _pinned_python_version():

 SRC_DIR = "braintrust"
 WRAPPER_DIR = "braintrust/wrappers"
+INTEGRATION_DIR = "braintrust/integrations"
+INTEGRATION_AUTO_TEST_DIR = "braintrust/integrations/auto_test_scripts"
+ANTHROPIC_INTEGRATION_DIR = "braintrust/integrations/anthropic"
 CONTRIB_DIR = "braintrust/contrib"
 DEVSERVER_DIR = "braintrust/devserver"

@@ -176,6 +179,7 @@ def test_anthropic(session, version):
     _install_test_deps(session)
     _install(session, "anthropic", version)
     _run_tests(session, f"{WRAPPER_DIR}/test_anthropic.py")
+    _run_tests(session, f"{INTEGRATION_DIR}/anthropic/test_anthropic.py")
     _run_core_tests(session)


@@ -400,7 +404,11 @@ def _get_braintrust_wheel():

 def _run_core_tests(session):
     """Run all tests which don't require optional dependencies."""
-    _run_tests(session, SRC_DIR, ignore_paths=[WRAPPER_DIR, CONTRIB_DIR, DEVSERVER_DIR])
+    _run_tests(
+        session,
+        SRC_DIR,
+        ignore_paths=[WRAPPER_DIR, INTEGRATION_AUTO_TEST_DIR, ANTHROPIC_INTEGRATION_DIR, CONTRIB_DIR, DEVSERVER_DIR],
+    )


 def _run_tests(session, test_path, ignore_path="", ignore_paths=None, env=None):

py/src/braintrust/__init__.py

Lines changed: 4 additions & 3 deletions
@@ -63,13 +63,17 @@ def is_equal(expected, output):

 from .audit import *
 from .auto import (
+    IntegrationPatchConfig,  # noqa: F401 # type: ignore[reportUnusedImport]
     auto_instrument,  # noqa: F401 # type: ignore[reportUnusedImport]
 )
 from .framework import *
 from .framework2 import *
 from .functions.invoke import *
 from .functions.stream import *
 from .generated_types import *
+from .integrations.anthropic import (
+    wrap_anthropic,  # noqa: F401 # type: ignore[reportUnusedImport]
+)
 from .logger import *
 from .logger import (
     _internal_get_global_state,  # noqa: F401 # type: ignore[reportUnusedImport]
@@ -89,9 +93,6 @@ def is_equal(expected, output):
     BT_IS_ASYNC_ATTRIBUTE,  # noqa: F401 # type: ignore[reportUnusedImport]
     MarkAsyncWrapper,  # noqa: F401 # type: ignore[reportUnusedImport]
 )
-from .wrappers.anthropic import (
-    wrap_anthropic,  # noqa: F401 # type: ignore[reportUnusedImport]
-)
 from .wrappers.litellm import (
     wrap_litellm,  # noqa: F401 # type: ignore[reportUnusedImport]
 )

py/src/braintrust/auto.py

Lines changed: 50 additions & 16 deletions
@@ -9,10 +9,13 @@
 import logging
 from contextlib import contextmanager

+from braintrust.integrations import AnthropicIntegration, IntegrationPatchConfig
+

 __all__ = ["auto_instrument"]

 logger = logging.getLogger(__name__)
+InstrumentOption = bool | IntegrationPatchConfig


 @contextmanager
@@ -29,7 +32,7 @@ def _try_patch():
 def auto_instrument(
     *,
     openai: bool = True,
-    anthropic: bool = True,
+    anthropic: InstrumentOption = True,
     litellm: bool = True,
     pydantic_ai: bool = True,
     google_genai: bool = True,
@@ -49,7 +52,8 @@ def auto_instrument(

     Args:
         openai: Enable OpenAI instrumentation (default: True)
-        anthropic: Enable Anthropic instrumentation (default: True)
+        anthropic: Enable Anthropic instrumentation (default: True), or pass an
+            IntegrationPatchConfig to select Anthropic patchers explicitly.
         litellm: Enable LiteLLM instrumentation (default: True)
         pydantic_ai: Enable Pydantic AI instrumentation (default: True)
         google_genai: Enable Google GenAI instrumentation (default: True)
@@ -104,23 +108,33 @@ def auto_instrument(
     """
     results = {}

-    if openai:
+    openai_enabled = _normalize_bool_option("openai", openai)
+    anthropic_enabled, anthropic_config = _normalize_anthropic_option(anthropic)
+    litellm_enabled = _normalize_bool_option("litellm", litellm)
+    pydantic_ai_enabled = _normalize_bool_option("pydantic_ai", pydantic_ai)
+    google_genai_enabled = _normalize_bool_option("google_genai", google_genai)
+    agno_enabled = _normalize_bool_option("agno", agno)
+    claude_agent_sdk_enabled = _normalize_bool_option("claude_agent_sdk", claude_agent_sdk)
+    dspy_enabled = _normalize_bool_option("dspy", dspy)
+    adk_enabled = _normalize_bool_option("adk", adk)
+
+    if openai_enabled:
         results["openai"] = _instrument_openai()
-    if anthropic:
-        results["anthropic"] = _instrument_anthropic()
-    if litellm:
+    if anthropic_enabled:
+        results["anthropic"] = _instrument_integration(AnthropicIntegration, patch_config=anthropic_config)
+    if litellm_enabled:
         results["litellm"] = _instrument_litellm()
-    if pydantic_ai:
+    if pydantic_ai_enabled:
         results["pydantic_ai"] = _instrument_pydantic_ai()
-    if google_genai:
+    if google_genai_enabled:
         results["google_genai"] = _instrument_google_genai()
-    if agno:
+    if agno_enabled:
         results["agno"] = _instrument_agno()
-    if claude_agent_sdk:
+    if claude_agent_sdk_enabled:
         results["claude_agent_sdk"] = _instrument_claude_agent_sdk()
-    if dspy:
+    if dspy_enabled:
         results["dspy"] = _instrument_dspy()
-    if adk:
+    if adk_enabled:
         results["adk"] = _instrument_adk()

     return results
@@ -134,14 +148,34 @@ def _instrument_openai() -> bool:
     return False


-def _instrument_anthropic() -> bool:
+def _instrument_integration(integration, *, patch_config: IntegrationPatchConfig | None = None) -> bool:
     with _try_patch():
-        from braintrust.wrappers.anthropic import patch_anthropic
-
-        return patch_anthropic()
+        return integration.setup(
+            enabled_patchers=patch_config.enabled_patchers if patch_config is not None else None,
+            disabled_patchers=patch_config.disabled_patchers if patch_config is not None else None,
+        )
     return False


+def _normalize_bool_option(name: str, option: bool) -> bool:
+    if isinstance(option, bool):
+        return option
+
+    raise TypeError(f"auto_instrument option {name!r} must be a bool, got {type(option).__name__}")
+
+
+def _normalize_anthropic_option(option: InstrumentOption) -> tuple[bool, IntegrationPatchConfig | None]:
+    if isinstance(option, bool):
+        return option, None
+
+    if isinstance(option, IntegrationPatchConfig):
+        return True, option
+
+    raise TypeError(
+        f"auto_instrument option 'anthropic' must be a bool or IntegrationPatchConfig, got {type(option).__name__}"
+    )
+
+
 def _instrument_litellm() -> bool:
     with _try_patch():
         from braintrust.wrappers.litellm import patch_litellm
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+from .anthropic import AnthropicIntegration
+from .config import IntegrationPatchConfig
+
+
+__all__ = ["AnthropicIntegration", "IntegrationPatchConfig"]
