braintrustdata
diff --git a/.agents/skills/sdk-integrations/SKILL.md b/.agents/skills/sdk-integrations/SKILL.md
Lines changed: 204 additions & 0 deletions
diff --git a/py/noxfile.py b/py/noxfile.py
Lines changed: 9 additions & 1 deletion
diff --git a/py/src/braintrust/__init__.py b/py/src/braintrust/__init__.py
Lines changed: 4 additions & 3 deletions
diff --git a/py/src/braintrust/auto.py b/py/src/braintrust/auto.py
Lines changed: 50 additions & 16 deletions
diff --git a/py/src/braintrust/integrations/__init__.py b/py/src/braintrust/integrations/__init__.py
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,204 @@
+---
+name: sdk-integrations
+description: Create or update a Braintrust Python SDK integration using the integrations API. Use when asked to add an integration, update an existing integration, add or update patchers, update auto_instrument, add integration tests, or work in py/src/braintrust/integrations/.
+---
+
+# SDK Integrations
+
+SDK integrations define how Braintrust discovers a provider, patches it safely, and keeps provider-specific tracing local to that integration. Read the existing integration closest to your task before writing a new one. If there is no closer example, `py/src/braintrust/integrations/anthropic/` is a useful reference implementation.
+
+## Workflow
+
+1. Read the shared integration primitives and the closest provider example.
+2. Choose the task shape: new provider, existing provider update, or `auto_instrument()` update.
+3. Implement the smallest integration, patcher, tracing, and export changes needed.
+4. Add or update VCR-backed integration tests and only re-record cassettes when behavior changed intentionally.
+5. Run the narrowest provider session first, then expand to shared validation only if the change touched shared code.
+
+## Commands
+
+```bash
+cd py && nox -s "test_<provider>(latest)"
+cd py && nox -s "test_<provider>(latest)" -- -k "test_name"
+cd py && nox -s "test_<provider>(latest)" -- --vcr-record=all -k "test_name"
+cd py && make test-core
+cd py && make lint
+```
+
+## Creating or Updating an Integration
+
+### 1. Read the nearest existing implementation
+
+Always inspect these first:
+
+- `py/src/braintrust/integrations/base.py`
+- `py/src/braintrust/integrations/runtime.py`
+- `py/src/braintrust/integrations/versioning.py`
+- `py/src/braintrust/integrations/config.py`
+
+Relevant example implementation:
+
+- `py/src/braintrust/integrations/anthropic/`
+
+Read these additional files only when the task needs them:
+
+- changing `auto_instrument()`: `py/src/braintrust/auto.py` and `py/src/braintrust/integrations/auto_test_scripts/test_auto_anthropic_patch_config.py`
+- adding or updating VCR tests: `py/src/braintrust/conftest.py` and `py/src/braintrust/integrations/anthropic/test_anthropic.py`
+
+Then choose the path that matches the task:
+
+- new provider: create `py/src/braintrust/integrations/<provider>/`
+- existing provider: read the provider package first and change only the affected patchers, tracing, tests, or exports
+- `auto_instrument()` only: keep the integration package unchanged unless the option shape or patcher surface also changed
+
+### 2. Create or extend the integration module
+
+For a new provider, create a package under `py/src/braintrust/integrations/<provider>/`.
+
+For an existing provider, keep the module layout unless the current structure is actively causing problems.
+
+Typical files:
+
+- `__init__.py`: public exports for the integration type and any public helpers
+- `integration.py`: the `BaseIntegration` subclass, patcher registration, and high-level orchestration
+- `patchers.py`: one patcher per patch target, with version gating and existence checks close to the patch
+- `tracing.py`: provider-specific span creation, metadata extraction, stream handling, and output normalization
+- `test_<provider>.py`: integration tests for `wrap(...)`, `setup()`, sync/async behavior, streaming, and error handling
+- `cassettes/`: recorded provider traffic for VCR-backed integration tests when the provider uses HTTP
+
+### 3. Define the integration class
+
+Implement a `BaseIntegration` subclass in `integration.py`.
+
+Set:
+
+- `name`
+- `import_names`
+- `min_version` and `max_version` only when needed
+- `patchers`
+
+Keep the class focused on orchestration. Provider-specific tracing logic should stay in `tracing.py`.
+
+### 4. Add one patcher per coherent patch target
+
+Put patchers in `patchers.py`.
+
+Use `FunctionWrapperPatcher` when patching a single import path with `wrapt.wrap_function_wrapper`. Good examples:
+
+- constructor patchers like `ProviderClient.__init__`
+- single API surfaces like `client.responses.create`
+- one sync and one async constructor patcher instead of one patcher doing both
+
+Keep patchers narrow. If you need to patch multiple unrelated targets, create multiple patchers rather than one large patcher.
+
+Patchers are responsible for:
+
+- stable patcher ids via `name`
+- optional version gating
+- existence checks
+- idempotence through the base patcher marker
+
+### 5. Keep tracing provider-local
+
+Put span creation, metadata extraction, stream aggregation, error logging, and output normalization in `tracing.py`.
+
+This layer should:
+
+- preserve provider behavior
+- support sync, async, and streaming paths as needed
+- avoid raising from tracing-only code when that would break the provider call
+
+If the provider has complex streaming internals, keep that logic local instead of forcing it into shared abstractions.
+
+### 6. Wire public exports
+
+Update public exports only as needed:
+
+- `py/src/braintrust/integrations/__init__.py`
+- `py/src/braintrust/__init__.py`
+
+### 7. Update auto_instrument only if this integration should be auto-patched
+
+If the provider belongs in `braintrust.auto.auto_instrument()`, add a branch in `py/src/braintrust/auto.py`.
+
+Match the current pattern:
+
+- plain `bool` options for simple on/off integrations
+- `IntegrationPatchConfig` only when users need patcher-level selection
+
+## Tests
+
+Keep integration tests with the integration package.
+
+Provider behavior tests should use `@pytest.mark.vcr` whenever the provider makes network calls. Prefer recorded traffic over mocks and fakes.
+
+Cover:
+
+- direct `wrap(...)` behavior
+- `setup()` patching new clients
+- sync behavior
+- async behavior
+- streaming behavior
+- idempotence
+- failure/error logging
+- patcher selection if using `IntegrationPatchConfig`
+
+Preferred locations:
+
+- provider behavior tests: `py/src/braintrust/integrations/<provider>/test_<provider>.py`
+- version helper tests: `py/src/braintrust/integrations/test_versioning.py`
+- auto-instrument subprocess tests: `py/src/braintrust/integrations/auto_test_scripts/`
+
+If the provider uses VCR, keep cassettes next to the integration test file under `py/src/braintrust/integrations/<provider>/cassettes/`.
+
+Only re-record cassettes when the behavior change is intentional.
+
+Use mocks or fakes only for cases that are hard to drive through recorded provider traffic, such as narrowly scoped error injection, local version-routing logic, or patcher existence checks.
+
+## Patterns
+
+### Constructor patching
+
+When the goal is to instrument clients that the provider SDK creates later, patch constructors and attach traced surfaces after the real constructor runs. Anthropic is an example of this pattern.
+
+### Patcher selection
+
+Use `IntegrationPatchConfig` only when users benefit from enabling or disabling specific patchers. Validate unknown patcher ids through `BaseIntegration.resolve_patchers()` instead of silently ignoring them.
+
+### Versioning
+
+Prefer feature detection first and version checks second.
+
+Use:
+
+- `detect_module_version(...)`
+- `version_in_range(...)`
+- `version_matches_spec(...)`
+
+Do not add `packaging` just for integration routing.
+
+## Validation
+
+- Run the narrowest provider session first.
+- Run `cd py && make test-core` if you changed shared integration code.
+- Run `cd py && make lint` before handing off broader integration changes.
+- If you changed `auto_instrument()`, run the relevant subprocess auto-instrument tests.
+
+## Done When
+
+- the provider package contains only the integration, patcher, tracing, export, and test changes required by the task
+- provider behavior tests use VCR unless recorded traffic cannot cover the behavior
+- cassette changes are present only when provider behavior changed intentionally
+- the narrowest affected provider session passes
+- `cd py && make test-core` has been run if shared integration code changed
+- `cd py && make lint` has been run before handoff
+
+## Common Pitfalls
+
+- Leaving provider behavior in `BaseIntegration` instead of the provider package.
+- Combining multiple unrelated patch targets into one patcher.
+- Forgetting async or streaming coverage.
+- Defaulting to mocks or fakes when the provider flow can be covered with VCR.
+- Moving tests but not moving their cassettes.
+- Adding patcher selection without tests for enabled and disabled cases.
+- Editing `auto_instrument()` in a way that implies a registry exists when it does not.
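The patcher responsibilities listed in the skill file (stable ids, existence checks, idempotence, and rejecting unknown patcher ids) can be illustrated with a standalone sketch. Everything in it is hypothetical: `SketchPatcher`, `resolve_patchers`, `FakeClient`, and the `_sketch_patched` marker are illustrative names built on the standard library only. They are not the classes in `py/src/braintrust/integrations/base.py`, and the real implementation may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

MARKER = "_sketch_patched"  # hypothetical idempotence marker attribute


@dataclass(frozen=True)
class SketchPatcher:
    """Hypothetical patcher: one stable id, one patch target."""

    name: str          # stable patcher id
    target: type       # class that owns the attribute
    attribute: str     # attribute to wrap
    wrapper: Callable[[Callable], Callable]

    def patch(self) -> bool:
        original = getattr(self.target, self.attribute, None)
        if original is None:
            # Existence check: the target is absent in this provider version.
            return False
        if getattr(original, MARKER, False):
            # Idempotence: already wrapped, so do not wrap again.
            return True
        wrapped = self.wrapper(original)
        setattr(wrapped, MARKER, True)
        setattr(self.target, self.attribute, wrapped)
        return True


def resolve_patchers(
    patchers: Iterable[SketchPatcher],
    enabled: Optional[Iterable[str]] = None,
    disabled: Optional[Iterable[str]] = None,
) -> List[SketchPatcher]:
    """Select patchers by id and raise on unknown ids instead of ignoring them."""
    patchers = list(patchers)
    enabled_ids = None if enabled is None else set(enabled)
    disabled_ids = set(disabled or ())
    known = {p.name for p in patchers}
    unknown = ((enabled_ids or set()) | disabled_ids) - known
    if unknown:
        raise ValueError(f"unknown patcher ids: {sorted(unknown)}")
    return [
        p
        for p in patchers
        if (enabled_ids is None or p.name in enabled_ids) and p.name not in disabled_ids
    ]


class FakeClient:
    """Stand-in for a provider client; not a real SDK class."""

    def create(self, prompt: str) -> str:
        return f"echo:{prompt}"


calls: List[str] = []


def tracing_wrapper(fn: Callable) -> Callable:
    def inner(self, *args, **kwargs):
        calls.append(fn.__name__)  # stands in for span creation
        return fn(self, *args, **kwargs)

    return inner


create_patcher = SketchPatcher("client.create", FakeClient, "create", tracing_wrapper)
missing_patcher = SketchPatcher("client.stream", FakeClient, "stream", tracing_wrapper)

create_patcher.patch()
create_patcher.patch()  # second call is a no-op because of the marker
result = FakeClient().create("hi")
```

Keeping each patcher tied to a single attribute means the existence check and the marker live next to the patch they protect, which is the narrow shape the skill file recommends.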
@@ -40,6 +40,9 @@ def _pinned_python_version():
 
 SRC_DIR = "braintrust"
 WRAPPER_DIR = "braintrust/wrappers"
+INTEGRATION_DIR = "braintrust/integrations"
+INTEGRATION_AUTO_TEST_DIR = "braintrust/integrations/auto_test_scripts"
+ANTHROPIC_INTEGRATION_DIR = "braintrust/integrations/anthropic"
 CONTRIB_DIR = "braintrust/contrib"
 DEVSERVER_DIR = "braintrust/devserver"
 
@@ -176,6 +179,7 @@ def test_anthropic(session, version):
     _install_test_deps(session)
     _install(session, "anthropic", version)
     _run_tests(session, f"{WRAPPER_DIR}/test_anthropic.py")
+    _run_tests(session, f"{INTEGRATION_DIR}/anthropic/test_anthropic.py")
     _run_core_tests(session)
 
 
@@ -400,7 +404,11 @@ def _get_braintrust_wheel():
 
 def _run_core_tests(session):
     """Run all tests which don't require optional dependencies."""
-    _run_tests(session, SRC_DIR, ignore_paths=[WRAPPER_DIR, CONTRIB_DIR, DEVSERVER_DIR])
+    _run_tests(
+        session,
+        SRC_DIR,
+        ignore_paths=[WRAPPER_DIR, INTEGRATION_AUTO_TEST_DIR, ANTHROPIC_INTEGRATION_DIR, CONTRIB_DIR, DEVSERVER_DIR],
+    )
 
 
 def _run_tests(session, test_path, ignore_path="", ignore_paths=None, env=None):
 
@@ -63,13 +63,17 @@ def is_equal(expected, output):
 
 from .audit import *
 from .auto import (
+    IntegrationPatchConfig,  # noqa: F401 # type: ignore[reportUnusedImport]
     auto_instrument,  # noqa: F401 # type: ignore[reportUnusedImport]
 )
 from .framework import *
 from .framework2 import *
 from .functions.invoke import *
 from .functions.stream import *
 from .generated_types import *
+from .integrations.anthropic import (
+    wrap_anthropic,  # noqa: F401 # type: ignore[reportUnusedImport]
+)
 from .logger import *
 from .logger import (
     _internal_get_global_state,  # noqa: F401 # type: ignore[reportUnusedImport]
@@ -89,9 +93,6 @@ def is_equal(expected, output):
     BT_IS_ASYNC_ATTRIBUTE,  # noqa: F401 # type: ignore[reportUnusedImport]
     MarkAsyncWrapper,  # noqa: F401 # type: ignore[reportUnusedImport]
 )
-from .wrappers.anthropic import (
-    wrap_anthropic,  # noqa: F401 # type: ignore[reportUnusedImport]
-)
 from .wrappers.litellm import (
     wrap_litellm,  # noqa: F401 # type: ignore[reportUnusedImport]
 )
 
@@ -9,10 +9,13 @@
 import logging
 from contextlib import contextmanager
 
+from braintrust.integrations import AnthropicIntegration, IntegrationPatchConfig
+
 
 __all__ = ["auto_instrument"]
 
 logger = logging.getLogger(__name__)
+InstrumentOption = bool | IntegrationPatchConfig
 
 
 @contextmanager
@@ -29,7 +32,7 @@ def _try_patch():
 def auto_instrument(
     *,
     openai: bool = True,
-    anthropic: bool = True,
+    anthropic: InstrumentOption = True,
     litellm: bool = True,
     pydantic_ai: bool = True,
     google_genai: bool = True,
@@ -49,7 +52,8 @@ def auto_instrument(
 
     Args:
         openai: Enable OpenAI instrumentation (default: True)
-        anthropic: Enable Anthropic instrumentation (default: True)
+        anthropic: Enable Anthropic instrumentation (default: True), or pass an
+            IntegrationPatchConfig to select Anthropic patchers explicitly.
         litellm: Enable LiteLLM instrumentation (default: True)
         pydantic_ai: Enable Pydantic AI instrumentation (default: True)
         google_genai: Enable Google GenAI instrumentation (default: True)
@@ -104,23 +108,33 @@ def auto_instrument(
     """
     results = {}
 
-    if openai:
+    openai_enabled = _normalize_bool_option("openai", openai)
+    anthropic_enabled, anthropic_config = _normalize_anthropic_option(anthropic)
+    litellm_enabled = _normalize_bool_option("litellm", litellm)
+    pydantic_ai_enabled = _normalize_bool_option("pydantic_ai", pydantic_ai)
+    google_genai_enabled = _normalize_bool_option("google_genai", google_genai)
+    agno_enabled = _normalize_bool_option("agno", agno)
+    claude_agent_sdk_enabled = _normalize_bool_option("claude_agent_sdk", claude_agent_sdk)
+    dspy_enabled = _normalize_bool_option("dspy", dspy)
+    adk_enabled = _normalize_bool_option("adk", adk)
+
+    if openai_enabled:
         results["openai"] = _instrument_openai()
-    if anthropic:
-        results["anthropic"] = _instrument_anthropic()
-    if litellm:
+    if anthropic_enabled:
+        results["anthropic"] = _instrument_integration(AnthropicIntegration, patch_config=anthropic_config)
+    if litellm_enabled:
         results["litellm"] = _instrument_litellm()
-    if pydantic_ai:
+    if pydantic_ai_enabled:
         results["pydantic_ai"] = _instrument_pydantic_ai()
-    if google_genai:
+    if google_genai_enabled:
         results["google_genai"] = _instrument_google_genai()
-    if agno:
+    if agno_enabled:
         results["agno"] = _instrument_agno()
-    if claude_agent_sdk:
+    if claude_agent_sdk_enabled:
         results["claude_agent_sdk"] = _instrument_claude_agent_sdk()
-    if dspy:
+    if dspy_enabled:
         results["dspy"] = _instrument_dspy()
-    if adk:
+    if adk_enabled:
         results["adk"] = _instrument_adk()
 
     return results
@@ -134,14 +148,34 @@ def _instrument_openai() -> bool:
     return False
 
 
-def _instrument_anthropic() -> bool:
+def _instrument_integration(integration, *, patch_config: IntegrationPatchConfig | None = None) -> bool:
     with _try_patch():
-        from braintrust.wrappers.anthropic import patch_anthropic
-
-        return patch_anthropic()
+        return integration.setup(
+            enabled_patchers=patch_config.enabled_patchers if patch_config is not None else None,
+            disabled_patchers=patch_config.disabled_patchers if patch_config is not None else None,
+        )
     return False
 
 
+def _normalize_bool_option(name: str, option: bool) -> bool:
+    if isinstance(option, bool):
+        return option
+
+    raise TypeError(f"auto_instrument option {name!r} must be a bool, got {type(option).__name__}")
+
+
+def _normalize_anthropic_option(option: InstrumentOption) -> tuple[bool, IntegrationPatchConfig | None]:
+    if isinstance(option, bool):
+        return option, None
+
+    if isinstance(option, IntegrationPatchConfig):
+        return True, option
+
+    raise TypeError(
+        f"auto_instrument option 'anthropic' must be a bool or IntegrationPatchConfig, got {type(option).__name__}"
+    )
+
+
 def _instrument_litellm() -> bool:
     with _try_patch():
         from braintrust.wrappers.litellm import patch_litellm
 
@@ -0,0 +1,5 @@
+from .anthropic import AnthropicIntegration
+from .config import IntegrationPatchConfig
+
+
+__all__ = ["AnthropicIntegration", "IntegrationPatchConfig"]
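The `bool | IntegrationPatchConfig` handling that `auto.py` applies to the `anthropic` option can be exercised in isolation. The sketch below mirrors the branching of `_normalize_anthropic_option` from the diff, but `PatchConfigSketch` is a hypothetical stand-in for `IntegrationPatchConfig`: only the attribute names `enabled_patchers` and `disabled_patchers` come from the diff, while their types, defaults, and the patcher id used here are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class PatchConfigSketch:
    """Hypothetical stand-in for IntegrationPatchConfig (field types are assumed)."""

    enabled_patchers: Optional[Tuple[str, ...]] = None
    disabled_patchers: Optional[Tuple[str, ...]] = None


OptionSketch = Union[bool, PatchConfigSketch]


def normalize_option(name: str, option: OptionSketch) -> Tuple[bool, Optional[PatchConfigSketch]]:
    """Return (enabled, config), following the branches shown in the diff."""
    if isinstance(option, bool):
        # Plain on/off: no patcher-level selection.
        return option, None
    if isinstance(option, PatchConfigSketch):
        # Passing a config implies the integration is enabled.
        return True, option
    raise TypeError(
        f"auto_instrument option {name!r} must be a bool or a patch config, "
        f"got {type(option).__name__}"
    )


# "example.patcher" is a made-up id used only for illustration.
config = PatchConfigSketch(disabled_patchers=("example.patcher",))
enabled, selected = normalize_option("anthropic", config)
```

Because the check uses `isinstance(option, bool)` rather than truthiness, values such as `1` or `None` raise `TypeError` instead of being silently coerced.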