Merged
128 changes: 128 additions & 0 deletions .github/workflows/nightly-sdk-gap-audit.yaml
@@ -0,0 +1,128 @@
name: Nightly SDK Gap Audit

on:
  schedule:
    - cron: "0 6 * * *"
  workflow_dispatch:

permissions:
  contents: read

concurrency:
  group: provider-gap-audit-${{ github.ref }}
  cancel-in-progress: false

jobs:
  audit-sdk-coverage:
    name: Audit SDK Coverage
    runs-on: ubuntu-latest
    timeout-minutes: 45
    steps:
      - name: Generate GitHub App token
        id: app-token
        uses: actions/create-github-app-token@f8d387b68d61c58ab83c6c016672934102569859 # v3.0.0
        with:
          app-id: ${{ secrets.BRAINTRUST_BOT_APP_ID }}
          private-key: ${{ secrets.BRAINTRUST_BOT_PRIVATE_KEY }}
          owner: braintrustdata
          repositories: |
            braintrust-sdk-python
          permission-contents: read
          permission-issues: write

      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
        with:
          persist-credentials: false

      - name: Run Claude provider gap audit
        uses: anthropics/claude-code-action@df37d2f0760a4b5683a6e617c9325bc1a36443f6 # v1.0.75
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          # Pass our own app token so the Claude action does not try to mint its
          # own GitHub token via OIDC. That OIDC path currently fails for this
          # workflow shape upstream, and this app token also lets us tightly scope
          # Claude's remote permissions to repo contents read + issues write only.
          github_token: ${{ steps.app-token.outputs.token }}
          prompt: |
            # Goal

            Find important instrumentation gaps in this repository's AI-facing integrations.

            A gap means the upstream SDK or framework clearly supports something that this repository does not yet instrument, or instruments with materially less detail.

            # Scope

            This repository is the Braintrust Python SDK. Focus exclusively on the Python integrations it ships.
            Discover them by inspecting the following directories in the checked-out repository:

            - `py/src/braintrust/wrappers/` — AI provider wrappers and framework integrations
            - `py/src/braintrust/contrib/` — additional contributed integrations
            - `py/examples/` — usage examples that reveal what surfaces are considered supported

            Do not look at the `integrations/` directory (deprecated old integrations).

            - Infer additional relevant surfaces from the checked-out repository itself (tests, noxfile.py, pyproject.toml).
            - Ignore generic runtime or infrastructure integrations that do not have an upstream AI API surface to compare against.
            - Do not open parity issues for deprecated or no-op surfaces unless you find real deprecation drift or docs drift worth reporting.

            # Process

            1. Inspect local code, tests, docs, examples, and e2e scenarios to understand what is already instrumented.
            2. For each relevant surface, independently discover the current official upstream docs and recent official releases or changelogs.
            3. Compare current upstream capabilities to current Braintrust instrumentation in this repo.
            4. Always check the latest Braintrust docs at https://www.braintrust.dev/docs before deciding how to describe a gap.
            5. Search existing GitHub issues for duplicates before creating anything.
            6. Only act on high-confidence, concrete gaps tied to missing APIs, unsupported call patterns, or missing instrumentation detail.

            # Examples

            ## Good

            - The upstream SDK now has a stable `responses.stream()` helper, but this repo does not instrument it at all, or instruments it without final result metadata that other similar APIs already capture here.
            - This repo instruments a provider's basic text generation API, but not its newer tool-calling or agent tracing API even though that API is now official and documented.

            ## Bad

            - A vague suspicion that "something in streaming may be missing" without a concrete upstream API and a concrete repo gap.
            - Opening a separate issue for every release note bullet when they all describe the same missing instrumentation area.

            # If You Find Actionable Non-Duplicate Gaps

            - Create at most 5 issues in this run.
            - Create one issue per distinct gap.
            - Keep each issue concise, concrete, and source-backed.
            - Include a hidden marker comment near the top of the issue body in this exact form:

            ```html
            <!-- provider-gap-audit: <gap_id> -->
            ```

            Each issue should clearly include:

            - what instrumentation is missing
            - whether Braintrust docs suggest the capability is `supported`, `unclear`, or `not_found`
            - exact upstream sources
            - exact Braintrust docs source or sources
            - exact local repo files you inspected

            # Duplicate Handling

            - Do not create an issue if an open issue already covers the same gap.
            - Treat a matching hidden marker comment or a clearly equivalent open issue as a duplicate.
            - If duplicate checking is inconclusive, do not create the issue.

            # Constraints

            - Discover source URLs yourself. Do not rely on a preset list.
            - Prefer official docs and official release sources.
            - Do not create comments.
            - Do not update, close, or label existing issues.
            - Do not create pull requests.
            - If there are no high-confidence non-duplicate gaps, do nothing.
          # The Claude action includes base GitHub tools by default, and --allowedTools adds to that set
          # rather than replacing it. Keep the deny-list so Claude cannot use other remote write tools.
          claude_args: |
            --model claude-opus-4-6
            --max-turns 20
            --allowedTools "Read,Glob,Grep,LS,WebSearch,WebFetch,mcp__github__get_issue,mcp__github__get_issue_comments,mcp__github__search_issues,mcp__github__list_issues,mcp__github__create_issue"
            --disallowedTools "Bash,Edit,MultiEdit,Write,Replace,NotebookEditCell,mcp__github__create_issue_comment,mcp__github__update_issue,mcp__github__create_pr,mcp__github__create_or_update_file,mcp__github__delete_file,mcp__github_file_ops__commit_files,mcp__github_file_ops__delete_files"
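
Note on the duplicate-handling rule in the prompt: it depends on the hidden `<!-- provider-gap-audit: <gap_id> -->` marker being easy to match later. The following is a minimal sketch of how such a marker could be parsed and compared outside the workflow, for example in a follow-up cleanup script. It is not part of this change. The names `extract_gap_id` and `is_duplicate`, the sample gap IDs, and the assumption that a gap ID contains no whitespace are all hypothetical, since the workflow does not define the ID format.

```python
import re
from typing import Iterable, Optional

# Matches the hidden marker form given in the prompt:
#   <!-- provider-gap-audit: <gap_id> -->
# Assumption: a gap ID is a single token without whitespace.
MARKER_RE = re.compile(r"<!--\s*provider-gap-audit:\s*(\S+)\s*-->")


def extract_gap_id(issue_body: Optional[str]) -> Optional[str]:
    """Return the gap ID from the first marker in an issue body, or None."""
    match = MARKER_RE.search(issue_body or "")
    return match.group(1) if match else None


def is_duplicate(gap_id: str, open_issue_bodies: Iterable[Optional[str]]) -> bool:
    """Return True if any open issue body carries a marker with the same gap ID."""
    return any(extract_gap_id(body) == gap_id for body in open_issue_bodies)


if __name__ == "__main__":
    # Hypothetical issue bodies, used only to exercise the helpers.
    bodies = [
        "<!-- provider-gap-audit: openai-responses-stream -->\nMissing span metadata.",
        "An unrelated issue without a marker.",
    ]
    print(is_duplicate("openai-responses-stream", bodies))
    print(is_duplicate("anthropic-batches", bodies))
```

A marker match only covers the first half of the rule. The prompt also treats "a clearly equivalent open issue" as a duplicate, which still needs a judgment call, so a check like this can confirm a duplicate but cannot rule one out.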