Merged
Commits
32 commits
21455d3
feat: add OTLP tracing foundation for evaluation runs
Dongbumlee Apr 3, 2026
a9f0afe
docs: add OTLP telemetry to AGENTS.md and copilot-instructions
Dongbumlee Apr 3, 2026
f932d98
feat: extend Foundry cloud evaluator coverage to 22 built-in evaluato…
Dongbumlee Apr 7, 2026
ab2736a
fix: skip telemetry tests when opentelemetry is not installed
Dongbumlee Apr 7, 2026
5b5aa6e
Merge pull request #56 from Azure/feature/otlp-tracing
Dongbumlee Apr 7, 2026
500966d
merge: resolve CHANGELOG conflict with develop (OTLP tracing)
Dongbumlee Apr 7, 2026
46ede70
docs: align all documentation with current implementation
Dongbumlee Apr 7, 2026
f887f65
feat: implement bundle list/show and run list/show commands
Dongbumlee Apr 7, 2026
2d4a52c
refactor: split CLI into command modules
Dongbumlee Apr 7, 2026
6017f3a
refactor: remove planned.py, move stubs to their command files
Dongbumlee Apr 7, 2026
ce9b628
Merge pull request #57 from Azure/feature/issue-51-extend-evaluators
placerda Apr 13, 2026
4e81967
Merge branch 'develop' into feature/browse-commands
placerda Apr 13, 2026
ba9a465
Merge pull request #59 from Azure/feature/browse-commands
placerda Apr 13, 2026
dd9172b
fix: remove duplicate _planned_command definition (ruff F811)
Dongbumlee Apr 13, 2026
6f18db6
feat(skills): add 3 new skills for full CLI coverage
Dongbumlee Apr 13, 2026
c6c7c79
feat(skills): add active workspace guard clauses to all downstream sk…
Dongbumlee Apr 13, 2026
e409dd0
feat(skills): add coverage for report show/export, model list, agent …
Dongbumlee Apr 13, 2026
42d5a9a
fix: remove duplicate _planned_command definition (ruff F811)
Dongbumlee Apr 13, 2026
a9653f2
style: apply ruff-format to comparison.py and test_cli_commands.py
Dongbumlee Apr 13, 2026
9d1f235
ci: integrate VSIX packaging with pre-release into CI/CD pipeline
Dongbumlee Apr 13, 2026
f2cd7ce
ci(vsix): add LICENSE to plugin package
Dongbumlee Apr 13, 2026
903be4b
ci(vsix): set publisher to AgentOpsToolkit and fix package name
Dongbumlee Apr 13, 2026
4f248d3
Merge pull request #68 from Azure/feature/agentops-skills
Dongbumlee Apr 13, 2026
daaf73e
Merge pull request #66 from Azure/fix/develop-lint-f811
Dongbumlee Apr 13, 2026
03b6b74
Merge pull request #67 from Azure/feature/skill-vsix-cicd
Dongbumlee Apr 13, 2026
e3b7640
ci(vsix): upload VSIX artifact from CI and staging pipelines (#69)
Dongbumlee Apr 13, 2026
60be078
ci(vsix): sync VSIX version from git tags in all pipelines (#70)
Dongbumlee Apr 13, 2026
b48765a
fix: resolve all mypy type errors across 6 source files (#71)
Dongbumlee Apr 14, 2026
9314553
docs: add CHANGELOG entries for mypy fixes and VSIX pipeline
Dongbumlee Apr 14, 2026
95e9b5f
chore: prepare release 0.1.4
github-actions[bot] Apr 14, 2026
9710013
fix: use global tag sort for VSIX version derivation
Dongbumlee Apr 14, 2026
f457dbf
Merge remote-tracking branch 'origin/main' into release/v0.1.4
Dongbumlee Apr 14, 2026
20 changes: 18 additions & 2 deletions .github/copilot-instructions.md
@@ -54,7 +54,9 @@ Only the following commands are in scope:

 - `agentops init`
 - `agentops eval run --config <run.yaml> [--output <dir>]`
+- `agentops eval compare --runs <ID1>,<ID2>[,ID3,...] [--output <dir>]`
 - `agentops report --in <results.json> [--out <report.md>]`
+- `agentops config cicd [--force] [--dir <path>]`

 Do not add new commands or flags unless explicitly discussed.

@@ -80,7 +82,7 @@ See `docs/how-it-works.md` for the full source-code map and architecture diagram
 - Keep CLI command handlers **thin** (`cli/app.py`) — only parse args and call `services/`
 - Place business logic in:
 - `core/` — config loading, Pydantic models, thresholds, report generation. **Must have zero Azure SDK imports and zero network calls.**
-- `services/` — orchestration (runner), Foundry publishing, workspace init, report regen
+- `services/` — orchestration (runner), comparison, CI/CD workflow generation, Foundry publishing, workspace init, report regen
 - `backends/` — execution backends (Foundry, subprocess). Each implements the `Backend` protocol from `base.py`.
 - Use `pathlib.Path` everywhere (no raw string paths)
 - No side effects at import time
@@ -130,6 +132,7 @@ The Foundry backend (`backends/foundry_backend.py`) is the largest and most comp
 - Auto-derive Azure OpenAI endpoint from the project endpoint via `_derive_openai_endpoint_from_project()` — users should not need to set `AZURE_OPENAI_ENDPOINT` manually.
 - Agent invocation supports both reference-based and threads-based API calls.
 - Evaluator names map from class names to builtins: `SimilarityEvaluator` → `builtin.similarity`.
+- Cloud evaluator routing uses frozensets: `_EVALUATORS_NEEDING_GROUND_TRUTH`, `_EVALUATORS_NEEDING_CONTEXT`, `_EVALUATORS_NEEDING_TOOL_CALLS`, `_EVALUATORS_NEEDING_TOOL_DEFS_ONLY`, `_EVALUATORS_NEEDING_OUTPUT_ITEMS`. NLP evaluators with required init params use `_NLP_DEFAULT_INIT_PARAMS`.

 ### Environment Variables

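The frozenset routing and the class-name mapping described in this hunk can be illustrated with a minimal sketch. The set members, the field names (`query`, `response`, `ground_truth`, `context`), and both helper functions are assumptions made for illustration; the real sets live in `backends/foundry_backend.py` and are not shown in this diff. The naming rule is inferred only from the single `SimilarityEvaluator` → `builtin.similarity` example.

```python
import re

# Hypothetical members: the real contents of these sets are not part of this diff.
_EVALUATORS_NEEDING_GROUND_TRUTH = frozenset({"builtin.similarity", "builtin.f1_score"})
_EVALUATORS_NEEDING_CONTEXT = frozenset({"builtin.groundedness"})


def builtin_name(class_name: str) -> str:
    """Map an evaluator class name to a builtin name (assumed rule)."""
    stem = class_name.removesuffix("Evaluator")
    # Insert "_" before every capital letter except the first, then lowercase.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", stem).lower()
    return f"builtin.{snake}"


def required_fields(name: str) -> list[str]:
    """Return the dataset fields an evaluator needs, based on set membership."""
    fields = ["query", "response"]  # assumed baseline fields
    if name in _EVALUATORS_NEEDING_GROUND_TRUTH:
        fields.append("ground_truth")
    if name in _EVALUATORS_NEEDING_CONTEXT:
        fields.append("context")
    return fields
```

With this layout, supporting a new evaluator amounts to adding its builtin name to the right set, which is presumably why the instructions ask contributors to pick "the correct frozenset" rather than edit branching logic.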
@@ -208,6 +211,10 @@ When cloud evaluation is used, a `cloud_evaluation.json` is also produced contai
 - Foundry backend helpers (`test_foundry_backend.py`)
 - Subprocess backend (`test_subprocess_backend.py`)
 - Initializer (`test_initializer.py`)
+- CI/CD workflow generation (`test_cicd.py`)
+- CLI command behavior (`test_cli_commands.py`)
+- Eval comparison logic (`test_comparison.py`)
+- OTLP telemetry instrumentation (`test_telemetry.py`)
 - Integration test for:
 - `agentops eval run` end-to-end using a fake subprocess backend (`test_eval_run_integration.py`)
 - Tests must assert correct **exit codes**
@@ -248,9 +255,18 @@ When generating or modifying code:
 - Azure SDK imports must be **lazy** (inside functions, not top-level)
 - Never hardcode Azure API versions — let the SDK handle versioning
 - Keep user-facing log output clean — no warning cascades or retry noise
-- When adding evaluator support, update both cloud (`_cloud_evaluator_data_mapping` + `_cloud_evaluator_needs_model`) and local paths
+- When adding evaluator support, add the builtin name to the correct frozenset in `foundry_backend.py` (`_EVALUATORS_NEEDING_GROUND_TRUTH`, `_EVALUATORS_NEEDING_CONTEXT`, `_EVALUATORS_NEEDING_TOOL_CALLS`, `_EVALUATORS_NEEDING_TOOL_DEFS_ONLY`, or `_EVALUATORS_NEEDING_OUTPUT_ITEMS`), update `_NLP_DEFAULT_INIT_PARAMS` if init params are required, and update both cloud (`_cloud_evaluator_data_mapping` + `_cloud_evaluator_needs_model`) and local paths
 - All new logic must have corresponding unit tests in `tests/unit/`
 - Always mock Azure SDK calls in tests — tests must run without credentials
 - The `core/` package must remain free of Azure imports and I/O
 - Follow the request flow: CLI → Services → Backends → Core (never skip layers)
 - If a change is user-visible, add an entry to `CHANGELOG.md` under `[Unreleased]` (Keep a Changelog format)
+
+### OTLP Telemetry
+
+- `utils/telemetry.py` provides optional OTLP trace emission for evaluation runs
+- Activated by `AGENTOPS_OTLP_ENDPOINT` env var — zero overhead when unset
+- All OpenTelemetry imports must be **lazy** (inside functions in `utils/telemetry.py`)
+- `opentelemetry-sdk` is an optional runtime dependency — not declared in `pyproject.toml`
+- Span schema: CICD semconv (`cicd.pipeline.*`) for pipeline structure, GenAI semconv (`gen_ai.*`) for agent calls, `agentops.eval.*` for evaluator scores
+- When adding new spans, follow the three-layer pattern in `telemetry.py`
13 changes: 8 additions & 5 deletions .github/workflows/_build.yml
@@ -1,16 +1,19 @@
 # AgentOps Toolkit — Reusable Build Workflow
 #
 # Workflows:
-# 1. ci.yml — Lint + test on every push/PR; publish dev builds to TestPyPI on develop
-# 2. _build.yml — Reusable build (test + package), called by staging and release
-# 3. staging.yml — Staging: release/* branch → TestPyPI → verify
-# 4. release.yml — Production: v* tag → TestPyPI → verify → PyPI → GitHub Release
+# 1. ci.yml — Lint + test on every push/PR; build VSIX validation
+# 2. _build.yml — Reusable Python build (test + package), called by staging and release
+# 3. staging.yml — Staging: release/* → TestPyPI + VSIX pre-release
+# 4. release.yml — Production: v* tag → PyPI + VSIX stable + GitHub Release
 # 5. cut-release.yml — Manual dispatch: create release branch + PR from develop
 #
 # Called by staging.yml and release.yml via workflow_call.
-# Runs tests, builds the package (version via setuptools-scm), and uploads
+# Runs tests, builds the Python package (version via setuptools-scm), and uploads
 # the dist/ artifacts for downstream jobs.
 #
+# Note: VSIX packaging is handled directly in ci/staging/release workflows
+# (requires Node.js + @vscode/vsce), not in this Python-focused reusable build.
+#
 # Usage in caller workflows:
 # jobs:
 # build:
105 changes: 103 additions & 2 deletions .github/workflows/ci.yml
@@ -3,8 +3,8 @@
 # Workflows:
 # 1. ci.yml — Lint + test on every push/PR; publish dev builds to TestPyPI on develop
 # 2. _build.yml — Reusable build (test + package), called by staging and release
-# 3. staging.yml — Staging: release/* branch → TestPyPI → verify
-# 4. release.yml — Production: v* tag → TestPyPI → verify → PyPI → GitHub Release
+# 3. staging.yml — Staging: release/* branch → TestPyPI → verify; VSIX pre-release → Marketplace
+# 4. release.yml — Production: v* tag → TestPyPI → verify → PyPI → GitHub Release; VSIX stable → Marketplace
 # 5. cut-release.yml — Manual dispatch: create release branch + PR from develop

 name: CI
@@ -186,3 +186,104 @@ jobs:
 echo "- TestPyPI: https://test.pypi.org/project/agentops-toolkit/${{ steps.version.outputs.version }}/" >> "$GITHUB_STEP_SUMMARY"
 echo "" >> "$GITHUB_STEP_SUMMARY"
 echo "Install: \`pip install agentops-toolkit==${{ steps.version.outputs.version }} --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/\`" >> "$GITHUB_STEP_SUMMARY"
+
+# Validate that the VSIX extension packages correctly
+build-vsix:
+runs-on: ubuntu-latest
+steps:
+- uses: actions/checkout@v4
+with:
+fetch-depth: 0 # Full history for version derivation
+
+- name: Sync VSIX version from git tag
+run: |
+# Use global tag sort (not git describe) to find the latest tag
+# across ALL branches, not just reachable ones from HEAD.
+LAST_TAG=$(git tag -l 'v*' --sort=-v:refname | head -1)
+LAST_TAG=${LAST_TAG:-v0.0.0}
+LAST_VERSION=${LAST_TAG#v}
+IFS='.' read -r MAJOR MINOR PATCH <<< "$LAST_VERSION"
+if git describe --tags --exact-match HEAD >/dev/null 2>&1; then
+BASE_VERSION="$LAST_VERSION"
+else
+BASE_VERSION="$MAJOR.$MINOR.$((PATCH + 1))"
+fi
+jq --arg v "$BASE_VERSION" '.version = $v' \
+plugins/agentops/package.json > plugins/agentops/package.json.tmp
+mv plugins/agentops/package.json.tmp plugins/agentops/package.json
+echo "VSIX version set to $BASE_VERSION (from tag $LAST_TAG)"
+
+- name: Set up Node.js
+uses: actions/setup-node@v4
+with:
+node-version: "20"
+
+- name: Install vsce
+run: npm install -g @vscode/vsce
+
+- name: Package VSIX (dry run)
+working-directory: plugins/agentops
+run: vsce package -o agentops-skills.vsix
+
+- name: Show VSIX info
+run: |
+ls -la plugins/agentops/*.vsix
+echo "✅ VSIX packaging validated"
+
+- name: Upload VSIX artifact
+uses: actions/upload-artifact@v4
+with:
+name: vsix
+path: plugins/agentops/*.vsix
+
+# Publish VSIX pre-release to Marketplace on every push to develop (not PRs)
+publish-vsix-dev:
+if: github.event_name == 'push' && github.ref == 'refs/heads/develop'
+needs: [lint, test, build-vsix]
+runs-on: ubuntu-latest
+environment: staging
+steps:
+- uses: actions/checkout@v4
+with:
+fetch-depth: 0 # Full history for version derivation
+
+- name: Sync VSIX version from git tag
+run: |
+# Use global tag sort (not git describe) to find the latest tag
+# across ALL branches, not just reachable ones from HEAD.
+LAST_TAG=$(git tag -l 'v*' --sort=-v:refname | head -1)
+LAST_TAG=${LAST_TAG:-v0.0.0}
+LAST_VERSION=${LAST_TAG#v}
+IFS='.' read -r MAJOR MINOR PATCH <<< "$LAST_VERSION"
+if git describe --tags --exact-match HEAD >/dev/null 2>&1; then
+BASE_VERSION="$LAST_VERSION"
+else
+BASE_VERSION="$MAJOR.$MINOR.$((PATCH + 1))"
+fi
+jq --arg v "$BASE_VERSION" '.version = $v' \
+plugins/agentops/package.json > plugins/agentops/package.json.tmp
+mv plugins/agentops/package.json.tmp plugins/agentops/package.json
+echo "VSIX version set to $BASE_VERSION (from tag $LAST_TAG)"
+
+- name: Set up Node.js
+uses: actions/setup-node@v4
+with:
+node-version: "20"
+
+- name: Install vsce
+run: npm install -g @vscode/vsce
+
+- name: Package VSIX (pre-release)
+working-directory: plugins/agentops
+run: vsce package --pre-release -o agentops-skills.vsix
+
+- name: Publish pre-release to VS Code Marketplace
+continue-on-error: true # Tolerate "already exists" for dev builds
+working-directory: plugins/agentops
+run: vsce publish --pre-release --packagePath agentops-skills.vsix -p "${{ secrets.VSCE_PAT }}"
+
+- name: Summary
+run: |
+echo "## ✅ VSIX pre-release published to Marketplace" >> "$GITHUB_STEP_SUMMARY"
+echo "" >> "$GITHUB_STEP_SUMMARY"
+echo "Extension: [AgentOps Toolkit](https://marketplace.visualstudio.com/items?itemName=AgentOpsToolkit.agentops-toolkit)" >> "$GITHUB_STEP_SUMMARY"
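The "Sync VSIX version from git tag" step above packs its version rule into a few lines of shell. As a reading aid, here is a minimal Python sketch of the same rule, assuming every tag has the plain form `vMAJOR.MINOR.PATCH`. The function names are hypothetical, and `head_is_tagged` stands in for the `git describe --tags --exact-match HEAD` check.

```python
def parse(tag: str) -> tuple[int, int, int]:
    """Split a tag such as "v0.1.4" into integers so it sorts numerically."""
    major, minor, patch = tag.removeprefix("v").split(".")
    return int(major), int(minor), int(patch)


def vsix_version(tags: list[str], head_is_tagged: bool) -> str:
    """Take the highest v* tag; bump the patch unless HEAD itself is tagged."""
    # default=(0, 0, 0) plays the role of the shell fallback LAST_TAG=v0.0.0.
    latest = max((parse(t) for t in tags if t.startswith("v")), default=(0, 0, 0))
    major, minor, patch = latest
    if not head_is_tagged:
        patch += 1
    return f"{major}.{minor}.{patch}"
```

Comparing integer tuples matches what `--sort=-v:refname` does for plain three-part tags: `v0.1.10` ranks above `v0.1.4`, which a plain string sort would get wrong. Like the shell step, the sketch does not handle pre-release suffixes such as `-rc1`.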
27 changes: 19 additions & 8 deletions .github/workflows/cut-release.yml
@@ -1,14 +1,15 @@
 # AgentOps Toolkit — Cut Release
 #
 # Workflows:
-# 1. ci.yml — Lint + test on every push/PR
+# 1. ci.yml — Lint + test on every push/PR; VSIX build validation
 # 2. _build.yml — Reusable build (test + package), called by staging and release
-# 3. staging.yml — Staging: release/* branch → TestPyPI → verify
-# 4. release.yml — Production: v* tag → TestPyPI → verify → PyPI → GitHub Release
+# 3. staging.yml — Staging: release/* → TestPyPI → verify; VSIX pre-release → Marketplace
+# 4. release.yml — Production: v* tag → TestPyPI → verify → PyPI → GH Release; VSIX stable → Marketplace
 # 5. cut-release.yml — Manual dispatch: create release branch + PR from develop
 #
 # One-click release branch creation. Triggered manually from the Actions tab.
-# Creates a release branch from develop, updates CHANGELOG.md, and opens a PR to main.
+# Creates a release branch from develop, updates CHANGELOG.md, syncs the
+# VS Code extension version in package.json, and opens a PR to main.
 # The branch push then triggers staging.yml automatically.
 #
 # Usage:
@@ -72,14 +73,21 @@ jobs:
 # Replace [Unreleased] with versioned section, add fresh Unreleased above
 sed -i "s/## \[Unreleased\]/## [Unreleased]\n\n## [${{ env.version }}] - $DATE/" CHANGELOG.md

+- name: Sync VS Code extension version
+run: |
+jq --arg v "${{ env.version }}" '.version = $v' \
+plugins/agentops/package.json > plugins/agentops/package.json.tmp
+mv plugins/agentops/package.json.tmp plugins/agentops/package.json
+echo "VSIX version set to ${{ env.version }}"
+
 - name: Configure git
 run: |
 git config user.name "github-actions[bot]"
 git config user.email "41898282+github-actions[bot]@users.noreply.github.com"

 - name: Commit and push
 run: |
-git add CHANGELOG.md
+git add CHANGELOG.md plugins/agentops/package.json
 git commit -m "chore: prepare release ${{ env.version }}"
 git push origin "release/v${{ env.version }}"

@@ -98,22 +106,24 @@ jobs:
 ### What happened
 - Branch \`release/v${{ env.version }}\` created from \`develop\`
 - \`CHANGELOG.md\` updated: \`[Unreleased]\` → \`[${{ env.version }}]\`
-- Staging pipeline triggered automatically (build → TestPyPI → verify)
+- \`plugins/agentops/package.json\` version synced to \`${{ env.version }}\`
+- Staging pipeline triggered automatically (build → TestPyPI + VSIX pre-release → verify)

 ### Next steps
 1. Wait for the **Staging** pipeline to pass
 2. Review and approve this PR
 3. Merge to \`main\`
 4. Tag and push: \`git tag v${{ env.version }} && git push origin v${{ env.version }}\`
-5. Approve the PyPI publish in the **Release** workflow
+5. Approve the PyPI publish and VSIX stable publish in the **Release** workflow
 6. Sync develop: \`git checkout develop && git merge main && git push origin develop\`

 ### Checklist
-- [ ] Staging pipeline passes (build + TestPyPI + verify)
+- [ ] Staging pipeline passes (build + TestPyPI + VSIX pre-release + verify)
 - [ ] CHANGELOG entries reviewed
 - [ ] PR approved and merged to main
 - [ ] Tag \`v${{ env.version }}\` pushed
 - [ ] PyPI publish approved
+- [ ] VSIX stable publish approved
 - [ ] develop synced from main"

 - name: Summary
@@ -122,6 +132,7 @@ jobs:
 echo "" >> "$GITHUB_STEP_SUMMARY"
 echo "- Branch: \`release/v${{ env.version }}\`" >> "$GITHUB_STEP_SUMMARY"
 echo "- CHANGELOG updated with version **${{ env.version }}**" >> "$GITHUB_STEP_SUMMARY"
+echo "- VS Code extension version synced to **${{ env.version }}**" >> "$GITHUB_STEP_SUMMARY"
 echo "- PR opened: \`release/v${{ env.version }}\` → \`main\`" >> "$GITHUB_STEP_SUMMARY"
 echo "- Staging pipeline triggered automatically" >> "$GITHUB_STEP_SUMMARY"
 echo "" >> "$GITHUB_STEP_SUMMARY"
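The two file edits that `cut-release.yml` makes before committing (the `sed` rewrite of `CHANGELOG.md` and the `jq` version sync of `package.json`) can be sketched in Python for local experimentation. This approximates the workflow steps and is not code from the repository; both function names are hypothetical.

```python
import json
import re
from pathlib import Path


def cut_changelog(text: str, version: str, date: str) -> str:
    """Put a versioned heading under a fresh [Unreleased] heading."""
    return re.sub(
        r"## \[Unreleased\]",
        # A function replacement avoids backslash-escape surprises in the text.
        lambda _match: f"## [Unreleased]\n\n## [{version}] - {date}",
        text,
    )


def sync_package_version(package_json: Path, version: str) -> None:
    """Set the top-level "version" field, writing through a temp file like the jq step."""
    data = json.loads(package_json.read_text(encoding="utf-8"))
    data["version"] = version
    tmp = package_json.with_name(package_json.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    tmp.replace(package_json)  # same effect as the `mv` in the workflow
```

Writing to a `.tmp` file and then renaming mirrors the workflow: `jq` cannot safely redirect its output onto the file it is still reading, so the step writes a sibling file and moves it into place.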