Release v0.1.4 by Dongbumlee · Pull Request #72 · Azure/agentops

Dongbumlee · 2026-04-14T00:32:26Z

Release v0.1.4

Automated release branch created from develop.

What happened

- Branch release/v0.1.4 created from develop
- CHANGELOG.md updated: [Unreleased] to [0.1.4]
- plugins/agentops/package.json version synced to 0.1.4
- Staging pipeline triggered automatically (build -> TestPyPI + VSIX pre-release -> verify)

Next steps

1. Wait for the Staging pipeline to pass
2. Review and approve this PR
3. Merge to main
4. Tag and push: git tag v0.1.4 && git push origin v0.1.4
5. Approve the PyPI publish and VSIX stable publish in the Release workflow
6. Sync develop: git checkout develop && git merge main && git push origin develop

Checklist

- Staging pipeline passes (build + TestPyPI + VSIX pre-release + verify)
- CHANGELOG entries reviewed
- PR approved and merged to main
- Tag v0.1.4 pushed
- PyPI publish approved
- VSIX stable publish approved
- develop synced from main

- Add utils/telemetry.py with lazy OTel imports and span context managers
- Instrument runner.py with three-layer schema (CICD + GenAI + agentops.eval)
- Root span per eval run, item spans per row, evaluator child spans
- Activated via AGENTOPS_OTLP_ENDPOINT env var (opt-in, zero overhead)
- Graceful no-op when opentelemetry-sdk is not installed
- 16 unit tests covering disabled, degraded, and enabled states

Refs: #14

…rs (#51)

- Expand evaluator frozensets: add response_completeness, groundedness_pro, retrieval, tool_selection to existing sets
- Add new frozensets: _EVALUATORS_NEEDING_TOOL_DEFS_ONLY (tool_input_accuracy, tool_output_utilization, tool_call_success), _EVALUATORS_NEEDING_OUTPUT_ITEMS (task_adherence)
- Fix NLP evaluator names (bleu_score, rouge_score, etc.) to match _to_builtin_evaluator_name conversion
- Add default initialization_parameters for RougeScoreEvaluator (rouge_type)
- Build item_schema dynamically: include tool_definitions and context_field when evaluators need them
- Refactor _default_foundry_input_mapping to frozenset-based routing
- Improve error handling: log evaluator errors when score is null, improve runner error message with --verbose hint
- Add CI/CD integration models documentation: PR gate, scheduled, post-deploy, multi-env promotion, Azure DevOps pipeline
- Add gating best practices: threshold design, evaluator selection by scenario
- Add supported evaluators reference table (22 evaluators by category)
- Add ~20 unit tests for all new evaluator data_mapping patterns
- All 22 evaluators verified end-to-end with live Foundry cloud evaluation

Closes #51
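A minimal sketch of what frozenset-based routing and a dynamically built item_schema might look like. The two set names and their members are taken from the commit message; the function names, the mapping values, and the schema fields are placeholders invented for illustration and are not the project's actual mappings.

```python
from typing import Dict, FrozenSet, Iterable

# Set names and members as listed in the commit message.
_EVALUATORS_NEEDING_TOOL_DEFS_ONLY: FrozenSet[str] = frozenset(
    {"tool_input_accuracy", "tool_output_utilization", "tool_call_success"}
)
_EVALUATORS_NEEDING_OUTPUT_ITEMS: FrozenSet[str] = frozenset({"task_adherence"})


def default_input_mapping(evaluator: str) -> Dict[str, str]:
    """Hypothetical routing: pick a data mapping by set membership."""
    if evaluator in _EVALUATORS_NEEDING_TOOL_DEFS_ONLY:
        return {"tool_definitions": "<tool definitions column>"}
    if evaluator in _EVALUATORS_NEEDING_OUTPUT_ITEMS:
        return {"query": "<query column>", "response": "<output items>"}
    return {"query": "<query column>", "response": "<response column>"}


def build_item_schema(evaluators: Iterable[str]) -> Dict[str, object]:
    """Hypothetical schema builder: add tool_definitions only when needed."""
    properties: Dict[str, object] = {
        "query": {"type": "string"},
        "response": {"type": "string"},
    }
    if any(name in _EVALUATORS_NEEDING_TOOL_DEFS_ONLY for name in evaluators):
        properties["tool_definitions"] = {"type": "array"}
    return {"type": "object", "properties": properties}
```

Membership tests against immutable sets keep the routing declarative: supporting another evaluator means adding its name to a set rather than adding a new branch.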

TestSpanAttributesWhenEnabled requires opentelemetry to be installed because the code paths import SpanKind/StatusCode when tracing is enabled. Use pytest.importorskip to skip the class in CI where opentelemetry is not a declared dependency.
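One way to skip a whole test class with pytest.importorskip is an autouse fixture, sketched below. The commit does not say where the call was placed, so the fixture and the test method are assumptions; only the class name and the SpanKind import come from the message.

```python
import pytest


class TestSpanAttributesWhenEnabled:
    @pytest.fixture(autouse=True)
    def _require_opentelemetry(self):
        # Runs before every test in the class; each test is reported as
        # skipped when the opentelemetry package cannot be imported.
        pytest.importorskip("opentelemetry")

    def test_span_kind_is_importable(self):
        # Illustrative test body: the enabled path imports SpanKind.
        from opentelemetry.trace import SpanKind

        assert SpanKind.INTERNAL is not None
```

With this arrangement CI reports the class as skipped rather than failed when opentelemetry is absent, and the tests run normally on machines where it is installed.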

Add OTLP tracing support and documentation for evaluation runs

- Fix skill paths: plugins/agentops/skills/ (not .github/plugins/) across README, tutorial-copilot-skills (6 instances)
- Fix CLI contract: add eval compare and config cicd as implemented commands in AGENTS.md, copilot-instructions.md, how-it-works.md
- Fix source tree listings: add cicd.py, comparison.py, telemetry.py, workflows/ across AGENTS.md, how-it-works.md
- Fix test listings: add test_cicd, test_cli_commands, test_comparison, test_telemetry across AGENTS.md, copilot-instructions.md, how-it-works.md
- Fix agent_tools_baseline: TaskCompletionEvaluator + ToolCallAccuracyEvaluator (not SimilarityEvaluator placeholder) in README, AGENTS.md, how-it-works.md
- Fix JSONL path: data/<name>.jsonl (not datasets/) in ci-github-actions.md
- Fix init flag: --dir (not --path) in README
- Fix evaluator guidance: add frozenset names and NLP_DEFAULT_INIT_PARAMS to copilot-instructions.md
- Add context_field to dataset format docs in AGENTS.md
- Add rouge_type default note to evaluator reference doc
- Update planned command message to list all 5 available commands
- Add --format flag to CLI usage examples

- Add services/browse.py with list_bundles, show_bundle, list_runs, show_run
- Replace planned stubs with working implementations in cli/app.py
- bundle list: shows all bundles with evaluators and threshold count
- bundle show: displays full bundle detail (evaluators, thresholds, metadata)
- run list: shows all past runs with status, bundle, dataset, duration
- run show: displays full run detail (metrics, thresholds, items, Foundry URL)
- Add 16 unit tests (service + CLI) in test_browse.py
- All commands are read-only, no side effects, no Azure API calls

Split app.py (487 lines) into focused command modules:

- app.py (114 lines): root app, global callback, init, sub-app registration
- eval_commands.py (108 lines): eval run, eval compare
- report_commands.py (66 lines): report, report show/export stubs
- browse_commands.py (152 lines): bundle list/show, run list/show/view
- config_commands.py (56 lines): config cicd, config validate/show stubs
- planned.py (57 lines): dataset, monitor, trace, model, agent stubs
- _planned.py (12 lines): shared planned command helper

No behavior changes. All 96 tests pass.

- Move dataset stubs to dataset_commands.py (ready for Tier 2 implementation)
- Inline monitor/trace/model/agent stubs in app.py (1-2 commands each)
- Delete planned.py, so there is no more catch-all stub file

feat: extend Foundry cloud evaluator coverage to 22 built-in evaluators (#51)

feat: implement bundle list/show and run list/show commands

Add agentops-workspace-setup, agentops-browse-inspect, and agentops-dataset-management skills covering all remaining CLI commands not handled by existing evaluation-focused skills.

- agentops-workspace-setup: init, config cicd, config validate/show
- agentops-browse-inspect: bundle list/show, run list/show/view
- agentops-dataset-management: dataset creation, YAML/JSONL format, field mapping, planned validate/describe/import commands

…ills

Add '## Before You Start' section to 5 downstream skills enforcing workspace verification before proceeding:

- agentops-run-evals
- agentops-investigate-regression
- agentops-observability-triage
- agentops-browse-inspect
- agentops-dataset-management

Each skill now instructs the agent to check for .agentops/ directory and redirect to agentops-workspace-setup skill if missing. This provides soft enforcement at the skill layer, complementing the hard CLI enforcement (FileNotFoundError) already in place.
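For context, the hard CLI enforcement mentioned above amounts to a directory check that raises FileNotFoundError. The sketch below assumes a hypothetical helper named `require_workspace`; the actual function name, location, and error text in agentops are not given in this PR.

```python
from pathlib import Path


def require_workspace(root: Path) -> Path:
    """Return the .agentops/ directory under root, or raise if it is missing."""
    workspace = root / ".agentops"
    if not workspace.is_dir():
        raise FileNotFoundError(
            f"No .agentops/ directory found in {root}; "
            "initialize the workspace first."
        )
    return workspace
```

The skill-level check makes the same test earlier and redirects to agentops-workspace-setup, so the agent reaches the setup skill instead of a traceback.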

…list planned commands

- ci.yml: add build-vsix validation job (package only, no publish)
- staging.yml: add publish-vsix-prerelease job (vsce publish --pre-release)
- release.yml: add publish-vsix stable job + attach VSIX to GitHub Release
- cut-release.yml: sync package.json version via jq, update PR body/checklist
- _build.yml: update header comments (Python-only, no VSIX logic)
- plugins/agentops: add README.md, CHANGELOG.md, .vscodeignore, package.json scripts

Requires VSCE_PAT secret in staging and release GitHub environments.

Feature/agentops skills

fix: remove duplicate _planned_command definition (ruff F811)

ci: integrate VSIX packaging with pre-release into CI/CD pipeline

* ci(vsix): upload VSIX artifact from CI and staging pipelines
* ci: publish VSIX pre-release to Marketplace on develop pushes

  Add publish-vsix-dev job to ci.yml that publishes the VSIX as a pre-release to the VS Code Marketplace on every push to develop, mirroring the publish-dev job that pushes to TestPyPI.

  - Gated on push to develop only (not PRs)
  - Depends on lint, test, and build-vsix jobs
  - Uses staging environment (VSCE_PAT secret)
  - Packages with --pre-release flag
  - Includes step summary with Marketplace link

* ci(vsix): sync VSIX version from git tags in all pipelines

  Derive package.json version at CI time from the latest git tag using git describe + jq. Mimics setuptools-scm patch-increment behavior:

  - On exact tag (release): use tag version directly (e.g. v0.2.0 -> 0.2.0)
  - Off tag (develop/PR): increment patch (e.g. v0.1.0 + commits -> 0.1.1)

  Applied to all 4 VSIX jobs:

  - ci.yml: build-vsix, publish-vsix-dev
  - staging.yml: publish-vsix-prerelease
  - release.yml: publish-vsix

  Also adds fetch-depth: 0 to checkout steps so git describe has access to the full tag history.
* fix(vsix): update Marketplace link placeholder in README
* docs(vsix): improve README, remove misleading Prerequisites, expand Usage examples
* docs(vsix): remove CLI install note (skills handle setup automatically)
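The version rule above (use the tag as-is on an exact tag, bump the patch otherwise) can be expressed as a small function. This is a sketch of the rule only: the pipelines implement it with git describe and jq, and the function names here are hypothetical.

```python
import re
from typing import Tuple


def parse_tag(tag: str) -> Tuple[int, int, int]:
    """Split a tag such as 'v0.1.0' into (major, minor, patch)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def derive_version(latest_tag: str, commits_since_tag: int) -> str:
    """On an exact tag keep its version; off tag, increment the patch."""
    major, minor, patch = parse_tag(latest_tag)
    if commits_since_tag > 0:
        patch += 1
    return f"{major}.{minor}.{patch}"


print(derive_version("v0.2.0", 0))  # exact tag -> 0.2.0
print(derive_version("v0.1.0", 3))  # off tag   -> 0.1.1
```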

* fix: resolve all mypy type errors across 6 source files
  - foundry_backend.py: assert narrowing for Optional[str], Dict type widening
  - config_loader.py: added BaseModel import and TypeVar bound
  - reporter.py: removed conflicting annotations, renamed shadowed loop vars
  - browse.py: split Path | None annotation into separate assignment
  - comparison.py: fixed _compute_metric_direction return type, renamed loop vars
  - runner.py: added imports, Pydantic model constructors

Replace git describe --tags --abbrev=0 with git tag -l --sort=-v:refname to find the latest tag across ALL branches, not just reachable ones. Root cause: v0.1.3 tag on main was not reachable from develop, so git describe found v0.1.2 and derived version 0.1.3, which already existed on the Marketplace. Also adds continue-on-error on dev/staging VSIX publish steps as a safety net against 'already exists' errors.
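The root cause is easier to see with the tag sets written out. The sketch below imitates the effect of choosing the highest version tag from a list; the tag lists are reconstructed from the description above (only v0.1.2 and v0.1.3 are named there), and `latest_version_tag` is a hypothetical helper, not code from the workflows.

```python
import re
from typing import Iterable, Optional, Tuple


def _version_key(tag: str) -> Tuple[int, ...]:
    """Numeric sort key so that v0.1.10 orders above v0.1.9."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def latest_version_tag(tags: Iterable[str]) -> Optional[str]:
    """Return the highest vMAJOR.MINOR.PATCH tag, ignoring other tag names."""
    candidates = [t for t in tags if re.fullmatch(r"v\d+\.\d+\.\d+", t)]
    return max(candidates, key=_version_key) if candidates else None


# Illustrative data: v0.1.3 exists only on main, so it is not reachable
# from develop.
reachable_from_develop = ["v0.1.1", "v0.1.2"]
all_tags = reachable_from_develop + ["v0.1.3"]

print(latest_version_tag(reachable_from_develop))  # v0.1.2, so next patch is 0.1.3 (already published)
print(latest_version_tag(all_tags))                # v0.1.3, so next patch is 0.1.4
```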

# Conflicts:
#   .github/workflows/release.yml
#   .github/workflows/staging.yml
#   CHANGELOG.md

Dongbumlee and others added 30 commits April 3, 2026 09:42

docs: add OTLP telemetry to AGENTS.md and copilot-instructions

a9f0afe

Merge pull request #56 from Azure/feature/otlp-tracing

5b5aa6e

Add OTLP tracing support and documentation for evaluation runs

merge: resolve CHANGELOG conflict with develop (OTLP tracing)

500966d

refactor: remove planned.py, move stubs to their command files

6017f3a

- Move dataset stubs to dataset_commands.py (ready for Tier 2 implementation)
- Inline monitor/trace/model/agent stubs in app.py (1-2 commands each)
- Delete planned.py, so there is no more catch-all stub file

Merge pull request #57 from Azure/feature/issue-51-extend-evaluators

ce9b628

feat: extend Foundry cloud evaluator coverage to 22 built-in evaluators (#51)

Merge branch 'develop' into feature/browse-commands

4e81967

Merge pull request #59 from Azure/feature/browse-commands

ba9a465

feat: implement bundle list/show and run list/show commands

fix: remove duplicate _planned_command definition (ruff F811)

dd9172b

feat(skills): add coverage for report show/export, model list, agent …

e409dd0

…list planned commands

fix: remove duplicate _planned_command definition (ruff F811)

42d5a9a

style: apply ruff-format to comparison.py and test_cli_commands.py

a9653f2

ci(vsix): add LICENSE to plugin package

f2cd7ce

ci(vsix): set publisher to AgentOpsToolkit and fix package name

903be4b

Merge pull request #68 from Azure/feature/agentops-skills

4f248d3

Feature/agentops skills

Merge pull request #66 from Azure/fix/develop-lint-f811

daaf73e

fix: remove duplicate _planned_command definition (ruff F811)

Merge pull request #67 from Azure/feature/skill-vsix-cicd

03b6b74

ci: integrate VSIX packaging with pre-release into CI/CD pipeline

docs: add CHANGELOG entries for mypy fixes and VSIX pipeline

9314553

chore: prepare release 0.1.4

95e9b5f

Dongbumlee had a problem deploying to staging April 14, 2026 00:35 — with GitHub Actions Failure

Dongbumlee temporarily deployed to staging April 14, 2026 00:35 — with GitHub Actions Inactive

Dongbumlee temporarily deployed to staging April 14, 2026 00:49 — with GitHub Actions Inactive

Merge remote-tracking branch 'origin/main' into release/v0.1.4

f457dbf

# Conflicts:
#   .github/workflows/release.yml
#   .github/workflows/staging.yml
#   CHANGELOG.md

Dongbumlee merged commit 48cbab0 into main Apr 14, 2026
1 check passed

Dongbumlee deleted the release/v0.1.4 branch April 14, 2026 00:59

Dongbumlee temporarily deployed to staging April 14, 2026 00:59 — with GitHub Actions Inactive

Dongbumlee had a problem deploying to staging April 14, 2026 00:59 — with GitHub Actions Failure

Dongbumlee merged 32 commits into main from release/v0.1.4


2 participants
