feat: remove evaluation infrastructure (moved to openadapt-evals) #25

Merged
abrichr merged 2 commits into main from feat/remove-eval-infra
Feb 13, 2026

@abrichr abrichr commented Feb 13, 2026

Summary

Deleted files

  • benchmarks/cli.py (8,503 lines) - VM/pool management CLI
  • benchmarks/azure_vm.py - AzureVMManager
  • benchmarks/pool.py - PoolManager
  • benchmarks/vm_monitor.py, azure_ops_tracker.py, resource_tracker.py
  • benchmarks/azure.py, viewer.py, pool_viewer.py, trace_export.py
  • benchmarks/waa_deploy/ - Docker agent deployment
  • 4 test files moved to openadapt-evals

Kept in openadapt-ml

  • benchmarks/agent.py - PolicyAgent, APIBenchmarkAgent, UnifiedBaselineAgent (ML model wrappers)

Migration Guide

| Old import (openadapt-ml) | New import (openadapt-evals) |
| --- | --- |
| openadapt_ml.benchmarks.cli | openadapt_evals.benchmarks.vm_cli (or oa-vm CLI) |
| openadapt_ml.benchmarks.azure_vm.AzureVMManager | openadapt_evals.infrastructure.azure_vm.AzureVMManager |
| openadapt_ml.benchmarks.pool.PoolManager | openadapt_evals.infrastructure.pool.PoolManager |
| openadapt_ml.benchmarks.vm_monitor.VMMonitor | openadapt_evals.infrastructure.vm_monitor.VMMonitor |
| openadapt_ml.benchmarks.azure_ops_tracker | openadapt_evals.infrastructure.azure_ops_tracker |
| openadapt_ml.benchmarks.resource_tracker | openadapt_evals.infrastructure.resource_tracker |
| openadapt_ml.benchmarks.pool_viewer | openadapt_evals.benchmarks.pool_viewer |
| openadapt_ml.benchmarks.trace_export | openadapt_evals.benchmarks.trace_export |
| openadapt_ml.benchmarks.waa_deploy | openadapt_evals.waa_deploy |
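
For downstream code that still uses the old paths, the mapping above could be applied mechanically. The following is a minimal sketch, not part of this PR: `IMPORT_MAP` is copied from the table, while `rewrite_imports` is a hypothetical helper name. It only rewrites dotted module paths in source text and does not verify that the new modules expose the same names.

```python
import re

# Old module path -> new module path, copied from the migration table.
IMPORT_MAP = {
    "openadapt_ml.benchmarks.cli": "openadapt_evals.benchmarks.vm_cli",
    "openadapt_ml.benchmarks.azure_vm": "openadapt_evals.infrastructure.azure_vm",
    "openadapt_ml.benchmarks.pool": "openadapt_evals.infrastructure.pool",
    "openadapt_ml.benchmarks.vm_monitor": "openadapt_evals.infrastructure.vm_monitor",
    "openadapt_ml.benchmarks.azure_ops_tracker": "openadapt_evals.infrastructure.azure_ops_tracker",
    "openadapt_ml.benchmarks.resource_tracker": "openadapt_evals.infrastructure.resource_tracker",
    "openadapt_ml.benchmarks.pool_viewer": "openadapt_evals.benchmarks.pool_viewer",
    "openadapt_ml.benchmarks.trace_export": "openadapt_evals.benchmarks.trace_export",
    "openadapt_ml.benchmarks.waa_deploy": "openadapt_evals.waa_deploy",
}

# Match a whole module path only: not preceded by a word character or dot,
# and not followed by a word character (so "pool" never matches "pool_viewer").
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in sorted(IMPORT_MAP, key=len, reverse=True))
    + r")(?!\w)"
)


def rewrite_imports(source: str) -> str:
    """Return source text with old module paths replaced by their new paths."""
    return _PATTERN.sub(lambda match: IMPORT_MAP[match.group(1)], source)


if __name__ == "__main__":
    print(rewrite_imports("from openadapt_ml.benchmarks.pool import PoolManager"))
```

Imports of the ML agents from `openadapt_ml.benchmarks` are left untouched, since that package path is not in the map.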

CLI migration

| Old command | New command |
| --- | --- |
| python -m openadapt_ml.benchmarks.cli pool-create | oa-vm pool-create |
| python -m openadapt_ml.benchmarks.cli pool-run | oa-vm pool-run |
| python -m openadapt_ml.benchmarks.cli pool-status | oa-vm pool-status |
| python -m openadapt_ml.benchmarks.cli pool-cleanup | oa-vm pool-cleanup |
| python -m openadapt_ml.benchmarks.cli create | oa-vm create |
| python -m openadapt_ml.benchmarks.cli status | oa-vm status |
| python -m openadapt_ml.benchmarks.cli vm monitor | oa-vm vm monitor |
| All other CLI commands | oa-vm <command> |
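
Every row in the CLI table follows one rule: the `python -m openadapt_ml.benchmarks.cli` prefix becomes `oa-vm` and the remaining arguments carry over unchanged. A small sketch of that rule, useful for updating shell scripts or docs, might look like the following. `translate_command` is a hypothetical helper, and it assumes the command starts with exactly `python -m openadapt_ml.benchmarks.cli` (no `python3`, `uv run`, or other wrapper).

```python
import shlex

# The old invocation prefix, as listed in the CLI migration table.
OLD_PREFIX = ["python", "-m", "openadapt_ml.benchmarks.cli"]


def translate_command(command: str) -> str:
    """Rewrite an old CLI invocation to its oa-vm form; leave other commands as-is."""
    parts = shlex.split(command)
    if parts[: len(OLD_PREFIX)] != OLD_PREFIX:
        return command
    # Subcommand and any trailing arguments are passed through unchanged.
    return shlex.join(["oa-vm", *parts[len(OLD_PREFIX):]])


if __name__ == "__main__":
    print(translate_command("python -m openadapt_ml.benchmarks.cli pool-status"))
```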

What stays the same

# ML agents - unchanged
from openadapt_ml.benchmarks import PolicyAgent, APIBenchmarkAgent, UnifiedBaselineAgent

Test plan

  • uv run pytest tests/ -v — 253 passed, 6 skipped
  • from openadapt_ml.benchmarks import PolicyAgent, APIBenchmarkAgent, UnifiedBaselineAgent works
  • ruff check passes
  • Verify no remaining references to deleted modules in non-deleted code
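
The last item in the test plan could be checked with a short script along these lines. This is a sketch under assumptions rather than the check used for this PR: the module names come from the "Deleted files" list, the helper names `find_stale_lines` and `scan_tree` are hypothetical, and only `.py` files are scanned, so references in HTML templates or docs would need a wider glob.

```python
import re
from pathlib import Path

# Module names taken from the "Deleted files" list in this PR.
DELETED_MODULES = (
    "cli", "azure_vm", "pool", "vm_monitor", "azure_ops_tracker",
    "resource_tracker", "azure", "viewer", "pool_viewer",
    "trace_export", "waa_deploy",
)

# \b on both sides keeps "pool" from matching inside "pool_viewer", and so on.
STALE = re.compile(
    r"\bopenadapt_ml\.benchmarks\.(" + "|".join(DELETED_MODULES) + r")\b"
)


def find_stale_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line naming a deleted module."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if STALE.search(line)
    ]


def scan_tree(root: str) -> list[tuple[str, int, str]]:
    """Apply find_stale_lines to every .py file under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits.extend((str(path), lineno, line) for lineno, line in find_stale_lines(text))
    return hits


if __name__ == "__main__":
    for path, lineno, line in scan_tree("."):
        print(f"{path}:{lineno}: {line}")
```

An empty result from `scan_tree` would be consistent with the checklist item, though it does not catch dynamically constructed import paths.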

🤖 Generated with Claude Code

abrichr and others added 2 commits February 13, 2026 14:42
All evaluation infrastructure (~13,000 lines) has been migrated to
openadapt-evals (PR #29). This PR removes the now-redundant code from
openadapt-ml, making it a pure ML package.

Deleted files:
- benchmarks/cli.py (8,503 lines - VM/pool CLI)
- benchmarks/azure_vm.py (AzureVMManager)
- benchmarks/pool.py (PoolManager)
- benchmarks/vm_monitor.py, azure_ops_tracker.py, resource_tracker.py
- benchmarks/azure.py, viewer.py, pool_viewer.py, trace_export.py
- benchmarks/waa_deploy/ (Docker agent deployment)
- tests/test_quota_auto_detection.py, test_demo_persistence.py
- tests/benchmarks/test_api_agent.py, test_waa.py

Updated:
- benchmarks/__init__.py: Only exports ML agents (PolicyAgent, etc.)
- pyproject.toml: Removed azure-ai-ml, azureml-core, azure-mgmt-*
- CLAUDE.md: Removed CLI/VM/pool docs, added migration guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Update all remaining references to deleted benchmark modules
across source code, scripts, and tests:

- cloud/local.py: azure_ops_tracker, session_tracker, CLI subprocess calls
- scripts/: p0/p1 validation scripts, screenshot generators, quota checker
- training/benchmark_viewer.py: HTML template CLI references
- experiments/waa_demo/runner.py: docstring and print references
- deprecated/waa_deploy/__init__.py: import path

All now point to openadapt_evals equivalents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@abrichr abrichr merged commit 2d57d02 into main Feb 13, 2026
0 of 4 checks passed