feat(ai_dynamo): add aiperf workload support#898
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds AIPerf as a new AIDynamo workload: Python model and export, Bash wrapper to run aiperf and collect CSV results, shell CLI dispatch and Slurm emission, test config updates, and report selection/tests updated. ChangesAIPerf Workload Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/workloads/ai_dynamo/aiperf.sh`:
- Line 136: The unquoted ${extra_args} causes word-splitting and glob expansion;
change extra_args from a plain string to an array (e.g., declare -a extra_args
or use read -a / set -- to populate it) and expand it as a quoted array when
invoking the command: replace the unquoted ${extra_args} usage with
"${extra_args[@]}", ensuring extra_args is initialized (empty array if none) and
any code that appends flags uses array assignments (extra_args+=(--flag
"value")) so values containing spaces or asterisks are preserved.
- Around line 79-80: The argument parser currently treats any --* option as
taking a value and unconditionally does `shift 2`, which breaks when a boolean
flag appears last; update the `--*)` case in aiperf.sh to check whether the next
positional argument exists and does NOT begin with '-' before consuming two
args: if $2 is non-empty and does not start with '-' then shift 2, otherwise
shift 1 so boolean/standalone flags are handled safely under set -e.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 0e29aa77-2919-4737-a92c-3779a9270c4f
📒 Files selected for processing (9)
conf/experimental/ai_dynamo/test/aiperf.tomlconf/experimental/ai_dynamo/test_scenario/aiperf_slurm_single_node.tomlconf/experimental/ai_dynamo/test_scenario/vllm_slurm_single_node.tomlsrc/cloudai/workloads/ai_dynamo/__init__.pysrc/cloudai/workloads/ai_dynamo/ai_dynamo.pysrc/cloudai/workloads/ai_dynamo/ai_dynamo.shsrc/cloudai/workloads/ai_dynamo/aiperf.shsrc/cloudai/workloads/ai_dynamo/slurm_command_gen_strategy.pytests/ref_data/ai-dynamo.sbatch
|
@saivishal1999 How did you test it? The |
|
@saivishal1999 what about metrics (repot generation strategy)? Are the output artifacts of aiperf and genai perf exactly the same or AIDynamoReportGenerationStrategy can handle both? |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cloudai/workloads/ai_dynamo/report_generation_strategy.py (1)
47-62:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd backward-compatible fallback when defaulting to AIPerf.
Switching the unprefixed default to
aiperfcan break existing runs that only producegenai_perf_report.csv; those metrics now returnMETRIC_ERROR. Add a fallback togenai_perf_report.csvwhen unprefixed lookup can’t find aiperf output.Suggested patch
def get_metric(self, metric: str) -> MetricValue: logging.info(f"Getting metric: {metric}") benchmark_name = "aiperf" + metric_has_prefix = ":" in metric metric_name = metric metric_type = "avg" - if ":" in metric: + if metric_has_prefix: parts = metric.split(":", maxsplit=2) if len(parts) != 3: logging.warning(f"Invalid metric format: {metric}. Expected 'benchmark:metric_name:metric_type'") return METRIC_ERROR benchmark_name, metric_name, metric_type = parts source_csv = self.test_run.output_path / f"{benchmark_name}_report.csv" logging.info(f"CSV file: {source_csv}") if not source_csv.exists() or source_csv.stat().st_size == 0: + if not metric_has_prefix and benchmark_name == "aiperf": + fallback_csv = self.test_run.output_path / "genai_perf_report.csv" + if fallback_csv.exists() and fallback_csv.stat().st_size > 0: + return self.extract_metric_from_csv(fallback_csv, metric_name, metric_type) logging.info(f"CSV file: {source_csv} does not exist or is empty") return METRIC_ERROR
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/workloads/ai_dynamo/test_report_gen_strategy.py`:
- Around line 123-152: Add a regression test to verify unprefixed metric lookup
falls back to genai_perf_report.csv when aiperf_report.csv is missing: create a
new test (parallel to test_ai_dynamo_get_metric_aiperf_defaults) that
instantiates AIDynamoReportGenerationStrategy(slurm_system, ai_dynamo_tr),
removes or empties "aiperf_report.csv" and ensures a "genai_perf_report.csv"
with the expected CSV content is present in ai_dynamo_tr.output_path, then
assert strategy.get_metric("Inter Token Latency (ms)") and
strategy.get_metric("Output Token Throughput (tokens/sec)") return the same
numeric values as the defaults (and that nonexistent metrics still return
METRIC_ERROR); use the existing ai_dynamo_tr fixture and the
AIDynamoReportGenerationStrategy.get_metric method to locate behavior to test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 1e03d231-ddba-48c2-8bc6-0baedcc45690
📒 Files selected for processing (4)
conf/experimental/ai_dynamo/test/sglang.tomlconf/experimental/ai_dynamo/test_scenario/sglang_slurm.tomlsrc/cloudai/workloads/ai_dynamo/report_generation_strategy.pytests/workloads/ai_dynamo/test_report_gen_strategy.py
| def test_ai_dynamo_get_metric_aiperf_defaults(slurm_system: SlurmSystem, ai_dynamo_tr: TestRun) -> None: | ||
| strategy = AIDynamoReportGenerationStrategy(slurm_system, ai_dynamo_tr) | ||
|
|
||
| # Use exact metric names from CSV (with avg column, which is default) | ||
| assert strategy.get_metric("Time To First Token (ms)") == 111.12 | ||
| assert strategy.get_metric("Time To Second Token (ms)") == 11.13 | ||
| assert strategy.get_metric("Request Latency (ms)") == 1111.14 | ||
| assert strategy.get_metric("Inter Token Latency (ms)") == 12.34 | ||
| # bare metric names default to aiperf_report.csv (avg column) | ||
| assert strategy.get_metric("Inter Token Latency (ms)") == 2.83 | ||
| assert strategy.get_metric("Output Token Throughput (tokens/sec)") == 595.68 | ||
|
|
||
|
|
||
| def test_ai_dynamo_get_metric_aiperf_explicit(slurm_system: SlurmSystem, ai_dynamo_tr: TestRun) -> None: | ||
| strategy = AIDynamoReportGenerationStrategy(slurm_system, ai_dynamo_tr) | ||
|
|
||
| # per-request metrics (first CSV section with avg/p50 columns) | ||
| assert strategy.get_metric("aiperf:Inter Token Latency (ms):avg") == 2.83 | ||
| assert strategy.get_metric("aiperf:Inter Token Latency (ms):p50") == 2.83 | ||
| assert strategy.get_metric("aiperf:Time to First Token (ms):avg") == 49.87 | ||
|
|
||
| # summary metrics (second CSV section — value lands in "avg" column by position) | ||
| assert strategy.get_metric("aiperf:Output Token Throughput (tokens/sec):avg") == 595.68 | ||
| assert strategy.get_metric("aiperf:Total Token Throughput (tokens/sec):avg") == 954.47 | ||
|
|
||
|
|
||
| def test_ai_dynamo_get_metric_invalid(slurm_system: SlurmSystem, ai_dynamo_tr: TestRun) -> None: | ||
| strategy = AIDynamoReportGenerationStrategy(slurm_system, ai_dynamo_tr) | ||
|
|
||
| assert strategy.get_metric("invalid-metric") == METRIC_ERROR | ||
| assert strategy.get_metric("nonexistent-metric") == METRIC_ERROR | ||
|
|
||
| # Empty the CSV file to test error handling | ||
| (ai_dynamo_tr.output_path / "genai_perf_report.csv").write_text("") | ||
| assert strategy.get_metric("invalid-metric") == METRIC_ERROR | ||
| # Empty the aiperf CSV to test error handling for the default path | ||
| (ai_dynamo_tr.output_path / "aiperf_report.csv").write_text("") | ||
| assert strategy.get_metric("Inter Token Latency (ms)") == METRIC_ERROR | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add a regression test for unprefixed fallback to genai_perf_report.csv.
The new suite validates the AIPerf default path, but it doesn’t cover legacy runs where only genai_perf_report.csv exists. Add a test that removes/omits aiperf_report.csv and asserts unprefixed metrics still resolve.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/workloads/ai_dynamo/test_report_gen_strategy.py` around lines 123 -
152, Add a regression test to verify unprefixed metric lookup falls back to
genai_perf_report.csv when aiperf_report.csv is missing: create a new test
(parallel to test_ai_dynamo_get_metric_aiperf_defaults) that instantiates
AIDynamoReportGenerationStrategy(slurm_system, ai_dynamo_tr), removes or empties
"aiperf_report.csv" and ensures a "genai_perf_report.csv" with the expected CSV
content is present in ai_dynamo_tr.output_path, then assert
strategy.get_metric("Inter Token Latency (ms)") and strategy.get_metric("Output
Token Throughput (tokens/sec)") return the same numeric values as the defaults
(and that nonexistent metrics still return METRIC_ERROR); use the existing
ai_dynamo_tr fixture and the AIDynamoReportGenerationStrategy.get_metric method
to locate behavior to test.
Add AIPerf as a new workload type alongside GenAIPerf in the AIDynamo workload. Includes the aiperf.sh wrapper script, AIPerf Pydantic model, arg serialization in the command gen strategy, and experimental test/ scenario configs for single-node disaggregated runs.
…ndling - Move aiperf config into vllm.toml (workloads = "aiperf.sh" + [cmd_args.aiperf]) so the backend and benchmark live in one file; remove standalone aiperf.toml and aiperf_slurm_single_node.toml - Fix aiperf.sh: convert extra_args from string to array to prevent word-splitting/glob expansion; fix unknown --* flag handling to shift 1 for boolean flags instead of always shift 2
…t scenario file Fix stale test_name reference in vllm_slurm.toml (vLLM-Qwen3-0.6B → vLLM) and remove vllm_slurm_single_node.toml since vllm_slurm.toml already covers the single-node case.
- Add aiperf config to sglang.toml (workloads = "aiperf.sh") so both vLLM and sglang backends use aiperf as the default benchmark - Fix sglang_slurm.toml: correct stale test_name and add job_status_check=false - Fix AIDynamoReportGenerationStrategy default benchmark_name from "genai_perf" to "aiperf" to match the new default workload; genai_perf metrics still accessible via "genai_perf:metric_name:metric_type" format - Update unit tests: fix existing genai_perf tests to use explicit prefix, add aiperf-specific tests covering per-request metrics, summary metrics, and default benchmark resolution
…strategy - Replace hardcoded benchmark_name default in get_metric() with dynamic derivation from cmd_args.workloads_list (e.g. "aiperf.sh" -> "aiperf"); explicit benchmark:metric:type format still takes precedence - Add aiperf config to sglang.toml and fix stale test_name in sglang_slurm.toml - Update unit tests: add ai_dynamo_aiperf_tr fixture, separate genai_perf and aiperf metric tests, fix E501 in aiperf CSV fixture data
62785b2 to
d5d556e
Compare
…d sglang - Replace genai-perf references with aiperf as the default benchmark tool - Add "Choosing a Benchmark Tool" section explaining the workloads field and how to switch to genai_perf.sh with a TOML snippet - Update result CSV example to include TTFT, TTST, Request Latency, and throughput metrics from an actual GB200 run - Add "Supported Backends" section listing vLLM and sglang
Summary
Add AIPerf as a new workload type alongside GenAIPerf in the AIDynamo workload. Includes the aiperf.sh wrapper script, AIPerf Pydantic model, arg serialization in the command gen strategy, and experimental test/ scenario configs for single-node disaggregated runs.
Test Plan
Unit tests
--- End-to-end on Lyris GB200 Cluster: lyris_gb200 (NVIDIA GB200), Scheduler: Slurm, Partition: gb200 Model: Qwen/Qwen3-0.6B, Workload: aiperf.sh (concurrency=2, request-count=50, ISL=300, OSL=500) vLLM — single-node and multi-node cd ~/cloudai && uv run cloudai run \ --system-config conf/experimental/ai_dynamo/system/lyris_gb200.toml \ --tests-dir conf/experimental/ai_dynamo/test \ --test-scenario conf/experimental/ai_dynamo/test_scenario/vllm_slurm.toml [INFO] System Name: lyris_gb200 [INFO] Scheduler: slurm [INFO] Test Scenario Name: dynamo-vllm-slurm Section Name: test.disagg.single-node Test Name: vLLM No dependencies Section Name: test.disagg.multinode Test Name: vLLM No dependencies [INFO] Scenario results will be stored at: results/dynamo-vllm-slurm_2026-05-21_16-02-05 [INFO] Starting test: test.disagg.single-node (results at: results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.single-node/0) [INFO] Starting test: test.disagg.multinode (results at: results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.multinode/0) [INFO] Submitted slurm job: 1851252 [INFO] Submitted slurm job: 1851253 [INFO] Result file (/home/spothula/cloudai/results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.single-node/0/aiperf_report.csv) exists for aiperf.sh [INFO] Job completed: test.disagg.single-node (iteration 1 of 1) [INFO] Result file (/home/spothula/cloudai/results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.multinode/0/aiperf_report.csv) exists for aiperf.sh [INFO] Job completed: test.disagg.multinode (iteration 1 of 1) ╔═════════════════════════╤════════╤═══════════════════════════════════════════╗ ║ Case │ Status │ Details ║ ╟─────────────────────────┼────────┼───────────────────────────────────────────╢ ║ test.disagg.single-node │ PASSED │ results/dynamo-vllm-slurm_2026-05-21_16- ║ ║ │ │ 02-05/test.disagg.single-node/0 ║ ╟─────────────────────────┼────────┼───────────────────────────────────────────╢ ║ test.disagg.multinode │ PASSED │ results/dynamo-vllm-slurm_2026-05-21_16- ║ ║ │ │ 02-05/test.disagg.multinode/0 ║ ╚═════════════════════════╧════════╧═══════════════════════════════════════════╝ [INFO] All jobs are complete. results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.single-node/0/aiperf_report.csv (key rows): Inter Token Latency (ms),2.81,...,p50=2.83,... Output Token Throughput (tokens/sec),598.78 Total Token Throughput (tokens/sec),962.32 results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.multinode/0/aiperf_report.csv (key rows): Inter Token Latency (ms),2.62,...,p50=2.63,... Output Token Throughput (tokens/sec),573.18 Total Token Throughput (tokens/sec),920.83 --- sglang — single-node cd ~/cloudai && uv run cloudai run \ --system-config conf/experimental/ai_dynamo/system/lyris_gb200.toml \ --tests-dir conf/experimental/ai_dynamo/test \ --test-scenario conf/experimental/ai_dynamo/test_scenario/sglang_slurm.toml [INFO] System Name: lyris_gb200 [INFO] Scheduler: slurm [INFO] Test Scenario Name: dynamo_sglang Section Name: sglang-Qwen3-0.6B Test Name: sglang No dependencies [INFO] Scenario results will be stored at: results/dynamo_sglang_2026-05-21_16-04-54 [INFO] Starting test: sglang-Qwen3-0.6B (results at: results/dynamo_sglang_2026-05-21_16-04-54/sglang-Qwen3-0.6B/0) [INFO] Submitted slurm job: 1851259 [INFO] Result file (/home/spothula/cloudai/results/dynamo_sglang_2026-05-21_16-04-54/sglang-Qwen3-0.6B/0/aiperf_report.csv) exists for aiperf.sh [INFO] Job completed: sglang-Qwen3-0.6B (iteration 1 of 1) ╔═══════════════════╤════════╤═════════════════════════════════════════════════╗ ║ Case │ Status │ Details ║ ╟───────────────────┼────────┼─────────────────────────────────────────────────╢ ║ sglang-Qwen3-0.6B │ PASSED │ results/dynamo_sglang_2026-05-21_16-04-54/ ║ ║ │ │ sglang-Qwen3-0.6B/0 ║ ╚═══════════════════╧════════╧═════════════════════════════════════════════════╝ [INFO] All jobs are complete. results/dynamo_sglang_2026-05-21_16-04-54/sglang-Qwen3-0.6B/0/aiperf_report.csv (key rows): Inter Token Latency (ms),1.75,...,p50=1.74,... Output Token Throughput (tokens/sec),703.01 Total Token Throughput (tokens/sec),1129.32