Skip to content

feat(ai_dynamo): add aiperf workload support#898

Open
saivishal1999 wants to merge 6 commits into
NVIDIA:mainfrom
saivishal1999:spothula/aiperf-workload
Open

feat(ai_dynamo): add aiperf workload support#898
saivishal1999 wants to merge 6 commits into
NVIDIA:mainfrom
saivishal1999:spothula/aiperf-workload

Conversation

@saivishal1999
Copy link
Copy Markdown
Contributor

@saivishal1999 saivishal1999 commented May 21, 2026

Summary

Add AIPerf as a new workload type alongside GenAIPerf in the AIDynamo workload. Includes the aiperf.sh wrapper script, AIPerf Pydantic model, arg serialization in the command gen strategy, and experimental test/ scenario configs for single-node disaggregated runs.

Test Plan

Unit tests

---                                                          
End-to-end on Lyris GB200                                                                                                                          
                                                                                                                                                   
Cluster: lyris_gb200 (NVIDIA GB200), Scheduler: Slurm, Partition: gb200
Model: Qwen/Qwen3-0.6B, Workload: aiperf.sh (concurrency=2, request-count=50, ISL=300, OSL=500)                                                    
                                                                                                                                                   
vLLM — single-node and multi-node                                                                                                                  
                                                                                                                                                   
cd ~/cloudai && uv run cloudai run \                         
  --system-config conf/experimental/ai_dynamo/system/lyris_gb200.toml \
  --tests-dir conf/experimental/ai_dynamo/test \                                                                                                   
  --test-scenario conf/experimental/ai_dynamo/test_scenario/vllm_slurm.toml
                                                                                                                                                   
[INFO] System Name: lyris_gb200                              
[INFO] Scheduler: slurm                                                                                                                            
[INFO] Test Scenario Name: dynamo-vllm-slurm
                                                                                                                                                   
Section Name: test.disagg.single-node                        
  Test Name: vLLM
  No dependencies                                                                                                                                  
Section Name: test.disagg.multinode
  Test Name: vLLM                                                                                                                                  
  No dependencies                                            

[INFO] Scenario results will be stored at: results/dynamo-vllm-slurm_2026-05-21_16-02-05                                                           
[INFO] Starting test: test.disagg.single-node (results at: results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.single-node/0)
[INFO] Starting test: test.disagg.multinode   (results at: results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.multinode/0)                  
[INFO] Submitted slurm job: 1851252                                                                                                                
[INFO] Submitted slurm job: 1851253                                                                                                                
[INFO] Result file (/home/spothula/cloudai/results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.single-node/0/aiperf_report.csv) exists for   
aiperf.sh                                                                                                                                          
[INFO] Job completed: test.disagg.single-node (iteration 1 of 1)
[INFO] Result file (/home/spothula/cloudai/results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.multinode/0/aiperf_report.csv) exists for     
aiperf.sh                                                                                                                                          
[INFO] Job completed: test.disagg.multinode (iteration 1 of 1)
                                                                                                                                                   
╔═════════════════════════╤════════╤═══════════════════════════════════════════╗                                                                   
║ Case                    │ Status │ Details                                   ║
╟─────────────────────────┼────────┼───────────────────────────────────────────╢                                                                   
║ test.disagg.single-node │ PASSED │ results/dynamo-vllm-slurm_2026-05-21_16-  ║                                                                   
║                         │        │ 02-05/test.disagg.single-node/0           ║
╟─────────────────────────┼────────┼───────────────────────────────────────────╢                                                                   
║ test.disagg.multinode   │ PASSED │ results/dynamo-vllm-slurm_2026-05-21_16-  ║
║                         │        │ 02-05/test.disagg.multinode/0             ║                                                                   
╚═════════════════════════╧════════╧═══════════════════════════════════════════╝
                                                                                                                                                   
[INFO] All jobs are complete.                                                                                                                      
                                                                                                                                                   
results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.single-node/0/aiperf_report.csv (key rows):                                              
Inter Token Latency (ms),2.81,...,p50=2.83,...                                                                                                     
Output Token Throughput (tokens/sec),598.78   
Total Token Throughput (tokens/sec),962.32                                                                                                         
                                                             
results/dynamo-vllm-slurm_2026-05-21_16-02-05/test.disagg.multinode/0/aiperf_report.csv (key rows):
Inter Token Latency (ms),2.62,...,p50=2.63,...                                                                                                     
Output Token Throughput (tokens/sec),573.18
Total Token Throughput (tokens/sec),920.83                                                                                                         
                                                             
---                                                                                                                                                
sglang — single-node                                         
                                                                                                                                                   
cd ~/cloudai && uv run cloudai run \
  --system-config conf/experimental/ai_dynamo/system/lyris_gb200.toml \                                                                            
  --tests-dir conf/experimental/ai_dynamo/test \             
  --test-scenario conf/experimental/ai_dynamo/test_scenario/sglang_slurm.toml                                                                      
 
[INFO] System Name: lyris_gb200                                                                                                                    
[INFO] Scheduler: slurm                                      
[INFO] Test Scenario Name: dynamo_sglang                                                                                                           
 
Section Name: sglang-Qwen3-0.6B                                                                                                                    
  Test Name: sglang                                          
  No dependencies

[INFO] Scenario results will be stored at: results/dynamo_sglang_2026-05-21_16-04-54                                                               
[INFO] Starting test: sglang-Qwen3-0.6B (results at: results/dynamo_sglang_2026-05-21_16-04-54/sglang-Qwen3-0.6B/0)
[INFO] Submitted slurm job: 1851259                                                                                                                
[INFO] Result file (/home/spothula/cloudai/results/dynamo_sglang_2026-05-21_16-04-54/sglang-Qwen3-0.6B/0/aiperf_report.csv) exists for aiperf.sh
[INFO] Job completed: sglang-Qwen3-0.6B (iteration 1 of 1)                                                                                         
                                                                                                                                                   
╔═══════════════════╤════════╤═════════════════════════════════════════════════╗                                                                   
║ Case              │ Status │ Details                                         ║                                                                   
╟───────────────────┼────────┼─────────────────────────────────────────────────╢                                                                   
║ sglang-Qwen3-0.6B │ PASSED │ results/dynamo_sglang_2026-05-21_16-04-54/      ║
║                   │        │ sglang-Qwen3-0.6B/0                             ║                                                                   
╚═══════════════════╧════════╧═════════════════════════════════════════════════╝
                                                                                                                                                   
[INFO] All jobs are complete.                                
                                                                                                                                                   
results/dynamo_sglang_2026-05-21_16-04-54/sglang-Qwen3-0.6B/0/aiperf_report.csv (key rows):
Inter Token Latency (ms),1.75,...,p50=1.74,...                                                                                                     
Output Token Throughput (tokens/sec),703.01   
Total Token Throughput (tokens/sec),1129.32                                                                                                        

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds AIPerf as a new AIDynamo workload: Python model and export, Bash wrapper to run aiperf and collect CSV results, shell CLI dispatch and Slurm emission, test config updates, and report selection/tests updated.

Changes

AIPerf Workload Integration

Layer / File(s) Summary
AIPerf workload model and config integration
src/cloudai/workloads/ai_dynamo/ai_dynamo.py, src/cloudai/workloads/ai_dynamo/__init__.py
Defines AIPerf(Workload) with default script/command/report, adds aiperf to AIDynamoCmdArgs and validator, includes aiperf installables in the test definition and workload map, and exports AIPerf.
AIPerf wrapper script
src/cloudai/workloads/ai_dynamo/aiperf.sh
Implements Bash wrapper parsing wrapper flags and post--- profile args, executes aiperf with constructed arguments plus --extra-args, and locates/copies the produced CSV to the result directory.
AI Dynamo shell orchestration and dispatch
src/cloudai/workloads/ai_dynamo/ai_dynamo.sh, src/cloudai/workloads/ai_dynamo/slurm_command_gen_strategy.py, tests/ref_data/ai-dynamo.sbatch
Adds aiperf_args and aiperf_config arrays; extends CLI parsing/dump to handle --aiperf-*/--aiperf-args-*; adds _is_aiperf_workload() selector and launches AIPerf; emits nested --aiperf- args and updates reference sbatch invocation.
Test scenario and benchmark configurations
conf/experimental/ai_dynamo/test/*.toml, conf/experimental/ai_dynamo/test_scenario/*.toml
Switches workloads to aiperf.sh in test configs, adds [cmd_args.aiperf] tuning parameters, and updates test scenario names and job flags.
Report selection and tests
src/cloudai/workloads/ai_dynamo/report_generation_strategy.py, tests/workloads/ai_dynamo/test_report_gen_strategy.py
Changes default benchmark-name derivation for report lookup to use the first configured workload (fallback aiperf), and updates tests to supply aiperf_report.csv fixtures and assert metric extraction for default and explicit aiperf/genai_perf prefixes and error cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • srivatsankrishnan
  • podkidyshev
  • jeffnvidia

Poem

🐇 I hopped through args and CSV trails,
I stitched aiperf into Dynamo's rails,
Bash parsed flags, then ran the run,
The first CSV copied—benchmark fun!
🥕

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(ai_dynamo): add aiperf workload support' directly and clearly summarizes the main change: adding AIPerf as a new workload type to the AI Dynamo workload module.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description clearly explains the feature being added: AIPerf as a new workload type alongside GenAIPerf in the AIDynamo workload, with specific details about aiperf.sh wrapper script, Pydantic model, command generation, and test configs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/workloads/ai_dynamo/aiperf.sh`:
- Line 136: The unquoted ${extra_args} causes word-splitting and glob expansion;
change extra_args from a plain string to an array (e.g., declare -a extra_args
or use read -a / set -- to populate it) and expand it as a quoted array when
invoking the command: replace the unquoted ${extra_args} usage with
"${extra_args[@]}", ensuring extra_args is initialized (empty array if none) and
any code that appends flags uses array assignments (extra_args+=(--flag
"value")) so values containing spaces or asterisks are preserved.
- Around line 79-80: The argument parser currently treats any --* option as
taking a value and unconditionally does `shift 2`, which breaks when a boolean
flag appears last; update the `--*)` case in aiperf.sh to check whether the next
positional argument exists and does NOT begin with '-' before consuming two
args: if $2 is non-empty and does not start with '-' then shift 2, otherwise
shift 1 so boolean/standalone flags are handled safely under set -e.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 0e29aa77-2919-4737-a92c-3779a9270c4f

📥 Commits

Reviewing files that changed from the base of the PR and between 1099207 and 77b639a.

📒 Files selected for processing (9)
  • conf/experimental/ai_dynamo/test/aiperf.toml
  • conf/experimental/ai_dynamo/test_scenario/aiperf_slurm_single_node.toml
  • conf/experimental/ai_dynamo/test_scenario/vllm_slurm_single_node.toml
  • src/cloudai/workloads/ai_dynamo/__init__.py
  • src/cloudai/workloads/ai_dynamo/ai_dynamo.py
  • src/cloudai/workloads/ai_dynamo/ai_dynamo.sh
  • src/cloudai/workloads/ai_dynamo/aiperf.sh
  • src/cloudai/workloads/ai_dynamo/slurm_command_gen_strategy.py
  • tests/ref_data/ai-dynamo.sbatch

Comment thread src/cloudai/workloads/ai_dynamo/aiperf.sh Outdated
Comment thread src/cloudai/workloads/ai_dynamo/aiperf.sh Outdated
@podkidyshev
Copy link
Copy Markdown
Contributor

@saivishal1999 How did you test it? The Test Plan in PR description is supposed to include information about how you verified the changes in the PR work

@podkidyshev
Copy link
Copy Markdown
Contributor

@saivishal1999 what about metrics (repot generation strategy)? Are the output artifacts of aiperf and genai perf exactly the same or AIDynamoReportGenerationStrategy can handle both?

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/workloads/ai_dynamo/report_generation_strategy.py (1)

47-62: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add backward-compatible fallback when defaulting to AIPerf.

Switching the unprefixed default to aiperf can break existing runs that only produce genai_perf_report.csv; those metrics now return METRIC_ERROR. Add a fallback to genai_perf_report.csv when unprefixed lookup can’t find aiperf output.

Suggested patch
 def get_metric(self, metric: str) -> MetricValue:
     logging.info(f"Getting metric: {metric}")
     benchmark_name = "aiperf"
+    metric_has_prefix = ":" in metric
     metric_name = metric
     metric_type = "avg"

-    if ":" in metric:
+    if metric_has_prefix:
         parts = metric.split(":", maxsplit=2)
         if len(parts) != 3:
             logging.warning(f"Invalid metric format: {metric}. Expected 'benchmark:metric_name:metric_type'")
             return METRIC_ERROR
         benchmark_name, metric_name, metric_type = parts

     source_csv = self.test_run.output_path / f"{benchmark_name}_report.csv"
     logging.info(f"CSV file: {source_csv}")
     if not source_csv.exists() or source_csv.stat().st_size == 0:
+        if not metric_has_prefix and benchmark_name == "aiperf":
+            fallback_csv = self.test_run.output_path / "genai_perf_report.csv"
+            if fallback_csv.exists() and fallback_csv.stat().st_size > 0:
+                return self.extract_metric_from_csv(fallback_csv, metric_name, metric_type)
         logging.info(f"CSV file: {source_csv} does not exist or is empty")
         return METRIC_ERROR
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/workloads/ai_dynamo/test_report_gen_strategy.py`:
- Around line 123-152: Add a regression test to verify unprefixed metric lookup
falls back to genai_perf_report.csv when aiperf_report.csv is missing: create a
new test (parallel to test_ai_dynamo_get_metric_aiperf_defaults) that
instantiates AIDynamoReportGenerationStrategy(slurm_system, ai_dynamo_tr),
removes or empties "aiperf_report.csv" and ensures a "genai_perf_report.csv"
with the expected CSV content is present in ai_dynamo_tr.output_path, then
assert strategy.get_metric("Inter Token Latency (ms)") and
strategy.get_metric("Output Token Throughput (tokens/sec)") return the same
numeric values as the defaults (and that nonexistent metrics still return
METRIC_ERROR); use the existing ai_dynamo_tr fixture and the
AIDynamoReportGenerationStrategy.get_metric method to locate behavior to test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 1e03d231-ddba-48c2-8bc6-0baedcc45690

📥 Commits

Reviewing files that changed from the base of the PR and between 9da57d5 and 9cfee53.

📒 Files selected for processing (4)
  • conf/experimental/ai_dynamo/test/sglang.toml
  • conf/experimental/ai_dynamo/test_scenario/sglang_slurm.toml
  • src/cloudai/workloads/ai_dynamo/report_generation_strategy.py
  • tests/workloads/ai_dynamo/test_report_gen_strategy.py

Comment on lines 123 to 152
def test_ai_dynamo_get_metric_aiperf_defaults(slurm_system: SlurmSystem, ai_dynamo_tr: TestRun) -> None:
strategy = AIDynamoReportGenerationStrategy(slurm_system, ai_dynamo_tr)

# Use exact metric names from CSV (with avg column, which is default)
assert strategy.get_metric("Time To First Token (ms)") == 111.12
assert strategy.get_metric("Time To Second Token (ms)") == 11.13
assert strategy.get_metric("Request Latency (ms)") == 1111.14
assert strategy.get_metric("Inter Token Latency (ms)") == 12.34
# bare metric names default to aiperf_report.csv (avg column)
assert strategy.get_metric("Inter Token Latency (ms)") == 2.83
assert strategy.get_metric("Output Token Throughput (tokens/sec)") == 595.68


def test_ai_dynamo_get_metric_aiperf_explicit(slurm_system: SlurmSystem, ai_dynamo_tr: TestRun) -> None:
strategy = AIDynamoReportGenerationStrategy(slurm_system, ai_dynamo_tr)

# per-request metrics (first CSV section with avg/p50 columns)
assert strategy.get_metric("aiperf:Inter Token Latency (ms):avg") == 2.83
assert strategy.get_metric("aiperf:Inter Token Latency (ms):p50") == 2.83
assert strategy.get_metric("aiperf:Time to First Token (ms):avg") == 49.87

# summary metrics (second CSV section — value lands in "avg" column by position)
assert strategy.get_metric("aiperf:Output Token Throughput (tokens/sec):avg") == 595.68
assert strategy.get_metric("aiperf:Total Token Throughput (tokens/sec):avg") == 954.47


def test_ai_dynamo_get_metric_invalid(slurm_system: SlurmSystem, ai_dynamo_tr: TestRun) -> None:
strategy = AIDynamoReportGenerationStrategy(slurm_system, ai_dynamo_tr)

assert strategy.get_metric("invalid-metric") == METRIC_ERROR
assert strategy.get_metric("nonexistent-metric") == METRIC_ERROR

# Empty the CSV file to test error handling
(ai_dynamo_tr.output_path / "genai_perf_report.csv").write_text("")
assert strategy.get_metric("invalid-metric") == METRIC_ERROR
# Empty the aiperf CSV to test error handling for the default path
(ai_dynamo_tr.output_path / "aiperf_report.csv").write_text("")
assert strategy.get_metric("Inter Token Latency (ms)") == METRIC_ERROR

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add a regression test for unprefixed fallback to genai_perf_report.csv.

The new suite validates the AIPerf default path, but it doesn’t cover legacy runs where only genai_perf_report.csv exists. Add a test that removes/omits aiperf_report.csv and asserts unprefixed metrics still resolve.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/workloads/ai_dynamo/test_report_gen_strategy.py` around lines 123 -
152, Add a regression test to verify unprefixed metric lookup falls back to
genai_perf_report.csv when aiperf_report.csv is missing: create a new test
(parallel to test_ai_dynamo_get_metric_aiperf_defaults) that instantiates
AIDynamoReportGenerationStrategy(slurm_system, ai_dynamo_tr), removes or empties
"aiperf_report.csv" and ensures a "genai_perf_report.csv" with the expected CSV
content is present in ai_dynamo_tr.output_path, then assert
strategy.get_metric("Inter Token Latency (ms)") and strategy.get_metric("Output
Token Throughput (tokens/sec)") return the same numeric values as the defaults
(and that nonexistent metrics still return METRIC_ERROR); use the existing
ai_dynamo_tr fixture and the AIDynamoReportGenerationStrategy.get_metric method
to locate behavior to test.

Add AIPerf as a new workload type alongside GenAIPerf in the AIDynamo
workload. Includes the aiperf.sh wrapper script, AIPerf Pydantic model,
arg serialization in the command gen strategy, and experimental test/
scenario configs for single-node disaggregated runs.
…ndling

- Move aiperf config into vllm.toml (workloads = "aiperf.sh" + [cmd_args.aiperf])
  so the backend and benchmark live in one file; remove standalone aiperf.toml
  and aiperf_slurm_single_node.toml
- Fix aiperf.sh: convert extra_args from string to array to prevent
  word-splitting/glob expansion; fix unknown --* flag handling to shift 1
  for boolean flags instead of always shift 2
…t scenario file

Fix stale test_name reference in vllm_slurm.toml (vLLM-Qwen3-0.6B → vLLM)
and remove vllm_slurm_single_node.toml since vllm_slurm.toml already covers
the single-node case.
- Add aiperf config to sglang.toml (workloads = "aiperf.sh") so both
  vLLM and sglang backends use aiperf as the default benchmark
- Fix sglang_slurm.toml: correct stale test_name and add job_status_check=false
- Fix AIDynamoReportGenerationStrategy default benchmark_name from
  "genai_perf" to "aiperf" to match the new default workload; genai_perf
  metrics still accessible via "genai_perf:metric_name:metric_type" format
- Update unit tests: fix existing genai_perf tests to use explicit prefix,
  add aiperf-specific tests covering per-request metrics, summary metrics,
  and default benchmark resolution
…strategy

- Replace hardcoded benchmark_name default in get_metric() with dynamic
  derivation from cmd_args.workloads_list (e.g. "aiperf.sh" -> "aiperf");
  explicit benchmark:metric:type format still takes precedence
- Add aiperf config to sglang.toml and fix stale test_name in sglang_slurm.toml
- Update unit tests: add ai_dynamo_aiperf_tr fixture, separate genai_perf
  and aiperf metric tests, fix E501 in aiperf CSV fixture data
@saivishal1999 saivishal1999 force-pushed the spothula/aiperf-workload branch from 62785b2 to d5d556e Compare May 21, 2026 23:40
…d sglang

- Replace genai-perf references with aiperf as the default benchmark tool
- Add "Choosing a Benchmark Tool" section explaining the workloads field
  and how to switch to genai_perf.sh with a TOML snippet
- Update result CSV example to include TTFT, TTST, Request Latency, and
  throughput metrics from an actual GB200 run
- Add "Supported Backends" section listing vLLM and sglang
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants