AIDynamo: enable LMCache by podkidyshev · Pull Request #906 · NVIDIA/cloudai

podkidyshev · 2026-05-28T21:30:51Z

Summary

Improve LMCache support by allowing any YAML config to by passed
Added excluded_dse_args to disable sweeping on desired arguments

Test Plan

Automated CI
Manual test runs

Additional Notes

coderabbitai · 2026-05-28T21:31:00Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 5d5ca12c-5020-4360-9c4c-057db75bcbd8

📥 Commits

Reviewing files that changed from the base of the PR and between 0468e65 and 1ded4b5.

📒 Files selected for processing (1)

conf/experimental/ai_dynamo/test/vllm.toml

📝 Walkthrough

Walkthrough

Convert LMCache from embedded models to a template-rendered YAML propagation flow and add dse_excluded_args to exclude dot-path cmd_args from DSE sweeps; update command generation, shell orchestration, configs, docs, and tests accordingly.

Changes

LMCache Configuration and DSE Exclusion Refactoring

Layer / File(s)	Summary
DSE Path Exclusion Infrastructure `src/cloudai/models/workload.py`, `src/cloudai/models/scenario.py`, `src/cloudai/_core/test_scenario.py`	`TestDefinition` gains `dse_excluded_args` with validation and `is_dse_excluded_arg`; `TestRun.param_space` filters list-valued `cmd_args` by exclusions; `apply_params_set` supports mixed dict/attribute dotted-path traversal.
LMCache Model Refactor & Public API `src/cloudai/workloads/ai_dynamo/ai_dynamo.py`, `src/cloudai/workloads/ai_dynamo/__init__.py`, `tests/workloads/ai_dynamo/*`	Remove `LMCacheArgs`/`LMCache`; add `LMCacheController` and LMCache filename constants; `AIDynamoCmdArgs` now accepts `lmcache: dict
LMCache YAML Template Rendering (shell) `src/cloudai/workloads/ai_dynamo/ai_dynamo.sh`	Add `render_lmcache_config()` and `lmcache_storage_cache_dir()`; render LMCache YAML into `LMCACHE_CONFIG_FILE`, adjust `setup_lmcache`/`launch_lmcache_controller`, and add `--lmcache-controller-cmd` CLI parsing.
Slurm Command Generation Strategy `src/cloudai/workloads/ai_dynamo/slurm_command_gen_strategy.py`	Add `_prepare_lmcache_config` to write YAML + backup into `test_run.output_path`; `_gen_script_args` calls it and conditionally appends `--lmcache-controller-cmd`; stop nesting `cmd_args.lmcache` TOML into CLI; refactor to use `self.td`.
Configuration Examples and Documentation `conf/experimental/ai_dynamo/test/*.toml`, `conf/experimental/ai_dynamo/test_scenario/vllm_lmcache.toml`, `doc/USER_GUIDE.rst`, `doc/workloads/ai_dynamo.rst`, `tests/ref_data/ai-dynamo.sbatch`	Add `vllm_lmcache.toml` scenario with LMCache + aiperf examples and `dse_excluded_args`; update `sglang.toml`/`vllm.toml` to new controller section and remove `/tmp` mount; document DSE exclusions and LMCache propagation; remove inline LMCache args from sbatch template.
Test Coverage `tests/test_cloudaigym.py`, `tests/test_test_scenario.py`, `tests/workloads/ai_dynamo/test_command_gen_strategy_slurm.py`	Add tests for DSE exclusion prefix parsing/validation, param_space filtering, LMCache YAML generation and backup file writing, `LMCACHE_CONFIG_FILE` propagation, and `--lmcache-controller-cmd` passthrough.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

feature

Suggested reviewers

srivatsankrishnan
jeffnvidia

Poem

🐰 I stitched the cache to YAML thread by thread,
No more embedded nests inside the code's bed.
DSE trims the hops that wander far and wide,
Controllers bind where 0.0.0.0 resides.
A carrot for tests — the config files beside.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'AIDynamo: enable LMCache' is clear and specific, directly reflecting the main objective of enabling/improving LMCache support in the AIDynamo workload, which is the primary focus of the changeset.
Description check	✅ Passed	The description is related to the changeset, mentioning both LMCache YAML config support and the addition of dse_excluded_args feature, which are the core changes made across the codebase.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch ipod/dynamo-multiple-perf-2

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/workloads/ai_dynamo/ai_dynamo.sh`:
- Around line 967-969: The line that declares and assigns storage_cache_dir
masks the exit status of the subshell call to lmcache_storage_cache_dir; change
the pattern used for storage_cache_dir to declare it first and assign it on the
next line so failures aren’t masked (e.g., keep local frontend_node and
frontend_ip as-is but for lmcache_storage_cache_dir use separate `local
storage_cache_dir` declaration then
`storage_cache_dir="$(lmcache_storage_cache_dir)"`), referencing the symbols
frontend_node, frontend_ip, and lmcache_storage_cache_dir to locate the code.

In `@tests/workloads/ai_dynamo/test_command_gen_strategy_slurm.py`:
- Line 267: The assertion is iterating over characters because result is a
string; update the check to tokenize the command string (use shlex.split(result)
or result.split()) and assert against tokens: replace the current
any(arg.startswith("--lmcache") for arg in result) with
any(arg.startswith("--lmcache") for arg in shlex.split(result)) (or
result.split()) so the test inspects arguments/tokens rather than characters.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 0a40cf22-4c7f-45fe-b599-32214c7acdb6

📥 Commits

Reviewing files that changed from the base of the PR and between 0738c4b and c1c6aa0.

📒 Files selected for processing (19)

conf/experimental/ai_dynamo/test/sglang.toml
conf/experimental/ai_dynamo/test/vllm.toml
conf/experimental/ai_dynamo/test_scenario/vllm_lmcache.toml
doc/USER_GUIDE.rst
doc/workloads/ai_dynamo.rst
src/cloudai/_core/test_scenario.py
src/cloudai/models/scenario.py
src/cloudai/models/workload.py
src/cloudai/workloads/ai_dynamo/__init__.py
src/cloudai/workloads/ai_dynamo/ai_dynamo.py
src/cloudai/workloads/ai_dynamo/ai_dynamo.sh
src/cloudai/workloads/ai_dynamo/slurm_command_gen_strategy.py
tests/ref_data/ai-dynamo.sbatch
tests/test_acceptance.py
tests/test_cloudaigym.py
tests/test_test_scenario.py
tests/workloads/ai_dynamo/test_command_gen_strategy_slurm.py
tests/workloads/ai_dynamo/test_json_gen_strategy_kubernetes.py
tests/workloads/ai_dynamo/test_report_gen_strategy.py

💤 Files with no reviewable changes (3)

tests/workloads/ai_dynamo/test_json_gen_strategy_kubernetes.py
tests/test_acceptance.py
tests/workloads/ai_dynamo/test_report_gen_strategy.py

podkidyshev added 3 commits May 28, 2026 12:34

adding proper lmcache support

98dfa86

lmcache fix

96faa46

prepare pr

a39e3cb

podkidyshev self-assigned this May 28, 2026

podkidyshev added the enhancement New feature or request label May 28, 2026

podkidyshev added 10 commits May 28, 2026 15:24

refactor

e254226

refactor keeps going

389d63d

trying to fully enable lmcache and lmcache controller

b33cdc5

fix slurm job name

9a1be0a

try another lmcache config

5d5115b

bla

f378714

add accuracy test

3248fe1

exclude dse arg feature

e34a775

docs for the dse_excluded_args feature

70cc288

make lmcache vllm conf nicer

c1c6aa0

podkidyshev marked this pull request as ready for review May 29, 2026 17:23

podkidyshev requested review from jeffnvidia and srivatsankrishnan as code owners May 29, 2026 17:23

podkidyshev changed the title ~~AIDynamo: enable LMCache and multi-aiperf runs~~ AIDynamo: enable LMCache May 29, 2026

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

Comment thread src/cloudai/workloads/ai_dynamo/ai_dynamo.sh

Comment thread tests/workloads/ai_dynamo/test_command_gen_strategy_slurm.py Outdated

podkidyshev added 2 commits May 29, 2026 12:00

make aidynamo test more consistent

0468e65

fix vllm config

1ded4b5

amaslenn approved these changes May 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AIDynamo: enable LMCache#906

AIDynamo: enable LMCache#906
podkidyshev wants to merge 15 commits into
mainfrom
ipod/dynamo-multiple-perf-2

podkidyshev commented May 28, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 28, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

podkidyshev commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Additional Notes

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

podkidyshev commented May 28, 2026 •

edited

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading