Skip to content

AIDynamo: enable LMCache#906

Open
podkidyshev wants to merge 15 commits into
mainfrom
ipod/dynamo-multiple-perf-2
Open

AIDynamo: enable LMCache#906
podkidyshev wants to merge 15 commits into
mainfrom
ipod/dynamo-multiple-perf-2

Conversation

@podkidyshev
Copy link
Copy Markdown
Contributor

@podkidyshev podkidyshev commented May 28, 2026

Summary

  • Improve LMCache support by allowing any YAML config to by passed
  • Added excluded_dse_args to disable sweeping on desired arguments

Test Plan

  • Automated CI
  • Manual test runs

Additional Notes

@podkidyshev podkidyshev self-assigned this May 28, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 5d5ca12c-5020-4360-9c4c-057db75bcbd8

📥 Commits

Reviewing files that changed from the base of the PR and between 0468e65 and 1ded4b5.

📒 Files selected for processing (1)
  • conf/experimental/ai_dynamo/test/vllm.toml

📝 Walkthrough

Walkthrough

Convert LMCache from embedded models to a template-rendered YAML propagation flow and add dse_excluded_args to exclude dot-path cmd_args from DSE sweeps; update command generation, shell orchestration, configs, docs, and tests accordingly.

Changes

LMCache Configuration and DSE Exclusion Refactoring

Layer / File(s) Summary
DSE Path Exclusion Infrastructure
src/cloudai/models/workload.py, src/cloudai/models/scenario.py, src/cloudai/_core/test_scenario.py
TestDefinition gains dse_excluded_args with validation and is_dse_excluded_arg; TestRun.param_space filters list-valued cmd_args by exclusions; apply_params_set supports mixed dict/attribute dotted-path traversal.
LMCache Model Refactor & Public API
src/cloudai/workloads/ai_dynamo/ai_dynamo.py, src/cloudai/workloads/ai_dynamo/__init__.py, tests/workloads/ai_dynamo/*
Remove LMCacheArgs/LMCache; add LMCacheController and LMCache filename constants; AIDynamoCmdArgs now accepts `lmcache: dict
LMCache YAML Template Rendering (shell)
src/cloudai/workloads/ai_dynamo/ai_dynamo.sh
Add render_lmcache_config() and lmcache_storage_cache_dir(); render LMCache YAML into LMCACHE_CONFIG_FILE, adjust setup_lmcache/launch_lmcache_controller, and add --lmcache-controller-cmd CLI parsing.
Slurm Command Generation Strategy
src/cloudai/workloads/ai_dynamo/slurm_command_gen_strategy.py
Add _prepare_lmcache_config to write YAML + backup into test_run.output_path; _gen_script_args calls it and conditionally appends --lmcache-controller-cmd; stop nesting cmd_args.lmcache TOML into CLI; refactor to use self.td.
Configuration Examples and Documentation
conf/experimental/ai_dynamo/test/*.toml, conf/experimental/ai_dynamo/test_scenario/vllm_lmcache.toml, doc/USER_GUIDE.rst, doc/workloads/ai_dynamo.rst, tests/ref_data/ai-dynamo.sbatch
Add vllm_lmcache.toml scenario with LMCache + aiperf examples and dse_excluded_args; update sglang.toml/vllm.toml to new controller section and remove /tmp mount; document DSE exclusions and LMCache propagation; remove inline LMCache args from sbatch template.
Test Coverage
tests/test_cloudaigym.py, tests/test_test_scenario.py, tests/workloads/ai_dynamo/test_command_gen_strategy_slurm.py
Add tests for DSE exclusion prefix parsing/validation, param_space filtering, LMCache YAML generation and backup file writing, LMCACHE_CONFIG_FILE propagation, and --lmcache-controller-cmd passthrough.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

feature

Suggested reviewers

  • srivatsankrishnan
  • jeffnvidia

Poem

🐰 I stitched the cache to YAML thread by thread,
No more embedded nests inside the code's bed.
DSE trims the hops that wander far and wide,
Controllers bind where 0.0.0.0 resides.
A carrot for tests — the config files beside.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'AIDynamo: enable LMCache' is clear and specific, directly reflecting the main objective of enabling/improving LMCache support in the AIDynamo workload, which is the primary focus of the changeset.
Description check ✅ Passed The description is related to the changeset, mentioning both LMCache YAML config support and the addition of dse_excluded_args feature, which are the core changes made across the codebase.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ipod/dynamo-multiple-perf-2

Comment @coderabbitai help to get the list of available commands and usage tips.

@podkidyshev podkidyshev added the enhancement New feature or request label May 28, 2026
@podkidyshev podkidyshev marked this pull request as ready for review May 29, 2026 17:23
@podkidyshev podkidyshev changed the title AIDynamo: enable LMCache and multi-aiperf runs AIDynamo: enable LMCache May 29, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/workloads/ai_dynamo/ai_dynamo.sh`:
- Around line 967-969: The line that declares and assigns storage_cache_dir
masks the exit status of the subshell call to lmcache_storage_cache_dir; change
the pattern used for storage_cache_dir to declare it first and assign it on the
next line so failures aren’t masked (e.g., keep local frontend_node and
frontend_ip as-is but for lmcache_storage_cache_dir use separate `local
storage_cache_dir` declaration then
`storage_cache_dir="$(lmcache_storage_cache_dir)"`), referencing the symbols
frontend_node, frontend_ip, and lmcache_storage_cache_dir to locate the code.

In `@tests/workloads/ai_dynamo/test_command_gen_strategy_slurm.py`:
- Line 267: The assertion is iterating over characters because result is a
string; update the check to tokenize the command string (use shlex.split(result)
or result.split()) and assert against tokens: replace the current
any(arg.startswith("--lmcache") for arg in result) with
any(arg.startswith("--lmcache") for arg in shlex.split(result)) (or
result.split()) so the test inspects arguments/tokens rather than characters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 0a40cf22-4c7f-45fe-b599-32214c7acdb6

📥 Commits

Reviewing files that changed from the base of the PR and between 0738c4b and c1c6aa0.

📒 Files selected for processing (19)
  • conf/experimental/ai_dynamo/test/sglang.toml
  • conf/experimental/ai_dynamo/test/vllm.toml
  • conf/experimental/ai_dynamo/test_scenario/vllm_lmcache.toml
  • doc/USER_GUIDE.rst
  • doc/workloads/ai_dynamo.rst
  • src/cloudai/_core/test_scenario.py
  • src/cloudai/models/scenario.py
  • src/cloudai/models/workload.py
  • src/cloudai/workloads/ai_dynamo/__init__.py
  • src/cloudai/workloads/ai_dynamo/ai_dynamo.py
  • src/cloudai/workloads/ai_dynamo/ai_dynamo.sh
  • src/cloudai/workloads/ai_dynamo/slurm_command_gen_strategy.py
  • tests/ref_data/ai-dynamo.sbatch
  • tests/test_acceptance.py
  • tests/test_cloudaigym.py
  • tests/test_test_scenario.py
  • tests/workloads/ai_dynamo/test_command_gen_strategy_slurm.py
  • tests/workloads/ai_dynamo/test_json_gen_strategy_kubernetes.py
  • tests/workloads/ai_dynamo/test_report_gen_strategy.py
💤 Files with no reviewable changes (3)
  • tests/workloads/ai_dynamo/test_json_gen_strategy_kubernetes.py
  • tests/test_acceptance.py
  • tests/workloads/ai_dynamo/test_report_gen_strategy.py

Comment thread src/cloudai/workloads/ai_dynamo/ai_dynamo.sh
Comment thread tests/workloads/ai_dynamo/test_command_gen_strategy_slurm.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants