
[sglang_disagg] mori IO optimization and proxy server relocation#161

Merged
gargrahul merged 4 commits into ROCm:develop from basemam:develop-basem-2-mori-optimize-pr on Jun 2, 2026


@basemam basemam commented Jun 2, 2026

  • Unified launcher with MoRI/Mooncake backend selection (KV_TRANSFER_BACKEND)
  • CX7 multi-rail NIC support, default to CX7 400G rail NICs
  • xP/yD multi-node support for DP_MODE=0 (TP-only)
  • DP_MODE=1 is validated for 1P1D
  • Condensed RDMA/NCCL/Gloo env config in mori_ep_env.sh
  • Model flag catalog cleanup in models.yaml
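As a rough illustration of the backend selection in the first bullet, the sketch below reads `KV_TRANSFER_BACKEND` from the environment and rejects unknown values. It is not the launcher's actual code: the accepted spellings (`mori`, `mooncake`), the default, and the helper name `select_backend` are assumptions made for this example.

```python
import os

# Assumed spellings: the PR names the two backends but not the exact
# values the launcher accepts.
SUPPORTED_BACKENDS = ("mori", "mooncake")


def select_backend(env=None, default="mori"):
    """Return the normalized KV transfer backend named in the environment.

    `env` is any mapping (defaults to os.environ); `default` is a
    hypothetical fallback used when the variable is unset.
    """
    env = os.environ if env is None else env
    backend = env.get("KV_TRANSFER_BACKEND", default).strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"KV_TRANSFER_BACKEND={backend!r} is not one of {SUPPORTED_BACKENDS}"
        )
    return backend
```

Failing fast on an unrecognized value keeps a typo in the environment from silently falling back to a different transfer path.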

Motivation

The initial motivation was to run the proxy on the prefill node; other fixes were made along the way.

Technical Details

Improved MoRI IO performance.

Test Plan

Validated 1P1D with all supported models, running the setup with and without MoRI, and with MoRI EP.

Test Result

Submission Checklist


@lcskrishna lcskrishna left a comment


minor comments

Comment thread scripts/sglang_disagg/models.yaml
Comment thread scripts/sglang_disagg/models.yaml Outdated
- Unified launcher with MoRI/Mooncake backend selection (KV_TRANSFER_BACKEND)
- CX7 multi-rail NIC support, default to CX7 400G rail NICs
- xP/yD multi-node support for DP_MODE=0 (TP-only)
- DP_MODE=1 restricted to 1P1D (multi-node DP not yet supported)
- Condensed RDMA/NCCL/Gloo env config in mori_ep_env.sh
- Model flag catalog cleanup: dp settings only on DeepSeek-V3/R1
- Configurable benchmark combinations with random-range-ratio=1.0
- Dockerfile updated to rocm720 base image
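The topology rules in the list above (xP/yD for `DP_MODE=0`, 1P1D only for `DP_MODE=1`) could be checked before launch along these lines. This is a sketch under assumptions, not the PR's implementation: it reads xP/yD as x prefill nodes and y decode nodes, and the function and parameter names are hypothetical.

```python
def validate_topology(dp_mode, num_prefill, num_decode):
    """Check an xP/yD layout against the DP_MODE rules stated in the commit.

    Returns a label such as "2P3D" when the layout is allowed.
    All names here are illustrative, not taken from the repository.
    """
    if dp_mode not in (0, 1):
        raise ValueError(f"DP_MODE must be 0 or 1, got {dp_mode}")
    if num_prefill < 1 or num_decode < 1:
        raise ValueError("need at least one prefill and one decode node")
    # DP_MODE=1 is restricted to 1P1D; multi-node DP is not yet supported.
    if dp_mode == 1 and (num_prefill, num_decode) != (1, 1):
        raise ValueError("DP_MODE=1 supports only 1P1D")
    return f"{num_prefill}P{num_decode}D"
```

For example, `validate_topology(0, 2, 3)` returns `"2P3D"`, while `validate_topology(1, 2, 1)` raises `ValueError`.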

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@basemam basemam force-pushed the develop-basem-2-mori-optimize-pr branch from 7d31b82 to e971f22 Compare June 2, 2026 05:06
Basem Barakat and others added 3 commits June 2, 2026 08:48
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@lcskrishna lcskrishna left a comment


LGTM.


basemam commented Jun 2, 2026

@lcskrishna all requested changes are done.

@gargrahul gargrahul changed the title from "[sglang_disagg] Sync MoRI IO optimization from MAD-private PR #276" to "[sglang_disagg] mori IO optimization and proxy server relocation" on Jun 2, 2026
@gargrahul gargrahul merged commit f7c8fa6 into ROCm:develop Jun 2, 2026
3 participants