
[None][refactor] MoEScheduler split + MegaMoE EPLB / multi-chunk / CI integration #13908

Open

xxi-nv wants to merge 2 commits into NVIDIA:main from xxi-nv:feat_refactor_megamoe_deepgemm

Conversation

xxi-nv (Collaborator) commented May 8, 2026

Summary

  • Split ConfigurableMoE.forward into a dedicated MoEScheduler with ExternalComm and FusedComm kinds; ConfigurableMoE becomes a thin wrapper that owns DWDP recording and EPLB repeat_idx rotation.
  • Refactor MegaMoEDeepGemm from a standalone backend (mega_moe/backend.py, deleted) into a ConfigurableMoE quant-method backend (mega_moe/mega_moe_deepgemm.py), reusing _BACKEND_SYNC_ATTRS for layer / EPLB state mirroring.
  • Add EPLB support to MegaMoE (incl. dynamic EPLB), multi-chunk execution path, and CI coverage on DGX_B200 / DGX_B300.
  • Defer DG NVLink SymmBuffer allocation from backend __init__ to create_weights, so the buffer is sized after ConfigurableMoE syncs the real num_slots / expert_size_per_partition instead of the placeholder num_slots = num_experts seeded under init_load_balancer=False. This keeps sym_buffer.num_experts == num_experts_per_rank * num_ranks under EPLB and matches the DeepGEMM mega.hpp host assertion contract (see the sketch after this list).
  • Add EPLB unit test (test_configurable_moe_multi_gpu_eplb).
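
A minimal sketch of the deferred-allocation pattern described in the fourth bullet above. The class, method, and attribute names are illustrative stand-ins, not the real backend API:

```python
from types import SimpleNamespace


class MegaMoEDeepGemmBackendSketch:
    """Illustrative stand-in for the real backend; not the actual API."""

    def __init__(self, num_experts: int):
        # No SymmBuffer here: with init_load_balancer=False, num_slots is
        # still the placeholder value num_experts at construction time.
        self.num_experts = num_experts
        self.num_slots = num_experts
        self.sym_buffer = None

    def create_weights(self, num_slots: int, ep_size: int) -> None:
        # By the time create_weights runs, ConfigurableMoE has synced the real
        # num_slots / expert_size_per_partition, so the buffer is sized
        # correctly under EPLB.
        self.num_slots = num_slots
        experts_per_rank = num_slots // ep_size
        # Stand-in for the DG NVLink SymmBuffer allocation.
        self.sym_buffer = SimpleNamespace(num_experts=experts_per_rank * ep_size)
        assert self.sym_buffer.num_experts == experts_per_rank * ep_size
```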

Test plan

  • pytest tests/unittest/_torch/modules/moe/test_moe_module.py -k "MEGAMOE_DEEPGEMM and multi_gpu" -vs on GB200 / 4-GPU (DEP):
    • 3 PASSED (multi_gpu[e8 Renormalize], multi_gpu[e8 DeepSeekV3], multi_gpu_eplb[e8_k2 slots=16 dynamic]), 2 SKIPPED (e256 OOM threshold), 0 FAILED, no mega.hpp assertion.
  • /bot run to exercise the rest of the MoE / MegaMoE CI matrix.

Summary by CodeRabbit

  • New Features

    • Introduced MegaMoE DeepGEMM as a first-class MoE backend with native fused kernel support and W4A8_MXFP4_MXFP8 quantization.
    • Added scheduler abstraction for MoE forward execution supporting both external and fused communication strategies.
  • Improvements

    • Enhanced NVLink workspace reuse across MoE instances via per-layout caching and lifecycle management.
    • Refactored ConfigurableMoE architecture for cleaner separation of concerns between backend selection and execution scheduling.
  • Documentation

    • Updated MoE developer guide with new ConfigurableMoE orchestration model and scheduler classification framework.

xxi-nv requested a review from a team as a code owner May 8, 2026 13:01
xxi-nv requested a review from yuxianq May 8, 2026 13:01
xxi-nv requested reviews from Barry-Delaney, QiJune, leslie-fang25 and lfr-0531 and removed the request for yuxianq May 8, 2026 13:06
coderabbitai Bot (Contributor) commented May 8, 2026

📝 Walkthrough

This PR refactors TensorRT-LLM's MoE implementation from a monolithic ConfigurableMoE design to a modular scheduler-driven architecture. It introduces the MegaMoE DeepGEMM backend as a first-class fused-communication MoE implementation, adds workspace lifecycle management with reference counting for NVLink AllToAll operations, and includes comprehensive test infrastructure to validate the new design across multiple hardware configurations and EPLB scenarios.

Changes

MoE Scheduler Architecture Refactoring

  • Interface & Scheduler Contracts — tensorrt_llm/_torch/modules/fused_moe/interface.py: Introduces the MoESchedulerKind enum with EXTERNAL_COMM and FUSED_COMM values, adds a scheduler_kind class attribute to the MoE base class, and defines the validate_configurable_moe backend validation hook.
  • Scheduler Abstraction & Implementations — tensorrt_llm/_torch/modules/fused_moe/moe_scheduler.py: New module implementing the MoEScheduler abstract base class, ExternalCommMoEScheduler for host-orchestrated execution with multi-chunk support and EPLB hooks, FusedCommMoEScheduler for in-kernel fused execution, and the create_moe_scheduler factory function (see the sketch after this table).
  • NVLink Workspace Lifecycle — tensorrt_llm/_torch/modules/fused_moe/communication/nvlink_one_sided.py: Adds process-wide workspace caching with per-key reference counting, deferred allocation, equality validation, and an explicit destroy() method for lifecycle management.
  • ConfigurableMoE Orchestration Refactoring — tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py: Delegates execution to schedulers, extracts backend creation into _create_and_sync_backend with deferred weight loading, simplifies forward_impl into a thin scheduler wrapper, updates communication-strategy auto-creation for FUSED_COMM backends, and delegates backend validation.
  • Architecture Documentation — tensorrt_llm/_torch/modules/fused_moe/MOE_DEVELOPER_GUIDE.md: Documents the ConfigurableMoE + Backend + Scheduler architecture, formalizes scheduler selection via MoESchedulerKind, describes the distinct execution flows for external vs. fused communication, updates scheduler/EPLB constraints, extends the file map and backend registry, and clarifies anti-patterns.
  • Build Configuration — cpp/tensorrt_llm/deep_gemm/CMakeLists.txt: Sets the DG_USE_PYTORCH_CUBLASLT_HANDLE environment variable to "1" in the generated __init__.py.
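
To make the split concrete, here is a minimal sketch of how these pieces fit together. The class, enum, and factory names come from the table above; the constructor and forward signatures are assumptions, not the real API:

```python
from abc import ABC, abstractmethod
from enum import Enum, auto


class MoESchedulerKind(Enum):
    EXTERNAL_COMM = auto()  # host-orchestrated dispatch / combine
    FUSED_COMM = auto()     # communication fused into the kernel


class MoEScheduler(ABC):
    def __init__(self, moe):
        self.moe = moe  # the ConfigurableMoE instance being driven

    @abstractmethod
    def forward(self, hidden_states, router_logits, **kwargs):
        """Run one MoE forward pass; chunking and comm are per-subclass."""


class ExternalCommMoEScheduler(MoEScheduler):
    def forward(self, hidden_states, router_logits, **kwargs):
        # Host-orchestrated path: dispatch, expert GEMMs, combine, with
        # optional multi-chunk splitting and EPLB statistic hooks.
        raise NotImplementedError


class FusedCommMoEScheduler(MoEScheduler):
    def forward(self, hidden_states, router_logits, **kwargs):
        # In-kernel fused dispatch + combine (e.g. the MegaMoE DeepGEMM backend).
        raise NotImplementedError


def create_moe_scheduler(moe) -> MoEScheduler:
    # Factory keyed off the backend's scheduler_kind class attribute.
    if moe.backend.scheduler_kind is MoESchedulerKind.FUSED_COMM:
        return FusedCommMoEScheduler(moe)
    return ExternalCommMoEScheduler(moe)
```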

MegaMoE DeepGEMM Backend & Test Coverage

  • MegaMoE DeepGEMM Backend — tensorrt_llm/_torch/modules/fused_moe/mega_moe/mega_moe_deepgemm.py: New backend implementing SM100/BF16/W4A8_MXFP4_MXFP8 capability gating, EP process-group resolution, SymmBuffer allocation/caching, per-token FP8 quantization with TRT-LLM op or torch.compile paths, slot/expert divisibility validation (see the sketch after this table), and fused kernel dispatch.
  • MegaMoE Weight Quantization — tensorrt_llm/_torch/modules/fused_moe/quantization.py: Adds W4A8MXFP4MXFP8MegaMoEDeepGemmMethod to manage weight allocation, loading for vanilla/fused modes with optional EPLB staging, transformation to DG-native layouts, and dynamic EPLB weight preparation.
  • Backend Selection & Module Updates — tensorrt_llm/_torch/modules/fused_moe/create_moe.py, tensorrt_llm/_torch/modules/fused_moe/mega_moe/__init__.py: Updates backend selection to use MegaMoEDeepGemm.can_implement() for capability gating, includes MegaMoEDeepGemm in load-balancer-eligible backends, routes ConfigurableMoE construction to the new backend, exports the new classes, and removes the old MegaMoEDeepGemmFusedMoE export.
  • Deleted Legacy Backend — tensorrt_llm/_torch/modules/fused_moe/mega_moe/backend.py: Removes the entire old MegaMoEDeepGemmFusedMoE backend implementation.
  • Test Utilities — tests/unittest/_torch/modules/moe/moe_test_utils.py: Extends MoeBackendType with MEGAMOE, adds should_skip_megamoe() with SM/dtype/alignment validation, and updates skip logic and autotuner checks to include MegaMoE constraints.
  • Test Reference Implementations — tests/unittest/_torch/modules/moe/quantize_utils.py: Adds the MXFP4MXFP8RefMegaMoEDeepGemm reference module and updates prepare_weights_from_backend() to compute TP-dependent quantization parameters.
  • Backend Unit Tests — tests/unittest/_torch/modules/moe/test_moe_backend.py: Adds the _ensure_single_proc_dist_for_megamoe() distributed initialization helper, validation tests for slot/expert divisibility, MegaMoE kwargs mapping, and model configurations.
  • Module Multi-GPU & EPLB Tests — tests/unittest/_torch/modules/moe/test_moe_module.py: Adds TCP port selection, distributed initialization for MegaMoE, backend-driven weight preparation with EPLB staging, dedicated multi-GPU and EPLB parameter generators, and MegaMoE-specific skip constraints.
  • Integration Test Configuration — tests/integration/test_lists/test-db/l0_dgx_b200.yml, l0_dgx_b300.yml: Splits DEEPGEMM tests and adds new MEGAMOE_DEEPGEMM test cases to the integration test matrix.
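
The slot/expert divisibility checks mentioned above amount to layout invariants. A hypothetical illustration (the function name and exact rules are assumptions, not the backend's real validation code):

```python
def _validate_slot_layout(num_slots: int, num_experts: int, ep_size: int) -> None:
    # Illustrative invariants only; the real backend/test checks may differ.
    if num_slots % ep_size != 0:
        raise ValueError(
            f"num_slots={num_slots} must divide evenly across ep_size={ep_size} ranks")
    if num_slots < num_experts:
        raise ValueError(
            f"num_slots={num_slots} cannot be smaller than num_experts={num_experts}")
```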

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 66.34%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (4 passed)
  • Title check — ✅ Passed: the PR title clearly describes the main changes (the MoEScheduler split and MegaMoE EPLB / multi-chunk / CI integration); it is specific, concise, and directly relevant to the changeset.
  • Linked Issues check — ✅ Passed: skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: skipped because no linked issues were found for this pull request.
  • Description check — ✅ Passed: the PR description comprehensively documents the refactoring objectives, test coverage, and technical rationale for the MoEScheduler split and MegaMoE backend integration.


coderabbitai Bot (Contributor) left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/modules/fused_moe/create_moe.py (1)

1-21: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the required NVIDIA SPDX header to this modified module.

This file now has Python source changes but still lacks the repository-required copyright/license header at the top.

As per coding guidelines "All C++, Python, and other source files must contain NVIDIA copyright header with current modification year".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py` around lines 1 - 21,
This module (create_moe.py) is missing the required NVIDIA SPDX
copyright/license header; add the repository-standard NVIDIA SPDX header
(including the current modification year) as the very first lines of the file
above all imports and definitions so the file that defines symbols like
ConfigurableMoE, CuteDslFusedMoE, CutlassFusedMoE, DeepGemmFusedMoE,
DenseGEMMFusedMoE, TritonFusedMoE, TRTLLMGenFusedMoE, VanillaMoE, WideEPMoE,
MoE, MoEWeightLoadingMode, and MegaMoEDeepGemm contains the mandated header.
Ensure the header text exactly matches the project's required NVIDIA SPDX format
and include the current year.
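
For reference, NVIDIA SPDX headers in this repository take roughly the following shape; this is an illustrative placeholder only, and the exact wording, license identifier, and year must match the repository's mandated text:

```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```
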
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/modules/fused_moe/mega_moe/mega_moe_deepgemm.py`:
- Around line 192-199: The constructor for MegaMoEDeepGEMM currently ignores the
activation_type parameter and always sets self.activation = "swiglu"; update the
constructor (and the other occurrence around lines 267-269) to map and validate
the ActivationType enum to the expected activation string (e.g.,
ActivationType.Swiglu -> "swiglu", ActivationType.GELU -> "gelu", etc.) and set
self.activation from activation_type instead of the hardcoded default; if an
unsupported ActivationType is passed, raise a ValueError or fallback explicitly
with a clear log message so callers of create_moe_backend() get the correct
fused activation behavior (see the mapping sketch after these comments).
- Around line 125-173: can_implement() can return True in single-process builds
even though MegaMoEDeepGemm's constructor and _resolve_ep_pg() require
torch.distributed to be initialized; update can_implement() in class
MegaMoEDeepGemm to reject non-distributed runs by checking
torch.distributed.is_available() and torch.distributed.is_initialized() (use
both for safety) and return False with a clear message (e.g. "requires
torch.distributed to be available and initialized") so create_moe.get_moe_cls()
will fall back cleanly to other implementations like CutlassFusedMoE.
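
A small sketch of the mapping the first comment above suggests; the enum member names (Swiglu, Gelu) and the error handling are assumptions taken from that comment, not the backend's real code:

```python
def _resolve_activation(activation_type) -> str:
    # Map an ActivationType enum member to the fused-kernel activation string.
    # Member names here are assumptions from the review comment.
    names = {"Swiglu": "swiglu", "Gelu": "gelu"}
    key = getattr(activation_type, "name", str(activation_type))
    if key not in names:
        raise ValueError(
            f"MegaMoEDeepGemm does not support activation_type={activation_type!r}")
    return names[key]
```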

In `@tensorrt_llm/_torch/modules/fused_moe/MOE_DEVELOPER_GUIDE.md`:
- Around line 62-67: Update the documentation to match the actual
implementation: change the FUSED_COMM / FusedCommMoEScheduler description in
MOE_DEVELOPER_GUIDE.md (and the repeated block at lines 206-213) to state that
FusedCommMoEScheduler._forward_chunk() triggers an EP-wide AllReduce via the
load balancer helper (it calls _load_balancer_update_statistic(...,
ignore_allreduce=False)), so EPLB statistics are AllReduced across ranks by that
helper rather than being reduced purely inside the backend kernel; reference the
class/method names (FusedCommMoEScheduler._forward_chunk and
_load_balancer_update_statistic) and update the sentence that currently claims
"EPLB stats AllReduced internally" to reflect the real AllReduce routing
performed by the load balancer helper.

In `@tensorrt_llm/_torch/modules/fused_moe/moe_scheduler.py`:
- Around line 84-94: The abstract method declaration for forward currently
places the ellipsis on the same line and triggers flake8 E704; in the class
defining forward (method name forward) break the body so the ellipsis (...) is
on its own indented line under the signature (i.e., keep the def forward(...)
header unchanged but move the trailing ellipsis onto the next line indented to
the same level as the method body) so the abstractmethod body is a separate
statement and lint will pass (see the sketch after these comments).
- Around line 237-248: The multi-chunk workspace sizing in
_prepare_workspaces_for_chunk incorrectly always uses moe.mapping.moe_ep_size *
max_tokens; change it to mirror the DeepEPLowLatency formula used in
_prepare_workspace_deepgemm by using num_slots * max_tokens for
DeepEPLowLatency/DeepGemm paths: detect the DeepEPLowLatency/DeepGemm case (same
condition or type check used in _prepare_workspace_deepgemm), compute
chunk_size_0 as moe.mapping.num_slots * max(all_rank_num_tokens_list[0])
(falling back to chunk_size_list[0] as before), and ensure the appended second
workspace (when use_multi_stream) uses the same corrected size; keep references
to _prepare_workspace_deepgemm(), _prepare_workspaces_for_chunk(),
DeepEPLowLatency and DeepGemmFusedMoE.run_moe() to locate the relevant logic.
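
The flake8 E704 fix from the first comment above is purely mechanical; a lint-clean form looks like this (the stub class name and signature are illustrative):

```python
from abc import ABC, abstractmethod


class MoESchedulerStub(ABC):
    @abstractmethod
    def forward(self, *args, **kwargs):
        # Keeping the ellipsis on its own line (rather than `def forward(...): ...`)
        # satisfies flake8 E704 while leaving the abstract body empty.
        ...
```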

In `@tensorrt_llm/_torch/modules/fused_moe/quantization.py`:
- Around line 4900-4916: The current implementation leaves module._t_l1/_t_l2
cached after the first call to _transform_main_weights(), so subsequent
load_weights() calls reuse stale transformed tensors while the original raw
MXFP4 tensors remain resident; update load_weights() (or
_transform_main_weights()) to either clear/invalidate module._t_l1 and
module._t_l2 before re-transforming or, after obtaining the transformed pairs
from _transform_weights_for_mega_moe(), replace the original raw tensors
(module.w3_w1_weight, module.w3_w1_weight_scale, module.w2_weight,
module.w2_weight_scale) with the transformed single-source copies and free the
raw ones so there is only one authoritative weight representation and no stale
DG-form tensors remain (see the sketch after these comments).
- Around line 4702-4725: The copy_() call arguments in the block that writes to
dst_w3_w1_weight, dst_w3_w1_weight_scale, dst_w2_weight, and dst_w2_weight_scale
are using hanging indents that trigger Flake8 E126; rewrap each call so
continuations are aligned with the opening parenthesis (or place the first
argument on the same line as the function name) and indent subsequent wrapped
lines to the same column as the first argument, ensuring calls to
self._to_weight_device_uint8(...) and the non_blocking=True kwarg are properly
aligned for each copy_() invocation.
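
A minimal sketch of the cache invalidation suggested in the first comment above; the attribute names (_t_l1, _t_l2) are taken from that comment and the helper itself is hypothetical:

```python
def _invalidate_transformed_weights(module) -> None:
    # Drop cached DG-form tensors so a subsequent load_weights() call
    # re-transforms from the freshly loaded raw MXFP4 weights instead of
    # reusing stale transformed copies.
    for attr in ("_t_l1", "_t_l2"):
        if getattr(module, attr, None) is not None:
            setattr(module, attr, None)
```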

In `@tests/integration/test_lists/test-db/l0_dgx_b300.yml`:
- Around line 42-44: Add the MegaMoE EPLB selector for the MEGAMOE_DEEPGEMM
backend to the B300 post-merge matrix by inserting a counterpart entry for the
existing multi-GPU test selector: add a line for
unittest/_torch/modules/moe/test_moe_module.py::test_configurable_moe_multi_gpu_eplb[...]
that matches the MEGAMOE_DEEPGEMM parameters (e.g., routing=DeepSeekV3 and
quant=W4A8_MXFP4_MXFP8) so the EPLB happy-path multi-GPU case is scheduled
alongside the current test_configurable_moe_multi_gpu[...] and
test_configurable_moe_single_gpu[...] selectors.

In `@tests/unittest/_torch/modules/moe/test_moe_module.py`:
- Around line 128-142: Update _ensure_dist_for_megamoe to signal when it created
the NCCL default process group by returning True if it initialized the group and
False otherwise, and in tests that call it (the test(s) using MegaMoE) wrap the
test body in a try/finally: call _ensure_dist_for_megamoe(...), store its
boolean result, run the existing test logic in try, and in finally call
torch.distributed.destroy_process_group() only when the helper returned True;
reference the helper function _ensure_dist_for_megamoe and
torch.distributed.destroy_process_group to locate where to add the return value
and the try/finally cleanup.
- Around line 121-125: The current _get_free_tcp_port() opens, binds and
immediately closes the socket, causing a TOCTOU race with
dist.init_process_group(); instead, change the reservation pattern so the socket
stays open until workers call dist.init_process_group() (or implement a
retry-on-EADDRINUSE loop). Concretely, modify _get_free_tcp_port to bind and
listen but return both the reserved socket and port (or provide a context
manager reserve_tcp_port that yields the (sock, port)), update callers in the
test to keep the returned socket open across the rendezvous and only close it
after dist.init_process_group() completes; alternatively implement a retry loop
around _get_free_tcp_port + init_process_group to detect EADDRINUSE and retry
with a new port. Ensure references to _get_free_tcp_port and
dist.init_process_group are updated accordingly.
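
As one way to address the TOCTOU concern above, a retry loop around the rendezvous is straightforward. This sketch assumes a single-process NCCL group and that an address-in-use failure surfaces as a RuntimeError whose message mentions the address; both are assumptions, not verified against the test harness:

```python
import socket

import torch.distributed as dist


def _init_single_proc_pg_with_retry(max_tries: int = 5) -> int:
    # Pick a free port, attempt the rendezvous, and retry with a new port
    # if another process grabbed it between bind/close and init.
    last_err = None
    for _ in range(max_tries):
        with socket.socket() as s:
            s.bind(("127.0.0.1", 0))
            port = s.getsockname()[1]
        try:
            dist.init_process_group(
                backend="nccl",
                init_method=f"tcp://127.0.0.1:{port}",
                rank=0,
                world_size=1,
            )
            return port
        except RuntimeError as err:
            if "address already in use" not in str(err).lower():
                raise
            last_err = err
    raise RuntimeError("no free rendezvous port found") from last_err
```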

---

Outside diff comments:
In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py`:
- Around line 1-21: This module (create_moe.py) is missing the required NVIDIA
SPDX copyright/license header; add the repository-standard NVIDIA SPDX header
(including the current modification year) as the very first lines of the file
above all imports and definitions so the file that defines symbols like
ConfigurableMoE, CuteDslFusedMoE, CutlassFusedMoE, DeepGemmFusedMoE,
DenseGEMMFusedMoE, TritonFusedMoE, TRTLLMGenFusedMoE, VanillaMoE, WideEPMoE,
MoE, MoEWeightLoadingMode, and MegaMoEDeepGemm contains the mandated header.
Ensure the header text exactly matches the project's required NVIDIA SPDX format
and include the current year.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c89a4ddd-c5ca-4bd5-8818-46677f54c14f

📥 Commits

Reviewing files that changed from the base of the PR and between 2e4b05c and e72cd0e.

📒 Files selected for processing (17)
  • cpp/tensorrt_llm/deep_gemm/CMakeLists.txt
  • tensorrt_llm/_torch/modules/fused_moe/MOE_DEVELOPER_GUIDE.md
  • tensorrt_llm/_torch/modules/fused_moe/communication/nvlink_one_sided.py
  • tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py
  • tensorrt_llm/_torch/modules/fused_moe/create_moe.py
  • tensorrt_llm/_torch/modules/fused_moe/interface.py
  • tensorrt_llm/_torch/modules/fused_moe/mega_moe/__init__.py
  • tensorrt_llm/_torch/modules/fused_moe/mega_moe/backend.py
  • tensorrt_llm/_torch/modules/fused_moe/mega_moe/mega_moe_deepgemm.py
  • tensorrt_llm/_torch/modules/fused_moe/moe_scheduler.py
  • tensorrt_llm/_torch/modules/fused_moe/quantization.py
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml
  • tests/integration/test_lists/test-db/l0_dgx_b300.yml
  • tests/unittest/_torch/modules/moe/moe_test_utils.py
  • tests/unittest/_torch/modules/moe/quantize_utils.py
  • tests/unittest/_torch/modules/moe/test_moe_backend.py
  • tests/unittest/_torch/modules/moe/test_moe_module.py
💤 Files with no reviewable changes (1)
  • tensorrt_llm/_torch/modules/fused_moe/mega_moe/backend.py

Comment threads: mega_moe_deepgemm.py (2), MOE_DEVELOPER_GUIDE.md (1), moe_scheduler.py (2), quantization.py (2), l0_dgx_b300.yml (1), test_moe_module.py (2)

… integration

- Split ConfigurableMoE forward into MoEScheduler (ExternalComm / FusedComm)
- Refactor MegaMoEDeepGemm into ConfigurableMoE quant-method path
- Add EPLB support (incl. dynamic), multi-chunk execution, and CI coverage
- Defer DG NVLink SymmBuffer allocation from backend __init__ to
  create_weights so it sees ConfigurableMoE's synced num_slots /
  expert_size_per_partition (rather than the placeholder
  num_slots = num_experts seeded under init_load_balancer=False), keeping
  sym_buffer.num_experts == num_experts_per_rank * num_ranks under EPLB
  and matching the DeepGEMM mega.hpp contract
- Add EPLB unit test

Signed-off-by: xxi <xxi@nvidia.com>
xxi-nv force-pushed the feat_refactor_megamoe_deepgemm branch from e72cd0e to 87aa974 on May 8, 2026 13:35
NVIDIA deleted a comment from coderabbitai Bot on May 8, 2026
xxi-nv (Collaborator, Author) commented May 8, 2026

/bot run --disable-fail-fast

tensorrt-cicd (Collaborator)

PR_Github #47410 [ run ] triggered by Bot. Commit: 87aa974 Link to invocation

tensorrt-cicd (Collaborator)

PR_Github #47410 [ run ] completed with state FAILURE. Commit: 87aa974
/LLM/main/L0_MergeRequest_PR pipeline #37337 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

…lement

`can_implement` is called from `get_quick_skip_reason` at pytest parametrize
collection time, long before the launcher initializes torch.distributed. The
prior `dist.is_initialized()` gate marked every MEGAMOE parametrize case as
"cannot implement" and dropped them in `iter_base_test_configs`, so
`pytest -k "MEGAMOE_DEEPGEMM"` collected 0/1180 items and exited 5 on
Jenkins L0_MergeRequest_PR #37337 (test_configurable_moe_{single,multi}_gpu).

`can_implement` should only answer static capability questions (SM / dtype /
quant / shape); whether an EP ProcessGroup is live is a runtime concern that
`__init__`'s `_resolve_ep_pg` already surfaces with a clear error. Removing
the probe restores MEGAMOE test collection and keeps the production failure
mode explicit instead of silently falling back.

Signed-off-by: xxi <xxi@nvidia.com>
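
A hedged sketch of the capability split the commit message describes: can_implement() stays purely static (SM / dtype / quant), and the runtime requirement of a live EP ProcessGroup is left to __init__ / _resolve_ep_pg(). The stand-in class name and parameter names are illustrative:

```python
import torch


class MegaMoEDeepGemmSketch:
    """Illustrative stand-in, not the real backend class."""

    @classmethod
    def can_implement(cls, quant_algo, dtype, sm_version) -> bool:
        # Static capability questions only; no torch.distributed probe here,
        # so pytest parametrize-time collection still sees MEGAMOE cases.
        if sm_version < 100:          # SM100 (Blackwell) or newer
            return False
        if dtype is not torch.bfloat16:
            return False
        return quant_algo in (None, "W4A8_MXFP4_MXFP8")
```
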
xxi-nv (Collaborator, Author) commented May 8, 2026

/bot run --disable-fail-fast

tensorrt-cicd (Collaborator)

PR_Github #47449 [ run ] triggered by Bot. Commit: 52206ee Link to invocation

leslie-fang25 (Collaborator) left a comment


LGTM
