
Add DeepSeek V4 2-rank EP MoE single-layer program#426

Open
YunjiQin wants to merge 2 commits into hw-native-sys:main from YunjiQin:moe-ep

Conversation

@YunjiQin
Contributor

Summary

  • Introduces models/deepseek/v4/moe_ep.py, an end-to-end MoE decode program assembled as @pl.program with explicit cross-rank dispatch/combine over an HCCL window scratch (four payload channels: x_i8, scale, weight, r_route).
  • Refactors single-card kernels (expert_routed, combine, gate, hc_pre, hc_post, expert_shared) so they compose under @pl.inline inside @pl.function(InCore) callers — exposes *_inline aliases and adjusts SSA naming for the strict InCore-inline parser.
  • expert_routed now applies the per-row routing weight on the w2 output, turning combine into a pure scatter+sum and saving one window channel in the EP reduce.
  • Adds EP_ROUTING_GLOBAL config flag so gate can route over the full global expert space (default off keeps legacy single-card shard-local behavior); bumps DEMO preset n_routed_experts 8 → 16 to match the 2-rank EP demo (N_LOCAL=4, TOPK=2).
  • Updates moe.py to thread recv_weights through expert_routed and drop it from combine to match the new contract.

Introduces models/deepseek/v4/moe_ep.py: an end-to-end MoE decode
program assembled as @pl.program with explicit cross-rank
dispatch/combine over an HCCL window scratch (mirrors the
test_l3_ep_dispatch_combine protocol with four payload channels:
x_i8 / scale / weight / r_route).

Supporting refactors so single-card kernels compose under @pl.inline
inside @pl.function(InCore) callers:

- expert_routed: take recv_weights, apply per-row routing weight on
  the w2 output. combine then becomes a pure scatter + dense sum, so
  the cross-rank reduce in moe_ep saves one window channel.
- combine: drop recv_weights; sum only.
- gate: route over the full global expert space when
  config.EP_ROUTING_GLOBAL is True (default False keeps legacy
  single-card shard-local behavior).
- hc_pre / hc_post / gate / expert_shared / expert_routed: expose
  *_inline aliases via pl.inline(<fn>._func) and return their
  pl.Out tensor so inline call expressions parse.
- gate: rename per-spmd-loop base offset (t1_g / t1_h / t1_s) and
  hc_pre: rename rowN → rowN_p after fillpad — both work around the
  strict-SSA InCore-inline parser (pypto #1603).
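A minimal pure-Python sketch of why moving the routing weight into expert_routed is safe. Plain lists stand in for tensors, and `token_of_row` is a hypothetical name for the row-to-token scatter index. Scaling each expert output row by its routing weight before the scatter gives the same per-token result as the old weighted sum in combine, so combine can become a plain sum.

```python
from typing import List

Matrix = List[List[float]]


def combine_weighted(recv_y: Matrix, recv_weights: List[float],
                     token_of_row: List[int], n_tokens: int) -> Matrix:
    """Old contract: combine multiplies each expert row by its routing weight."""
    dim = len(recv_y[0])
    out = [[0.0] * dim for _ in range(n_tokens)]
    for row, w, tok in zip(recv_y, recv_weights, token_of_row):
        for d in range(dim):
            out[tok][d] += w * row[d]
    return out


def expert_routed_scale(recv_y: Matrix, recv_weights: List[float]) -> Matrix:
    """New contract: expert_routed applies the per-row weight to its w2 output."""
    return [[w * v for v in row] for row, w in zip(recv_y, recv_weights)]


def combine_sum(scaled_y: Matrix, token_of_row: List[int], n_tokens: int) -> Matrix:
    """New contract: combine is a pure scatter + dense sum."""
    dim = len(scaled_y[0])
    out = [[0.0] * dim for _ in range(n_tokens)]
    for row, tok in zip(scaled_y, token_of_row):
        for d in range(dim):
            out[tok][d] += row[d]
    return out


# Two tokens, TOPK=2 -> four expert output rows.
recv_y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
recv_weights = [0.25, 0.75, 0.5, 0.5]
token_of_row = [0, 0, 1, 1]

legacy = combine_weighted(recv_y, recv_weights, token_of_row, 2)
new = combine_sum(expert_routed_scale(recv_y, recv_weights), token_of_row, 2)
```

Because each row arrives already scaled, the cross-rank reduce only has to move the rows themselves; the weights no longer need their own window channel.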

config:
- DEMO preset n_routed_experts 8 → 16 (matches 2-rank EP demo with
  N_LOCAL=4 per rank, TOPK=2).
- Add EP_ROUTING_GLOBAL flag; moe_ep flips it before importing
  sub-kernels.
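A host-side sketch of what the flag changes. With the flag off, gate scores only the shard-local experts; with it on, gate scores global expert ids, and dispatch then has to map each id to its owning rank. The `n_experts` rule follows the N_EXPERTS expression in gate.py; the contiguous-sharding rule in `owner` is an assumption for illustration, since the thread does not state how moe_ep.py lays experts out across ranks.

```python
from typing import List, Tuple


def n_experts(n_routed_experts: int, ep_world_size: int, routing_global: bool) -> int:
    """Expert space the gate scores over (mirrors gate.py's N_EXPERTS expression)."""
    if routing_global:
        return n_routed_experts
    return n_routed_experts // ep_world_size


def owner(global_expert: int, n_local: int) -> Tuple[int, int]:
    """Assumed contiguous sharding: rank r owns experts [r * n_local, (r + 1) * n_local)."""
    return divmod(global_expert, n_local)


def route_targets(topk_ids: List[int], n_local: int) -> List[Tuple[int, int]]:
    """(destination rank, local expert index) for each of one token's top-k global ids."""
    return [owner(e, n_local) for e in topk_ids]


# Illustrative sizes: 16 routed experts spread over 2 ranks.
N_ROUTED, WORLD = 16, 2
N_LOCAL_EX = N_ROUTED // WORLD
targets = route_targets([3, 12], N_LOCAL_EX)
```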

moe.py: thread recv_weights into expert_routed and drop it from
combine to match the new contract.

coderabbitai Bot commented May 30, 2026

Walkthrough

This PR refactors DeepSeek-V4 MoE weight application and introduces a complete distributed 2-rank pipeline. Routing weights are now applied in expert_routed (Stage 1b) rather than in combine, and a new EP_ROUTING_GLOBAL flag selects between global and shard-local routing in gate. The gate kernel uses distinct SPMD offsets to prevent incorrect loop merging. Multiple kernels gain _inline aliases for composition. A new end-to-end program, moe_ep.py, demonstrates dispatch/combine synchronization across ranks.

Changes

Weight Routing Refactor

| Layer | File(s) | Summary |
|---|---|---|
| expert_routed: accept and apply routing weights | models/deepseek/v4/expert_routed.py, models/deepseek/v4/expert_routed_test | expert_routed now accepts recv_weights and multiplies per-row routing weights into w2 matmul output (Stage 1b) before writing recv_y. golden_expert_routed and build_tensor_specs updated to generate recv_weights and apply them in the reference computation. |
| combine: remove routing weight handling | models/deepseek/v4/combine.py | combine signature removes recv_weights parameter; expects recv_y pre-scaled by routing weights. Removes routed_w_buf allocation, weight scatter writes, and weight multiplication in reduction. golden_combine and build_tensor_specs updated to remove weight generation/application. |
| moe: orchestration routing updates | models/deepseek/v4/moe.py | moe() passes recv_weights to expert_routed call and removes it from combine call. golden_moe() updated to match new kernel interfaces. |

Routing Modes and Configuration

| Layer | File(s) | Summary |
|---|---|---|
| Configuration: DEMO sizing and global routing flag | models/deepseek/v4/config.py | DEMO.n_routed_experts increased 8 → 16. New module-level EP_ROUTING_GLOBAL flag (default False) controls whether each rank routes to global or local expert sets; intended to be overridden by moe_ep.py before gate import. |
| Gate: global/local routing and SPMD offset refactoring | models/deepseek/v4/gate.py | gate imports config and conditions N_EXPERTS on EP_ROUTING_GLOBAL. Reworks all SPMD loops to use distinct per-path offset variables (t1_g for matmul, t1_h for hash-routing, t1_s for sort-routing) to avoid incorrect loop-offset merging in outlined form. |

Inline Kernel Composition

| Layer | File(s) | Summary |
|---|---|---|
| Inline aliases for kernel reuse | models/deepseek/v4/expert_shared.py, models/deepseek/v4/hc_post.py, models/deepseek/v4/hc_pre.py | expert_shared adds explicit return statement and exports expert_shared_inline alias. hc_post exports hc_post_inline alias. hc_pre renames Sinkhorn pad temporaries (row0_p–row3_p) to avoid rebind issues and exports hc_pre_inline alias; all aliases enable @pl.inline-style inlining while preserving module globals. |

End-to-End EP MoE Pipeline

Layer / File(s) Summary
| Layer | File(s) | Summary |
|---|---|---|
| moe_ep: complete 2-rank MoE pipeline | models/deepseek/v4/moe_ep.py | New module implements a full Expert-Parallelism MoE pipeline for 2 ranks using HCCL distributed windows. Configuration overrides DEMO preset (EP_WORLD_SIZE=2, EP_ROUTING_GLOBAL=True) before importing kernels. _build_ep_moe_program() defines @pl.program with inline compute stages and two distributed phases: dispatch_step (histogram, publish counts, prefix offsets, 4-channel remote stores) and combine_step (routed output fan-in, per-token reduction for TOPK=2). Includes golden_moe_ep() reference, INT8 quantization helpers, build_tensor_specs() for test data, and CLI harness that compares x_next against golden. |
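The dispatch bookkeeping in that row (histogram, publish counts, prefix offsets) can be modeled on the host in plain Python. This is a sketch under stated assumptions, not the kernel: it assumes contiguous expert sharding and an expert-major receive layout, and it does not model the HCCL window, the remote stores, or the four payload channels.

```python
from typing import Dict, List, Tuple

N_RANKS = 2
N_LOCAL = 4  # illustrative; global experts 0..7 in total


def histogram(topk_ids: List[List[int]]) -> List[int]:
    """send_counts[dst * N_LOCAL + e]: rows this rank sends to local expert e on rank dst.

    Under contiguous sharding the flat index equals the global expert id.
    """
    counts = [0] * (N_RANKS * N_LOCAL)
    for token_experts in topk_ids:
        for g in token_experts:
            counts[g] += 1
    return counts


def recv_offsets(all_counts: List[List[int]], dst: int) -> Tuple[Dict[Tuple[int, int], int], int]:
    """Start row of each (local expert, source rank) block in dst's receive buffer.

    Blocks are laid out expert-major so each local expert's rows are contiguous.
    Returns the offsets and the total number of rows dst will receive.
    """
    offsets: Dict[Tuple[int, int], int] = {}
    cursor = 0
    for e in range(N_LOCAL):
        for src in range(N_RANKS):
            offsets[(e, src)] = cursor
            cursor += all_counts[src][dst * N_LOCAL + e]
    return offsets, cursor


# Two ranks, two tokens each, TOPK=2 (global expert ids).
rank_topk = [[[0, 5], [1, 5]], [[5, 6], [0, 7]]]
published = [histogram(t) for t in rank_topk]  # what each rank publishes to its peers
offs0, total0 = recv_offsets(published, dst=0)
offs1, total1 = recv_offsets(published, dst=1)
```

Every routed row lands in exactly one receive block, so the per-rank totals always add up to tokens × TOPK across the whole EP group.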

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • hw-native-sys/pypto-lib#280: Both PRs align combine-stage behavior to operate purely on routing-weight-scaled recv_y without per-expert recv_weights parameter.

Poem

🐰 The weights now flow where experts roam,
Applied in stage one, before they come home.
With global and local routes to explore,
And inline aliases to share once more—
The pipeline spans ranks, through HCCL's way,
MoE dispatch and combine all in one day! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.89% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: introducing a 2-rank EP MoE program for DeepSeek V4, which is the primary addition in moe_ep.py and the core objective of the PR. |
| Description check | ✅ Passed | The description comprehensively relates to the changeset, detailing the new moe_ep.py program, refactored kernels with inline aliases, recv_weights threading changes, EP_ROUTING_GLOBAL flag, and preset updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the DeepSeek-V4 MoE implementation to support multi-card Expert Parallelism (EP) by applying routing weights row-wise in expert_routed instead of combine, and introduces a 2-rank end-to-end EP single-layer decode program in moe_ep.py. Feedback highlights critical synchronization and initialization issues in moe_ep.py where allocated window buffers (pub_counts, count_done, and combine_done) are not zero-initialized, potentially causing data corruption or premature barrier fall-throughs. Additionally, evaluating N_EXPERTS at module import time in gate.py based on mutable global configuration poses a high risk of module-caching bugs.

Comment on lines +255 to +256
# ---------- histogram: scalar histogram on indices ----------
send_counts = pl.array.create(N_RANKS * N_LOCAL, pl.INT32)

critical

The window buffers pub_counts and count_done are allocated but not zero-initialized. Since notifications are only sent when v != 0, any cells where v == 0 will contain uninitialized garbage values. This will corrupt the prefix sum offsets and counts, leading to incorrect routing and potential crashes. Additionally, uninitialized synchronization cells in count_done can cause the barrier wait to fall through prematurely. Please initialize both buffers to 0 at the start of dispatch_step.

            # ---------- initialization ----------
            for r0 in pl.range(N_RANKS * N_RANKS):
                for e0 in pl.range(N_LOCAL):
                    pl.write(pub_counts, [r0, e0], 0)
            for r0 in pl.range(N_RANKS):
                pl.write(count_done, [r0, 0], 0)

            # ---------- histogram: scalar histogram on indices ----------
            send_counts = pl.array.create(N_RANKS * N_LOCAL, pl.INT32)
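The failure mode described above can be reproduced with a plain-Python model of one published-counts row. `stale` stands in for whatever the window memory held before the step. The "notify only when v != 0" rule comes from the review comment; the buffer size and values are illustrative.

```python
from typing import List


def publish(window: List[int], counts: List[int]) -> None:
    """Write only nonzero counts, mirroring the 'notify only when v != 0' behavior."""
    for i, v in enumerate(counts):
        if v != 0:
            window[i] = v


def prefix_offsets(window: List[int]) -> List[int]:
    """Exclusive prefix sum: start offset of each block."""
    offsets: List[int] = []
    acc = 0
    for v in window:
        offsets.append(acc)
        acc += v
    return offsets


stale = [3, 9, 1, 4]    # whatever the window cells held before this step
counts = [2, 0, 1, 0]   # this step's real histogram

dirty = list(stale)     # no initialization: zero counts never overwrite stale cells
publish(dirty, counts)

clean = [0] * len(counts)  # zero-initialize first, as suggested
publish(clean, counts)
```

With the stale buffer, the offsets after the first zero-count cell are shifted by garbage, which is exactly the corrupted prefix sum the review warns about.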

Comment on lines +456 to +457
) -> pl.Tensor[[B, S, D], pl.BF16]:
# ---------- push: TPUT recv_y rows to peer's routed_y_buf ----

critical

The combine_done window buffer is not zero-initialized. If it contains stale or garbage values >= 1, the barrier wait will succeed immediately without waiting for peers, causing a race condition. Please initialize combine_done to 0 at the start of combine_step.

Suggested change
) -> pl.Tensor[[B, S, D], pl.BF16]:
    for r0 in pl.range(N_RANKS):
        pl.write(combine_done, [r0, 0], 0)
    # ---------- push: TPUT recv_y rows to peer's routed_y_buf ----
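A similar host-side model shows why a stale combine_done value breaks the wait across consecutive decode steps. `DoneCounter` and `barrier_ready` are hypothetical stand-ins, and the threshold of one peer for a 2-rank run is an assumption; the review only states that a value >= 1 lets the wait pass.

```python
class DoneCounter:
    """Hypothetical stand-in for one combine_done window cell."""

    def __init__(self, initial: int = 0) -> None:
        self.value = initial

    def notify(self) -> None:
        """A peer signals that its push for the current step has landed."""
        self.value += 1


def barrier_ready(cell: DoneCounter, expected: int) -> bool:
    """Non-blocking form of the 'wait until value >= expected' check."""
    return cell.value >= expected


EXPECTED = 1  # assumed: a 2-rank run waits on a single peer

cell = DoneCounter()
cell.notify()                                    # step 0: the peer finishes its push
step0_ok = barrier_ready(cell, EXPECTED)

step1_no_reset = barrier_ready(cell, EXPECTED)   # step 1: nothing pushed yet, check already passes

cell.value = 0                                   # reset at the top of combine_step, as suggested
step1_with_reset = barrier_ready(cell, EXPECTED) # now correctly waits for the peer
```

One ordering detail worth checking in the real kernel: the local reset has to take effect before any peer can signal for the same step; otherwise a fast peer's notification can be wiped out by the reset itself.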

# over its own [n_routed_experts // EP_WORLD_SIZE] shard.
# EP_ROUTING_GLOBAL=True (used by moe_ep.py): every rank routes over the full
# global expert set so dispatch can fan tokens across ranks.
N_EXPERTS = M.n_routed_experts if _cfg.EP_ROUTING_GLOBAL else M.n_routed_experts // EP_WORLD_SIZE

high

Evaluating N_EXPERTS at module import time based on mutable global configuration (_cfg.EP_ROUTING_GLOBAL) is highly prone to module-caching bugs. If gate.py is imported elsewhere first (e.g., in moe.py or unit tests), subsequent imports of gate in moe_ep.py will reuse the cached module from sys.modules without re-evaluating N_EXPERTS, leading to incorrect shapes and runtime errors. Consider refactoring the design to avoid module-level constants that depend on mutable global state, or ensure that modules are reloaded when configuration changes.
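The caching behavior is plain Python and can be reproduced with two throwaway modules. `cfg_demo` and `gate_demo` are hypothetical toy stand-ins for config.py and gate.py, written to a temporary directory so the sketch is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def demo_stale_constant() -> Tuple[int, int]:
    """Return (N_EXPERTS after a plain re-import, N_EXPERTS after importlib.reload)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "cfg_demo.py").write_text(
            "EP_ROUTING_GLOBAL = False\nN_ROUTED = 16\nWORLD = 2\n"
        )
        (root / "gate_demo.py").write_text(
            "import cfg_demo as _cfg\n"
            "N_EXPERTS = _cfg.N_ROUTED if _cfg.EP_ROUTING_GLOBAL else _cfg.N_ROUTED // _cfg.WORLD\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            import cfg_demo
            import gate_demo                     # first import binds N_EXPERTS = 8

            cfg_demo.EP_ROUTING_GLOBAL = True    # flip the flag after the fact
            import gate_demo as again            # served from sys.modules; body not re-run
            stale = again.N_EXPERTS

            fresh = importlib.reload(gate_demo).N_EXPERTS  # re-executes with the new flag
            return stale, fresh
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("cfg_demo", None)
            sys.modules.pop("gate_demo", None)
```

`importlib.reload` only re-executes the reloaded module; names that other modules already bound with `from gate import ...` are not rebound, so controlling import order (or avoiding the import-time constant altogether) is the sturdier fix.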

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/deepseek/v4/moe_ep.py`:
- Line 1072: The assignment of a lambda to the name out is triggering Ruff E731;
replace that lambda with a small helper function named out that accepts (name,
shape, dtype) and returns TensorSpec(name, shape, dtype, is_output=True). Update
the single-line lambda at the current declaration to a proper def out(name,
shape, dtype): return TensorSpec(name, shape, dtype, is_output=True) so all
existing uses of out continue to work unchanged and satisfy the linter.
- Around line 1047-1049: The three statements currently use semicolon-chained
multiple append calls (routed_w1_i8_list.append(w1_i8);
routed_w1_s_list.append(w1_s) etc.), which triggers E702; split each
semicolon-separated append into its own line so each list append call is on a
separate statement (e.g., routed_w1_i8_list.append(w1_i8) on one line and
routed_w1_s_list.append(w1_s) on the next), doing the same for routed_w3_* and
routed_w2_* append pairs to satisfy the linter.
- Around line 24-28: moe_ep.py currently mutates config after some sub-kernel
modules may already be cached, causing stale FLASH/EP_* constants in
hc_pre_inline, hc_post_inline, gate_inline, expert_shared_inline and
expert_routed_inline; move the config overrides so they happen before any import
of those sub-kernel modules (or, if imports must be earlier, perform
importlib.reload on each of those modules after applying overrides) so their
module-level bindings (FLASH, EP_WORLD_SIZE, B/S/D/RECV_MAX) are computed
against the correct values; also fix the lint issues by replacing the
multi-statement semicolon line(s) like routed_w*_i8_list.append(...); ... with
separate statements on their own lines and replace `out = lambda ...` with a
proper def out(...) function to satisfy E702 and E731.


📥 Commits

Reviewing files that changed from the base of the PR and between 781c699 and 30f7889.

📒 Files selected for processing (9)
  • models/deepseek/v4/combine.py
  • models/deepseek/v4/config.py
  • models/deepseek/v4/expert_routed.py
  • models/deepseek/v4/expert_shared.py
  • models/deepseek/v4/gate.py
  • models/deepseek/v4/hc_post.py
  • models/deepseek/v4/hc_pre.py
  • models/deepseek/v4/moe.py
  • models/deepseek/v4/moe_ep.py

Comment on lines +24 to +28
# === Module preamble: override config BEFORE importing sub-kernels ==========
# Sub-kernels (hc_pre / gate / expert_routed / ...) bind preset constants at
# module-import time via ``from config import FLASH as M``. By the time those
# modules execute their first line, the overrides below must already be in
# place — otherwise they'd capture FLASH and EP_WORLD_SIZE=16 instead.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find repo paths that can import the EP sub-kernels or moe_ep in the same process.
# If any of the sub-kernels can be imported before moe_ep, this import-order dependency is real.
rg -n -C2 --type=py 'from\s+(?:models\.deepseek\.v4\.)?(hc_pre|hc_post|gate|expert_shared|expert_routed|moe_ep)\s+import' .
rg -n -C2 --type=py 'import\s+(?:models\.deepseek\.v4\.)?(hc_pre|hc_post|gate|expert_shared|expert_routed|moe_ep)\b' .

Repository: hw-native-sys/pypto-lib

Length of output: 8706


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect moe_ep.py around the config override + imports
sed -n '1,120p' models/deepseek/v4/moe_ep.py

# 2) Find where the *_inline kernels are defined / imported from
rg -n "hc_pre_inline|hc_post_inline|gate_inline|expert_shared_inline|expert_routed_inline" . --glob='*.py'

# 3) Inspect those inline modules for "from config import ..." patterns that bind constants at import time
# (only check the files that define the inline symbols)
python3 - <<'PY'
import subprocess, re, os, textwrap
root='.'
# get candidate python files that mention the inline symbols
cmd=["bash","-lc", "rg -l \"hc_pre_inline|hc_post_inline|gate_inline|expert_shared_inline|expert_routed_inline\" . --glob='*.py'"]
out=subprocess.check_output(cmd, text=True)
files=[f.strip() for f in out.splitlines() if f.strip()]
print("INLINE_FILES:", len(files))
for f in files[:50]:
    print(f)

PY

Repository: hw-native-sys/pypto-lib

Length of output: 7079


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Check Ruff-flagged lines in moe_ep.py
sed -n '1020,1090p' models/deepseek/v4/moe_ep.py

# 2) Inspect whether sub-kernels bind config constants at import time.
#    Look for "from config import FLASH as M" / module-level uses of EP_* / RECV_MAX / etc.
for f in models/deepseek/v4/hc_pre.py models/deepseek/v4/gate.py models/deepseek/v4/expert_shared.py models/deepseek/v4/expert_routed.py models/deepseek/v4/hc_post.py; do
  echo "===== $f ====="
  rg -n "from config import|FLASH as|EP_WORLD_SIZE|EP_ROUTING_GLOBAL|RECV_MAX" "$f" || true
  sed -n '1,120p' "$f"
done

# 3) Find other modules that import moe_ep, and see if they also import those sub-kernels.
rg -n --type=py "(from .*moe_ep import|import .*moe_ep)" models/deepseek/v4 || true
rg -n --type=py "from\s+(?:hc_pre|hc_post|gate|expert_shared|expert_routed)\s+import" models/deepseek/v4 || true

# 4) (Narrow) find within top-level scripts any combined imports that could load sub-kernels before moe_ep.
rg -n --type=py "moe_ep" . || true

Repository: hw-native-sys/pypto-lib

Length of output: 40776


🏁 Script executed:

#!/bin/bash
set -euo pipefail
nl -ba models/deepseek/v4/moe_ep.py | sed -n '1025,1085p'

Repository: hw-native-sys/pypto-lib

Length of output: 110


🏁 Script executed:

#!/bin/bash
set -euo pipefail
python3 - <<'PY'
from pathlib import Path
p=Path("models/deepseek/v4/moe_ep.py")
lines=p.read_text().splitlines()
def show(a,b):
    for i in range(a,b+1):
        if 1<=i<=len(lines):
            print(f"{i}:{lines[i-1]}")
print("=== moe_ep.py:55-85 ===")
show(55,85)
print("=== moe_ep.py:1035-1085 ===")
show(1035,1085)
PY

Repository: hw-native-sys/pypto-lib

Length of output: 5024


Fix import-time config binding for EP sub-kernels (moe_ep.py)

moe_ep.py applies its config overrides before importing hc_pre_inline / hc_post_inline / gate_inline / expert_shared_inline / expert_routed_inline (lines 55-74 and 69-73), but those kernel modules bind constants at import time (e.g., from config import FLASH as M, ..., plus module-level B/S/D/RECV_MAX globals derived from it). If any of them was already imported earlier in the same Python process, which is possible because models/deepseek/v4/moe.py and the decode entrypoints import hc_pre/hc_post/gate/expert_* at top level, Python reuses the cached module and silently keeps the stale FLASH/EP-related values, mixing incompatible shape and routing contracts.

  • Ruff: routed_w*_i8_list.append(...); ... violates E702 on lines 1047-1049, and out = lambda ... violates E731 on line 1072.
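A sketch of the alternative both reviews point toward: derive the gate's sizes from an explicit config value at build time instead of from a module global read at import. `EPConfig`, `GateShapes`, and `make_gate_shapes` are hypothetical names; whether these values can be threaded into the pypto kernels this way (for example through `_build_ep_moe_program()`) is an assumption, not something the PR shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EPConfig:
    """Hypothetical explicit config handed to a program builder."""
    n_routed_experts: int
    ep_world_size: int
    routing_global: bool


@dataclass(frozen=True)
class GateShapes:
    n_experts: int  # expert space the gate scores over
    n_local: int    # experts owned by each rank


def make_gate_shapes(cfg: EPConfig) -> GateShapes:
    """Same rule as gate.py's N_EXPERTS expression, evaluated per call instead of at import."""
    n_local = cfg.n_routed_experts // cfg.ep_world_size
    n_experts = cfg.n_routed_experts if cfg.routing_global else n_local
    return GateShapes(n_experts=n_experts, n_local=n_local)


# Illustrative sizes: 16 routed experts over 2 ranks.
ep_shapes = make_gate_shapes(EPConfig(16, 2, routing_global=True))
local_shapes = make_gate_shapes(EPConfig(16, 2, routing_global=False))
```

Because the sizes are computed per call, two programs built in one process with different configs cannot share a stale constant.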

Comment on lines +1047 to +1049
routed_w1_i8_list.append(w1_i8); routed_w1_s_list.append(w1_s)
routed_w3_i8_list.append(w3_i8); routed_w3_s_list.append(w3_s)
routed_w2_i8_list.append(w2_i8); routed_w2_s_list.append(w2_s)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Split these semicolon-chained appends.

Ruff is already flagging E702 here, so this will likely fail lint as written.

Suggested fix
-        routed_w1_i8_list.append(w1_i8); routed_w1_s_list.append(w1_s)
-        routed_w3_i8_list.append(w3_i8); routed_w3_s_list.append(w3_s)
-        routed_w2_i8_list.append(w2_i8); routed_w2_s_list.append(w2_s)
+        routed_w1_i8_list.append(w1_i8)
+        routed_w1_s_list.append(w1_s)
+        routed_w3_i8_list.append(w3_i8)
+        routed_w3_s_list.append(w3_s)
+        routed_w2_i8_list.append(w2_i8)
+        routed_w2_s_list.append(w2_s)
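For context, the `*_i8` / `*_s` pairs collected by these appends are presumably INT8 weights and their scales (the walkthrough mentions INT8 quantization helpers). A plain-Python sketch of one common way to produce such pairs, symmetric per-row absmax quantization, with each append on its own line. The actual helpers in moe_ep.py are not shown in this thread and may use a different scheme or granularity.

```python
from typing import List, Tuple


def quantize_rows_int8(w: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-row absmax INT8 quantization: w[i][j] ~= q[i][j] * scale[i]."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in w:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0  # avoid dividing by zero on all-zero rows
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


weights = ([[0.5, -1.0], [0.0, 0.0]], [[2.0, 1.0], [-4.0, 0.25]])  # toy per-expert w1
routed_w1_i8_list: List[List[List[int]]] = []
routed_w1_s_list: List[List[float]] = []
for w1 in weights:
    w1_i8, w1_s = quantize_rows_int8(w1)
    routed_w1_i8_list.append(w1_i8)  # one statement per line: no E702
    routed_w1_s_list.append(w1_s)
```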
🧰 Tools
🪛 Ruff (0.15.14)
[error] 1047-1047: Multiple statements on one line (semicolon) (E702)
[error] 1048-1048: Multiple statements on one line (semicolon) (E702)
[error] 1049-1049: Multiple statements on one line (semicolon) (E702)


sw2_i8 = sw2_i8.unsqueeze(0).expand(N_RANKS, -1, -1).contiguous()
sw2_s = sw2_s.unsqueeze(0).expand(N_RANKS, -1).contiguous()

out = lambda name, shape, dtype: TensorSpec(name, shape, dtype, is_output=True)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace the out lambda with a small helper.

Ruff E731 flags assigning a lambda here.

Suggested fix
-    out = lambda name, shape, dtype: TensorSpec(name, shape, dtype, is_output=True)
+    def out(name, shape, dtype):
+        return TensorSpec(name, shape, dtype, is_output=True)
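The `def` form above is the direct fix. `functools.partial` is an equivalent alternative that Ruff does not flag, since E731 only targets lambdas bound to names. `TensorSpec` below is a hypothetical stand-in; only its `(name, shape, dtype, is_output=...)` call shape is taken from the diff, and the real harness probably passes a framework dtype rather than a string.

```python
from dataclasses import dataclass
from functools import partial
from typing import Tuple


@dataclass
class TensorSpec:
    """Hypothetical stand-in for the harness's TensorSpec."""
    name: str
    shape: Tuple[int, ...]
    dtype: str
    is_output: bool = False


def out(name: str, shape: Tuple[int, ...], dtype: str) -> TensorSpec:
    """The def form from the suggested fix."""
    return TensorSpec(name, shape, dtype, is_output=True)


# Equivalent helper; binding a partial to a name is not an E731 violation.
out_partial = partial(TensorSpec, is_output=True)
```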
🧰 Tools
🪛 Ruff (0.15.14)
[error] 1072-1072: Do not assign a lambda expression, use a def (E731)
Rewrite out as a def
