support prefill cudagraph for gdn #1294
Conversation
Code Review
This pull request introduces a wrapper method `_gdn_prefill_wrapper_run` in the Qwen3Next transformer layer to handle prefill operations, adding support for CUDA graph capture by managing tensor allocations and registering CPU-side functions that execute alongside the graph. Review feedback identifies a typo in the method name `prefill_cuda_graph_add_cpu_runnning_func` ("runnning" has a triple "n") and suggests optimizing the handling of the `z` tensor during graph capture by taking a view instead of performing a redundant allocation and copy.
Reviewed code from the PR diff:

```python
z_shape = o_shape

infer_state.prefill_cuda_graph_create_graph_obj()
infer_state.prefill_cuda_graph_get_current_capture_graph().__enter__()
o = torch.empty(o_shape, dtype=o_dtype, device=o_device)
_o = tensor_to_no_ref_tensor(o)
z = torch.empty(z_shape, dtype=o_dtype, device=o_device)
_z = tensor_to_no_ref_tensor(z)

def gdn_prefill_func(new_infer_state: Qwen3NextInferStateInfo):
    conv_states, ssm_states = new_infer_state.req_manager.get_mamba_cache(self.layer_num_)
    mixed_qkv, tmp_z, b, a = self._split_qkvzba(_mixed_qkvzba, is_decode=False)
    _z.copy_(tmp_z)
    tmp_o = self._gdn_prefill_kernel(
        mixed_qkv, conv_states, ssm_states, a, b, new_infer_state, layer_weight
    )
    tmp_o = tmp_o.view(_o.shape)
    _o.copy_(tmp_o)
    return

infer_state.prefill_cuda_graph_add_cpu_runnning_func(func=gdn_prefill_func, after_graph=pre_capture_graph)
```
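The pattern above interleaves captured graph segments with host-side ("CPU running") functions that re-execute on every replay, writing results into preallocated buffers. A toy, pure-Python sketch of that control flow (the `PrefillGraph` class and its method names are illustrative stand-ins, not LightLLM's actual API):

```python
class PrefillGraph:
    """Toy model of segmented capture: GPU graph segments are interleaved
    with host-side functions that rerun on every replay, e.g. to
    repartition a fused buffer and copy slices into stable output tensors."""

    def __init__(self):
        self._steps = []  # ordered list of replayable callables

    def add_cpu_running_func(self, func):
        # Analogue of prefill_cuda_graph_add_cpu_runnning_func: the
        # function is registered once and re-executed at each replay.
        self._steps.append(func)

    def replay(self, state):
        for fn in self._steps:
            fn(state)


g = PrefillGraph()
buf = {"o": None}  # stands in for the preallocated _o buffer
g.add_cpu_running_func(lambda s: buf.update(o=s["x"] * 2))
g.replay({"x": 21})
print(buf["o"])  # 42
```

The key property being modeled: the buffer (`buf`, like `_o`) has a fixed identity across replays, while the registered function refreshes its contents each time the graph runs.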
The allocation and copy of the `z` tensor in the CUDA graph capture path are redundant. Since `z` is a slice of `mixed_qkvzba` (which is an output of the previous graph segment), you can obtain `z` as a view directly during capture. This avoids an unnecessary allocation and a GPU-to-GPU copy inside the host node during replay. The subsequent `z.contiguous()` call in `gdn_forward` will handle contiguity if the norm kernel requires it, and that copy will be captured efficiently in the graph.
Suggested change:

```python
infer_state.prefill_cuda_graph_create_graph_obj()
infer_state.prefill_cuda_graph_get_current_capture_graph().__enter__()
o = torch.empty(o_shape, dtype=o_dtype, device=o_device)
_o = tensor_to_no_ref_tensor(o)
_, z, _, _ = self._split_qkvzba(mixed_qkvzba, is_decode=False)

def gdn_prefill_func(new_infer_state: Qwen3NextInferStateInfo):
    conv_states, ssm_states = new_infer_state.req_manager.get_mamba_cache(self.layer_num_)
    mixed_qkv, _, b, a = self._split_qkvzba(_mixed_qkvzba, is_decode=False)
    tmp_o = self._gdn_prefill_kernel(
        mixed_qkv, conv_states, ssm_states, a, b, new_infer_state, layer_weight
    )
    _o.copy_(tmp_o.view(_o.shape))
    return
```
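The suggestion rests on standard PyTorch semantics: slicing returns a view that shares storage with the source tensor, and `.contiguous()` materializes a copy only when the layout actually requires it. A quick CPU-side illustration (the shapes here are arbitrary, not the model's):

```python
import torch

# A fused buffer standing in for mixed_qkvzba (shape chosen for illustration).
mixed = torch.arange(24, dtype=torch.float32).view(4, 6)

# Slicing out the "z" columns yields a view: no new allocation, no copy.
z = mixed[:, 2:4]
mixed[0, 2] = 100.0
print(z[0, 0].item())  # 100.0 -> z shares storage with mixed

# The column slice is non-contiguous; contiguous() copies only in that case,
# and during capture that copy becomes a graph node rather than host-side work.
print(z.is_contiguous())               # False
print(z.contiguous().is_contiguous())  # True
```

Because the view shares the storage produced by the previous graph segment, replaying that segment automatically refreshes what `z` observes, with no host-node `copy_` needed.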