Conversation
Code Review
This pull request integrates the FlashQLA backend into the qwen3next model's chunk_gated_delta_rule operation, providing a high-performance alternative to FLA Triton kernels on SM90+ hardware. It includes compatibility detection logic and a new parity test suite with benchmarking. Feedback highlights that the version requirements for PyTorch (2.8) and CUDA (12.8) appear to be typos that would disable the backend on current environments, and suggests making the version parsing logic more robust. Additionally, it is recommended to explicitly calculate the scale parameter when it is None to ensure consistency with the fallback implementation.
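For context, "SM90+" here means Hopper-class GPUs, i.e. a CUDA compute capability of (9, 0) or higher. The snippet below is only an illustrative sketch of such a capability check; the helper name is hypothetical and not taken from the PR:

import torch

def _is_sm90_or_newer() -> bool:
    # Hypothetical helper: Hopper ("SM90") GPUs report a CUDA compute
    # capability of (9, 0) or newer via torch.cuda.get_device_capability().
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (9, 0)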
tv = torch.__version__.split("+")[0].split(".")
if (int(tv[0]), int(tv[1])) < (2, 8):
    return None
cv = torch.version.cuda
if cv is None:
    return None
cv_parts = cv.split(".")
if (int(cv_parts[0]), int(cv_parts[1])) < (12, 8):
    return None
The version requirements for PyTorch (2.8) and CUDA (12.8) appear to be typos, as these versions are either not yet released or do not exist (CUDA 12.8). This will cause the FlashQLA backend to be disabled on all current environments. Additionally, the parsing logic is fragile and may raise IndexError or ValueError depending on the version string format (e.g., if it contains non-numeric suffixes like rc1).
try:
    tv = torch.__version__.split("+")[0].split(".")
    if len(tv) < 2 or (int(tv[0]), int(tv[1])) < (2, 4):
        return None
    cv = torch.version.cuda
    if cv is None:
        return None
    cv_parts = cv.split(".")
    if len(cv_parts) < 2 or (int(cv_parts[0]), int(cv_parts[1])) < (12, 1):
        return None
except (ValueError, IndexError):
    return None
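As a more defensive alternative, the version strings could be compared with the packaging library instead of manual splitting, which also copes with suffixes like rc1 or +cu121. This is only a sketch under that assumption; _flashqla_supported is a hypothetical name and the 2.4 / 12.1 thresholds simply mirror the suggestion above, not the PR's code:

import torch
from packaging.version import InvalidVersion, Version

def _flashqla_supported() -> bool:
    # Hypothetical helper: packaging parses PEP 440 strings such as
    # "2.4.0rc1" or "2.4.1+cu121" without manual split()/int() handling.
    try:
        torch_ok = Version(torch.__version__).release[:2] >= (2, 4)
        cuda = torch.version.cuda
        cuda_ok = cuda is not None and Version(cuda).release[:2] >= (12, 1)
    except InvalidVersion:
        return False
    return torch_ok and cuda_ok
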
return flashqla_fn(
    q=q.contiguous(),
    k=k.contiguous(),
    v=v.contiguous(),
    g=g.contiguous(),
    beta=beta.contiguous(),
    scale=scale,
    initial_state=initial_state.contiguous() if initial_state is not None else None,
    output_final_state=output_final_state,
    cu_seqlens=cu_seqlens,
    head_first=head_first,
    use_qk_l2norm_in_kernel=use_qk_l2norm_in_kernel,
)
If scale is None, it is passed directly to flashqla_fn. The fallback Triton path explicitly calculates scale as k.shape[-1] ** -0.5. To ensure consistency and avoid potential issues if the flash_qla library does not handle None defaults, the scale should be explicitly provided.
return flashqla_fn(
    q=q.contiguous(),
    k=k.contiguous(),
    v=v.contiguous(),
    g=g.contiguous(),
    beta=beta.contiguous(),
    scale=scale if scale is not None else k.shape[-1] ** -0.5,
    initial_state=initial_state.contiguous() if initial_state is not None else None,
    output_final_state=output_final_state,
    cu_seqlens=cu_seqlens,
    head_first=head_first,
    use_qk_l2norm_in_kernel=use_qk_l2norm_in_kernel,
)
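Equivalently, the default could be computed once before the call; the one-line sketch below only restates the reviewer's point (the 1/sqrt(head_dim) default used by the FLA Triton fallback) and is not code from the PR:

if scale is None:
    scale = k.shape[-1] ** -0.5  # same default the Triton fallback computes

Either form keeps the two backends numerically consistent when callers omit scale.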