issue/826: kunlun layernorm #827

zhangyue207 · 2025-12-22T09:22:10Z

…o issue/826

xgqdut2016 · 2025-12-23T06:09:56Z

src/infiniop/ops/layer_norm/kunlun/layer_norm_kunlun.xpu

+__global__ void layerNormKernel(
+    int32_t loop_idx,
+    Tdata *output,                 // [b, seq, dim]
+    Tdata *output_standardization, // [b, seq, dim]


Would it be better to remove these `[b, seq, dim]` comments? They give the false impression that the kernel can only handle 3D tensors.

xgqdut2016 · 2025-12-23T06:17:16Z

src/infiniop/ops/layer_norm/kunlun/layer_norm_kunlun.xpu

+        int32_t offset_output_standardization = 0;
+        int32_t offset_output_rstd_deviation = 0;
+        int32_t offset_input = 0;
+        for (int i = 0; i < ndim - 1; i++) {


This loop is similar to the loop above that computes `t_coords[]`. Can the two be merged?

xgqdut2016 · 2025-12-23T06:18:51Z

src/infiniop/ops/layer_norm/kunlun/layer_norm_kunlun.xpu

+    int32_t offset_output_rstd_deviation = 0;
+    int32_t offset_input = 0;
+    for (int i = 0; i < ndim - 1; i++) {
+        int32_t dim_i = shape_local[i].value;


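Can this loop and the loop above that computes `t_coords[]` be merged into a single loop? For example, instead of allocating separate memory to store `t_coords`, use register variables and derive the input and output indices in one pass. Would that make the code cleaner and faster, or is there a specific reason for the current approach?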
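A minimal host-side sketch of the single-pass idea suggested above, written in plain C++ rather than XPU kernel code. The names `RowOffsets` and `rowOffsets`, the stride arrays, and the row-major decomposition order are assumptions made for illustration; they are not taken from the PR. Each coordinate lives only in a local variable and is applied to both offsets immediately, so no `t_coords` buffer is needed.

```cpp
#include <cstdint>

// Hypothetical helper type: element offsets of one row in two tensors
// that share a shape but may have different strides.
struct RowOffsets {
    int64_t input;
    int64_t output;
};

// Decompose a flat row index over the leading ndim - 1 dimensions
// (the last dimension is the one being normalized) and accumulate
// both offsets in the same loop. Assumes row-major ordering of rows.
inline RowOffsets rowOffsets(int64_t row_idx, int ndim,
                             const int64_t *shape,
                             const int64_t *input_strides,
                             const int64_t *output_strides) {
    RowOffsets off{0, 0};
    for (int i = ndim - 2; i >= 0; --i) {
        int64_t coord = row_idx % shape[i]; // coordinate along dimension i
        row_idx /= shape[i];
        off.input += coord * input_strides[i];
        off.output += coord * output_strides[i];
    }
    return off;
}
```

Whether this is actually faster on Kunlun hardware would need measuring; the sketch only shows that the coordinate array is not required to obtain both indices.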

xgqdut2016 · 2025-12-23T06:28:15Z

src/infiniop/ops/layer_norm/kunlun/layer_norm_kunlun.xpu

+                               ? 255
+                               : static_cast<int32_t>(info.othersize);
+        int32_t num_loops = (static_cast<int32_t>(info.othersize) + num_blocks - 1) / num_blocks;
+        for (int32_t i = 0; i < num_loops; i++) {


Could this work distribution be moved into the kernel? Writing the loop this way on the host will add a lot of kernel launch overhead, won't it?
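One common way to do what this comment asks is a strided loop inside the kernel: the kernel is launched once, and each block walks over the rows it owns. The sketch below models that assignment on the host in plain C++; `rowsForBlock` and its parameters are hypothetical names, and a real XPU kernel would read the block index from the runtime rather than take it as an argument.

```cpp
#include <cstdint>
#include <vector>

// Host-side model of the rows one block would process if the loop
// lived inside the kernel: block b handles rows b, b + num_blocks,
// b + 2 * num_blocks, ... until num_rows is reached.
std::vector<int32_t> rowsForBlock(int32_t block_id, int32_t num_blocks,
                                  int32_t num_rows) {
    std::vector<int32_t> rows;
    for (int32_t row = block_id; row < num_rows; row += num_blocks) {
        rows.push_back(row);
    }
    return rows;
}
```

With 255 blocks and 600 rows, for instance, block 1 would handle rows 1, 256, and 511 within a single launch, where the host loop quoted above would issue `num_loops` separate launches.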

issue/826: kunlun layernorm

ed1a21b

zhangyue207 requested a review from xgqdut2016 December 22, 2025 09:22

zhangyue207 linked an issue Dec 22, 2025 that may be closed by this pull request

[DEV] 昆仑芯 LayerNorm #826

Open

zhangyue207 changed the title ~~issue/826: kunlun layernorm~~ kunlun layernorm Dec 22, 2025

PanZezhong1725 requested a review from gongchensu December 22, 2025 09:44

Merge branch 'main' of https://github.com/InfiniTensor/InfiniCore int…

94c769a

…o issue/826

xgqdut2016 reviewed Dec 23, 2025

zhangyue207 added 3 commits December 23, 2025 15:53

issue/826: modify according comments

37cd802

issue/826: delete debug print

f62b11f

issue/826: add print for test script

c5f3c8a

xgqdut2016 approved these changes Dec 23, 2025

zhangyue207 changed the title ~~kunlun layernorm~~ issue/826: kunlun layernorm Dec 23, 2025

3 participants