Opencl trm #1

TRM-coding · 2025-09-27T14:11:36Z

Finish writing the operators

ma-hang and others added 30 commits September 15, 2025 11:49

add opencl runtime

c30a45b

add opencl handle, add rms_norm

76179c3

fix: add opencl init

fb715a7

issue/450: change indexToReducedOffset() to indexToOffset in elementw…

5e581b8

…ise framework on CPU, NVIDIA, Cambricon, Metax, Moore, and Kunlun

issue/450: remove indexToReducedOffset() in all platforms

9ef02a1

issue/450: add the testcases that pinpoint the issue in infiniop-test

9db54b8

issue/428: merge rope_v2 into rope with algorithm selection

8651576

issue/428: accommodate the changes to c/gguf tests

f6e8476

issue/428: update the rope implementation on Ascend, Cambricon, and K…

9f0ae73

…unlun to use the refactored interface and return unimplemented error for NEOX-style algorithm

Merge pull request InfiniTensor#429 from InfiniTensor/issue/428_merge…

f9d1662

…_rope_and_rope_v2 Issue/428: Merge `rope_v2` into `rope`

issue/434 hccl support bf16

b8609df

Signed-off-by: Ceng <441651826@qq.com>

fix rope_v2 compiling && update infiniccl_test

3bb0c93

Signed-off-by: Ceng <441651826@qq.com>

Merge pull request InfiniTensor#438 from InfiniTensor/issue/434-metax

b9dd000

issue/434 hccl support bf16

fix: disable topkrouter on Iluvatar GPU via ENABLE_NVIDIA_API macro

8c777f9

fix: disable topkrouter on Iluvatar GPU via ENABLE_NVIDIA_API macro (I…

1f50740

…nfiniTensor#457)

issue/410 Feature: Add infinicore python package

badccb8

issue/434 - added bf16 support for Cambricon MLU

94280d8

issue/436: support kunlun rope U32

6892a7f

issue/436: support 9g7b 4b models

3bdd832

issue/436: fix problems encountered in Kunlunxin end-to-end inference (InfiniTensor#437)

6680a8c

* issue/436: support kunlun rope U32 * issue/436: support 9g7b 4b models --------- Co-authored-by: zhangyue <zhangyue@qiyuanlab.com>

issue/466: success kunlun rope NEOX

c15189b

Merge pull request InfiniTensor#462 from InfiniTensor/issue/434-cambr…

ade3b5d

…icon issue/434 - added bf16 support for Cambricon MLU

feat:hccl support bf16

d0b7bf9

Merge pull request InfiniTensor#467 from InfiniTensor/issue/466

2a81c8b

issue/466: implementation of the NEOX algorithm for rope on the Kunlun platform

Issue/459 (InfiniTensor#460)

3a91947

* issue/459 - Support more data type combinations * issue/459 - added test cases for 9G7B and 9G70B * issue/459 - modified rms kernel to support larger tensors

issue/458 add AWQ dequantization torch test and improve variable nami…

82b2a84

…ng readability

fix: disable NVIDIA-dequantize on Iluvatar GPU via ENABLE_NVIDIA_API …

be117fe

…macro

Merge pull request InfiniTensor#470 from InfiniTensor/issue/469

d3d982d

issue/469: disable NVIDIA-dequantize on Iluvatar GPU via ENABLE_NVIDIA_API macro

feat: rename Dequantize to DequantizeAWQ in nvidia gpu

4217976

PanZezhong1725 and others added 20 commits September 24, 2025 09:42

Merge pull request InfiniTensor#476 from InfiniTensor/issue/474

6b903fd

issue/474: rename Dequantize to DequantizeAWQ in nvidia gpu

add mul

718a126

issue/477 - Cambricon MLU NeoX

6af2e42

Added NeoX support to Cambricon RoPE; Added a missing argument in the profiling script;

stash

3fb3b2f

Merge pull request InfiniTensor#478 from InfiniTensor/issue/477

20a2dbd

issue/477 - Cambricon MLU NeoX

Issue/472: integrate the Kunlunxin communication library (InfiniTensor#479)

3959c94

* issue/472: p800 ccl * issue/472: remove unused operations * issue/472: fix format * issue/472: memcpy h2h case

add gemm,causal_softmax for opencl

5a196e0

add rope and random_sample

53c4d53

rearrange,swiglu

1cfb7ba

fixed dequantized_awq

33794bc

Merge remote-tracking branch 'infini_tensor/main' into opencl-trm

8fb4205

merge infini_tensor/main

ec72705

opencl operators that can run inference

262a8ca

update gemm add sub group

8201655

update

2b0f34b

add test screenshots and a description of the completed work

a515d16

remove unused output

3ef1628

fix accidental deletion

26cfff1

cache operators

f2ef58c

update rearrange

7a6e2d7

11 participants
