feat: optimal `BLOCK_SIZE` for cuda kernel and support Iluvatar `SwigLU` by zhangyue207 · Pull Request #17 · InfiniTensor/InfiniOps

zhangyue207 · 2026-03-11T06:36:51Z

feat
- Support `QueryMaxThreadsPerBlock` for the Iluvatar platform
- Use the optimal `BLOCK_SIZE` in CUDA kernels
- Adapt `SwiGLU` for the Iluvatar platform
refactor
- Separate `Add` device code and kernel launcher
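As a rough illustration of what choosing a `BLOCK_SIZE` from a queried thread limit can look like, here is a minimal host-side sketch. It is not the code from this PR: the candidate list and the names `OptimalBlockSize` and `NumBlocks` are hypothetical, and the limit is passed in as a plain integer. On CUDA that value would typically come from `cudaDeviceGetAttribute` with `cudaDevAttrMaxThreadsPerBlock`; how `QueryMaxThreadsPerBlock` obtains it on Iluvatar is not shown in this description.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical candidate block sizes, largest first. Kernels that take
// BLOCK_SIZE as a template parameter need a fixed set like this.
static const int kCandidateBlockSizes[] = {1024, 512, 256, 128};

// Returns the largest candidate that fits within the device limit.
// If the limit is below every candidate, the limit itself is returned.
inline int OptimalBlockSize(int max_threads_per_block) {
  assert(max_threads_per_block > 0);
  for (int candidate : kCandidateBlockSizes) {
    if (candidate <= max_threads_per_block) return candidate;
  }
  return max_threads_per_block;
}

// Number of blocks needed to cover `n` elements (ceiling division).
inline std::size_t NumBlocks(std::size_t n, int block_size) {
  const std::size_t bs = static_cast<std::size_t>(block_size);
  return (n + bs - 1) / bs;
}
```

With a limit of 768 threads this picks 512, and 1025 elements at a block size of 256 need 5 blocks.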

… caching

- Implemented `operator==` and `operator!=` for the `Device` class to facilitate comparison.
- Introduced `CacheKey` struct in `operator.h` to enhance caching mechanism with a hash and vector of tensors.
- Updated the `Operator::call` method to utilize `CacheKey` for caching operators based on input arguments.
- Added `MetaEqual` method in `Tensor` class for tensor comparison based on metadata.

… comparison

- Changed the namespace of `CacheKey` to `infini::ops::detail` for better organization.
- Updated the hash and equality operators for `CacheKey` to reflect the new namespace.
- Removed the `MetaEqual` method from the `Tensor` class and replaced it with a dedicated `std::equal_to` specialization for `Tensor` to improve comparison logic.
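The caching idea in the two commits above (a key holding a precomputed hash plus the tensors it was built from) can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's implementation: `TensorMeta`, `HashCombine`, `CacheKeyHash`, and `MakeKey` are hypothetical names, `TensorMeta` stands in for the real `Tensor` class, and the sketch uses a plain `operator==` rather than the `std::equal_to` specialization the commit mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the metadata of a tensor (shape and dtype only).
struct TensorMeta {
  std::vector<std::size_t> shape;
  int dtype;
};

inline bool operator==(const TensorMeta& a, const TensorMeta& b) {
  return a.dtype == b.dtype && a.shape == b.shape;
}

// Order-dependent hash mixing in the style popularized by Boost.
inline void HashCombine(std::size_t& seed, std::size_t value) {
  seed ^= std::hash<std::size_t>{}(value) + 0x9e3779b9u + (seed << 6) +
          (seed >> 2);
}

// Key with a precomputed hash and the tensors used for exact comparison.
struct CacheKey {
  std::size_t hash = 0;
  std::vector<TensorMeta> tensors;
};

inline bool operator==(const CacheKey& a, const CacheKey& b) {
  // Cheap hash check first, then the full metadata comparison.
  return a.hash == b.hash && a.tensors == b.tensors;
}

struct CacheKeyHash {
  std::size_t operator()(const CacheKey& key) const { return key.hash; }
};

// Builds a key from the metadata of all input tensors.
inline CacheKey MakeKey(const std::vector<TensorMeta>& tensors) {
  CacheKey key;
  for (const TensorMeta& t : tensors) {
    for (std::size_t dim : t.shape) HashCombine(key.hash, dim);
    HashCombine(key.hash, static_cast<std::size_t>(t.dtype));
    key.tensors.push_back(t);
  }
  return key;
}
```

A `std::unordered_map<CacheKey, V, CacheKeyHash>` can then serve as the operator cache: two calls with tensors of identical shape and dtype produce equal keys and hit the same entry, while a different shape produces a miss.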

…lity

- Moved CPU casting functions to a new file `common/cpu/cast.h` and updated the `Cast` function to utilize these utilities.
- Updated CUDA kernel files to include the new casting utilities and improved block size handling in kernel launches.
- Enhanced the `Add`, `CausalSoftmax`, `Gemm`, `RmsNorm`, and `Swiglu` operators to utilize the new casting mechanisms for better type handling.
- Added support for additional data types in tests and adjusted test cases for consistency across CPU and GPU backends.
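To make the idea of a shared `Cast` utility concrete, here is a small sketch of a generic cast with a specialization for a 16-bit type. The contents of `common/cpu/cast.h` are not shown in this PR description, so everything below is an assumption: the `BFloat16` struct, the two-parameter `Cast<Dst, Src>` signature, and the choice of bfloat16 as the example type are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 16-bit storage type holding the upper half of a float.
struct BFloat16 {
  std::uint16_t bits;
};

// Generic case: a plain static_cast between built-in types.
template <typename Dst, typename Src>
inline Dst Cast(Src value) {
  return static_cast<Dst>(value);
}

// float -> bfloat16 with round-to-nearest-even (NaN handling omitted).
template <>
inline BFloat16 Cast<BFloat16, float>(float value) {
  std::uint32_t u;
  std::memcpy(&u, &value, sizeof(u));
  u += 0x7FFFu + ((u >> 16) & 1u);
  return BFloat16{static_cast<std::uint16_t>(u >> 16)};
}

// bfloat16 -> float: place the stored bits in the upper half of the word.
template <>
inline float Cast<float, BFloat16>(BFloat16 value) {
  const std::uint32_t u = static_cast<std::uint32_t>(value.bits) << 16;
  float result;
  std::memcpy(&result, &u, sizeof(result));
  return result;
}
```

Operator code can then call `Cast<float>(x)` before arithmetic and `Cast<T>(y)` when storing the result, without branching on the element type at each call site.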

zhangyue207 changed the title ~~Feat/dev swiglu ilu~~ feat: optimal BLOCK_SIZE for cuda kernel and support Iluvatar SwigLU Mar 11, 2026

zhangyue207 added 5 commits March 11, 2026 08:02

style: remove unnecessary blank line in cublas.h for improved readabi…

774a58a

…lity

refactor: improve formatting

a6f3529

zhangyue207 force-pushed the feat/dev-swiglu-ilu branch from 6096e3f to a6f3529 Compare March 12, 2026 07:34

zhangyue207 wants to merge 5 commits into feat/dev-infra from feat/dev-swiglu-ilu
