feat: optimal BLOCK_SIZE for cuda kernel and support Iluvatar SwigLU #17

Open
zhangyue207 wants to merge 5 commits into feat/dev-infra from feat/dev-swiglu-ilu
zhangyue207 commented on Mar 11, 2026

  • feat
    • Support QueryMaxThreadsPerBlock for Iluvatar platform
    • Use optimal BLOCK_SIZE in CUDA kernels
    • Adapt SwiGLU for Iluvatar platform
  • refactor
    • Separate Add device code and kernel launcher
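
For readers without the diff at hand, the block-size items above could look roughly like the following host-side sketch. It is not taken from the PR: `SelectBlockSize`, `GridSize`, and the candidate list are hypothetical names and values, and the per-block thread limit is passed in as a plain integer. On CUDA that limit could come from `cudaDeviceGetAttribute` with `cudaDevAttrMaxThreadsPerBlock`; how `QueryMaxThreadsPerBlock` obtains it on Iluvatar is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the PR): pick the largest candidate block
// size that fits within the device's per-block thread limit.
inline int SelectBlockSize(int max_threads_per_block) {
  assert(max_threads_per_block > 0);
  constexpr int kCandidates[] = {1024, 512, 256, 128, 64, 32};
  for (int candidate : kCandidates) {
    if (candidate <= max_threads_per_block) return candidate;
  }
  // Limit is below the smallest candidate: use the limit itself.
  return max_threads_per_block;
}

// Number of blocks needed to cover n elements with one thread per element
// (ceiling division).
inline std::size_t GridSize(std::size_t n, int block_size) {
  assert(block_size > 0);
  const std::size_t b = static_cast<std::size_t>(block_size);
  return (n + b - 1) / b;
}
```

A launcher would then dispatch to a kernel instantiated for the selected size, for example `SelectBlockSize(768)` yields 512 and `GridSize(1000, 256)` yields 4 blocks. Restricting the choice to a fixed candidate list keeps the number of template instantiations small; that is a design assumption of this sketch, not a statement about the PR.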

zhangyue207 changed the title from "Feat/dev swiglu ilu" to "feat: optimal BLOCK_SIZE for cuda kernel and support Iluvatar SwigLU" on Mar 11, 2026
… caching

- Implemented `operator==` and `operator!=` for the `Device` class to facilitate comparison.
- Introduced a `CacheKey` struct in `operator.h`, holding a hash and a vector of tensors, to improve the operator caching mechanism.
- Updated the `Operator::call` method to utilize `CacheKey` for caching operators based on input arguments.
- Added `MetaEqual` method in `Tensor` class for tensor comparison based on metadata.
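
A minimal sketch of how such a key might work, assuming the cache is a hash map keyed on tensor metadata. Everything here is illustrative: `TensorMeta` stands in for the real `Tensor` class, and `Absorb`, `CacheKeyHash`, and `GetOrCreate` are hypothetical names that do not come from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Stand-in for tensor metadata; the real Tensor class is not shown here.
struct TensorMeta {
  std::vector<std::size_t> shape;
  int dtype;  // hypothetical dtype code
  bool operator==(const TensorMeta& o) const {
    return dtype == o.dtype && shape == o.shape;
  }
};

// Key holding a running hash plus the tensors that produced it.
struct CacheKey {
  std::size_t hash = 0;
  std::vector<TensorMeta> tensors;

  // Fold one tensor's metadata into the hash and remember the tensor.
  void Absorb(const TensorMeta& t) {
    hash = Combine(hash, std::hash<int>{}(t.dtype));
    for (std::size_t d : t.shape) {
      hash = Combine(hash, std::hash<std::size_t>{}(d));
    }
    tensors.push_back(t);
  }

  bool operator==(const CacheKey& o) const {
    // The hash check is a fast reject; the vector comparison decides.
    return hash == o.hash && tensors == o.tensors;
  }

  // Common hash-mixing formula (same shape as boost::hash_combine).
  static std::size_t Combine(std::size_t seed, std::size_t v) {
    return seed ^ (v + 0x9e3779b9 + (seed << 6) + (seed >> 2));
  }
};

// The hash was computed while building the key, so just return it.
struct CacheKeyHash {
  std::size_t operator()(const CacheKey& k) const { return k.hash; }
};

// Hypothetical cache: maps a key to an operator id, creating one on a miss.
inline int GetOrCreate(std::unordered_map<CacheKey, int, CacheKeyHash>& cache,
                       const CacheKey& key) {
  auto it = cache.find(key);
  if (it != cache.end()) return it->second;  // cache hit
  const int id = static_cast<int>(cache.size());
  const bool inserted = cache.emplace(key, id).second;
  assert(inserted);  // find() above guarantees the key was absent
  (void)inserted;
  return id;
}
```

Two calls with inputs of identical shape and dtype build equal keys and therefore reuse the same cached entry, while a different shape produces a new entry. Storing the tensors alongside the hash lets equality resolve hash collisions.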
… comparison

- Changed the namespace of `CacheKey` to `infini::ops::detail` for better organization.
- Updated the hash and equality operators for `CacheKey` to reflect the new namespace.
- Removed the `MetaEqual` method from the `Tensor` class and replaced it with a dedicated `std::equal_to` specialization for `Tensor` to improve comparison logic.
- Moved CPU casting functions to a new file `common/cpu/cast.h` and updated the `Cast` function to utilize these utilities.
- Updated CUDA kernel files to include the new casting utilities and improved block size handling in kernel launches.
- Enhanced the `Add`, `CausalSoftmax`, `Gemm`, `RmsNorm`, and `Swiglu` operators to utilize the new casting mechanisms for better type handling.
- Added support for additional data types in tests and adjusted test cases for consistency across CPU and GPU backends.
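
The `std::equal_to` specialization mentioned above might be sketched as follows. This assumes the specialization compares metadata only (dtype, shape, strides) and ignores the data pointer, in line with the earlier `MetaEqual` description; the `demo::Tensor` struct and its fields are hypothetical stand-ins, not the project's actual class.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

namespace demo {
// Hypothetical stand-in for the project's Tensor class.
struct Tensor {
  void* data;
  std::vector<std::size_t> shape;
  std::vector<std::ptrdiff_t> strides;
  int dtype;  // hypothetical dtype code
};
}  // namespace demo

namespace std {
// Specializing std::equal_to for a program-defined type is permitted.
// Assumption: equality means "same metadata", not "same buffer".
template <>
struct equal_to<demo::Tensor> {
  bool operator()(const demo::Tensor& a, const demo::Tensor& b) const {
    return a.dtype == b.dtype && a.shape == b.shape && a.strides == b.strides;
  }
};
}  // namespace std

// Usage sketch: tensors over different buffers compare equal when their
// metadata matches, and unequal when the shape differs.
inline void Example() {
  float x[6] = {};
  float y[6] = {};
  demo::Tensor a{x, {2, 3}, {3, 1}, 0};
  demo::Tensor b{y, {2, 3}, {3, 1}, 0};
  demo::Tensor c{y, {3, 2}, {2, 1}, 0};
  assert(std::equal_to<demo::Tensor>{}(a, b));
  assert(!std::equal_to<demo::Tensor>{}(a, c));
}
```

Putting the comparison in a `std::equal_to` specialization rather than `operator==` lets standard containers and algorithms pick it up through their default comparator while leaving `Tensor` itself without an equality operator whose meaning could be ambiguous.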
zhangyue207 force-pushed the feat/dev-swiglu-ilu branch from 6096e3f to a6f3529 on March 12, 2026 07:34