
Add support to setting known_block_size#267

Merged
coderfeli merged 4 commits into main from jruan/known_block_size
Mar 25, 2026

Conversation

@ruanjm
Contributor

@ruanjm ruanjm commented Mar 23, 2026

This PR adds support for workgroup size 512. The AMDGPU backend defaults to max_flat_workgroup_size = 256.

Copilot AI review requested due to automatic review settings March 23, 2026 08:09
@ruanjm ruanjm added the enhancement New feature or request label Mar 23, 2026
Contributor

Copilot AI left a comment


Pull request overview

Adds an optional known_block_size plumbing path from the @kernel decorator through KernelFunction to the emitted gpu.func, enabling AMDGPU backends to derive max_flat_workgroup_size for larger workgroups (e.g., 512 threads).

Changes:

  • Add known_block_size parameter to create_gpu_func() and forward it to gpu.GPUFuncOp.
  • Extend KernelFunction / @kernel decorator API to accept and store known_block_size.
  • Pass stored known_block_size when emitting the kernel gpu.func and document usage in the decorator docstring.
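The plumbing path above can be sketched as follows. This is a hypothetical, self-contained mock, not the project's actual implementation: the real create_gpu_func, KernelFunction, and gpu.GPUFuncOp live in this repository and MLIR, and the dict-based "attributes" here only stand in for the emitted gpu.func attributes so the data flow is runnable standalone.

```python
# Illustrative sketch of the known_block_size plumbing (all names mocked).

def create_gpu_func(name, known_block_size=None):
    """Mock emitter: attaches known_block_size to the gpu.func attributes."""
    attrs = {"sym_name": name}
    if known_block_size is not None:
        # An AMDGPU backend could derive max_flat_workgroup_size from this,
        # e.g. 512 for a (512, 1, 1) block instead of the default 256.
        attrs["known_block_size"] = tuple(known_block_size)
    return attrs

class KernelFunction:
    """Stores the decorated function plus the optional known_block_size."""
    def __init__(self, fn, known_block_size=None):
        self.fn = fn
        self.known_block_size = known_block_size

    def emit(self):
        # Forward the stored value when emitting the kernel gpu.func.
        return create_gpu_func(self.fn.__name__, self.known_block_size)

def kernel(known_block_size=None):
    """Mock @kernel decorator accepting the new parameter."""
    def wrap(fn):
        return KernelFunction(fn, known_block_size)
    return wrap

# Usage: request a 512-thread workgroup (512 x 1 x 1).
@kernel(known_block_size=(512, 1, 1))
def my_kernel():
    pass

print(my_kernel.emit())
# {'sym_name': 'my_kernel', 'known_block_size': (512, 1, 1)}
```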


@coderfeli
Collaborator

@ruanjm CI failed.

@ruanjm ruanjm force-pushed the jruan/known_block_size branch from 42d953a to 5af360e on March 24, 2026 03:57
@ruanjm
Contributor Author

ruanjm commented Mar 24, 2026

> @ruanjm CI failed.

Fixed.

@coderfeli coderfeli merged commit 4633681 into main Mar 25, 2026
8 checks passed
@ruanjm ruanjm deleted the jruan/known_block_size branch March 25, 2026 02:31

Labels

enhancement New feature or request

3 participants