Adding dispatcher architecture #3300

vidyasagar-amd · 2025-11-26T05:00:56Z

Proposed changes

This PR introduces new dispatching infrastructure for CK Tile, building on prior Tile Engine work. It lets users isolate and run specific GEMM kernels (universal, preshuffle, and multi-D variants) through C++ and Python APIs. It also adds unified code-generation tools, kernel filtering based on GPU architecture, a kernel registry mechanism, examples of integrating with other frameworks from C++ and Python, and basic unit and integration tests.
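To make the moving parts concrete, here is a minimal Python sketch of the registry-plus-dispatcher pattern described above. It is an illustration only and not the API added by this PR: `KernelEntry`, `Registry`, `Dispatcher`, the kernel names, and the tile-size heuristic are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class KernelEntry:
    """Hypothetical description of one generated GEMM kernel."""
    name: str
    archs: frozenset  # GPU targets the kernel was built for
    tile_m: int
    tile_n: int


class Registry:
    """Holds kernels by name and filters them by GPU architecture."""

    def __init__(self) -> None:
        self._kernels: Dict[str, KernelEntry] = {}

    def register(self, entry: KernelEntry) -> None:
        self._kernels[entry.name] = entry

    def for_arch(self, arch: str) -> List[KernelEntry]:
        return [k for k in self._kernels.values() if arch in k.archs]


class Dispatcher:
    """Picks one kernel per problem size; the heuristic is a placeholder."""

    def __init__(self, registry: Registry, arch: str) -> None:
        self._candidates = registry.for_arch(arch)

    def select(self, m: int, n: int, k: int) -> KernelEntry:
        fitting = [c for c in self._candidates
                   if m % c.tile_m == 0 and n % c.tile_n == 0]
        if not fitting:
            raise LookupError(f"no kernel fits {m}x{n}x{k}")
        # Prefer the largest tile that divides the problem evenly.
        return max(fitting, key=lambda c: c.tile_m * c.tile_n)


# Illustrative kernel set (names and arch assignments are made up).
registry = Registry()
registry.register(KernelEntry("gemm_64x64", frozenset({"gfx942", "gfx90a"}), 64, 64))
registry.register(KernelEntry("gemm_128x128", frozenset({"gfx942"}), 128, 128))
registry.register(KernelEntry("gemm_256x256", frozenset({"gfx950"}), 256, 256))
dispatcher = Dispatcher(registry, arch="gfx942")
```

Calling `dispatcher.select(256, 256, 64)` in this sketch only considers kernels registered for `gfx942`, which is the sense in which architecture filtering and dispatch are separate steps.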

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation that helps the maintainers understand the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

spolifroni-amd

The readmes etc look fine. I can't tell if this is supposed to be used internally for those contributing to the project, or for anyone using the library. If it's for anyone, then a changelog entry is needed to talk about arch_specs.json. Otherwise this is fine.

afagaj

It's worth adding a CHANGELOG.md entry to announce this change.

Copilot

Pull request overview

Introduce CK Tile Dispatcher architecture with new C++ and Python APIs, codegen tooling, kernel registry, and comprehensive GEMM/Conv examples, plus basic validation and benchmarking.

Add GEMM and Convolution examples in Python and C++ demonstrating registry/dispatcher usage, validation, and benchmarking.
Provide shared Python utilities for convolution (conv_utils.py) and codegen assets (requirements, scripts).
Expand documentation (READMEs) for quick start and example overviews.

Reviewed changes

Copilot reviewed 74 out of 160 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| dispatcher/examples/gemm/python/README.md | Adds quick start and Python GEMM examples overview and usage. |
| dispatcher/examples/gemm/python/01_basic_gemm.py | Basic GEMM example showing manual workflow (config, codegen, registry, dispatch). |
| dispatcher/examples/gemm/python/02_batch_gemm.py | Demonstrates running batches of GEMM problems with padding support. |
| dispatcher/examples/gemm/python/03_benchmark.py | Adds benchmarking script with warmup and TFLOPS reporting. |
| dispatcher/examples/gemm/python/04_validation.py | Validates GPU GEMM outputs against NumPy references with tolerances. |
| dispatcher/examples/gemm/python/05_numpy_integration.py | NumPy integration wrapper (GPUMatmul) and small demos. |
| dispatcher/examples/gemm/python/06_json_export.py | Exports registry/kernels metadata to JSON (and consumes C++ JSON if present). |
| dispatcher/examples/gemm/python/07_preshuffle.py | Preshuffle pipeline example with larger tiles and intrawave scheduler. |
| dispatcher/examples/gemm/python/08_multi_d.py | Multi-D fused GEMM example with CPU simulation and GPU base op timing. |
| dispatcher/examples/gemm/python/09_multi_registry.py | Multiple registries for compute/memory/latency optimized workloads. |
| dispatcher/examples/gemm/cpp/README.md | Adds C++ GEMM examples quick start and overview of example set. |
| dispatcher/examples/gemm/cpp/01_basic_gemm.cpp | Declarative kernel set example and simple GEMM run/verify. |
| dispatcher/examples/gemm/cpp/02_multi_size.cpp | Runs multiple problem sizes and reports TFLOPS. |
| dispatcher/examples/gemm/cpp/03_benchmark.cpp | Benchmark runner with stats (min/median/mean). |
| dispatcher/examples/gemm/cpp/04_validation.cpp | CPU reference and validation against GPU results. |
| dispatcher/examples/gemm/cpp/05_heuristics.cpp | Custom heuristic-based kernel selection demonstration. |
| dispatcher/examples/gemm/cpp/06_json_export.cpp | Registry JSON export and kernel set declarations. |
| dispatcher/examples/gemm/cpp/07_preshuffle.cpp | Preshuffle GEMM example with verification. |
| dispatcher/examples/gemm/cpp/08_multi_d.cpp | Multi-D fused concept demo using standard GEMM run. |
| dispatcher/examples/gemm/cpp/09_multi_registry.cpp | Multiple registries and dispatchers plus summary pattern. |
| dispatcher/examples/conv/python/conv_utils.py | Core Python utilities for conv signature/algorithm/arch, codegen, runners, and validation. |
| dispatcher/examples/conv/python/README.md | Adds Python Conv examples quick start and detailed guide. |
| dispatcher/examples/conv/python/02_conv2d_fwd.py | 2D forward conv example with optional CPU verification and GPU run. |
| dispatcher/examples/conv/python/03_conv3d_fwd.py | 3D forward conv example with CPU reference and GPU run. |
| dispatcher/examples/conv/python/04_conv2d_bwd_data.py | 2D backward data conv with CPU reference and optional GPU execution. |
| dispatcher/examples/conv/python/05_conv2d_bwd_weight.py | 2D backward weight conv with CPU reference and optional GPU execution. |
| dispatcher/examples/conv/python/06_benchmark.py | Conv benchmarking across small problems; optional CPU reference. |
| dispatcher/examples/conv/python/07_validation.py | Conv validation suite vs CPU with detailed analysis. |
| dispatcher/examples/conv/python/08_json_export.py | Export conv registry to JSON with basic stats and examples. |
| dispatcher/examples/conv/python/09_multi_registry.py | Multiple registries for conv workloads with selection heuristics. |
| dispatcher/examples/conv/python/10_conv3d_forward.py | 3D forward conv GPU runner with timing and TFLOPS. |
| dispatcher/examples/conv/python/11_bwd_data.py | Backward data API demo with runners (note: codegen in progress). |
| dispatcher/examples/conv/python/12_bwd_weight.py | Backward weight API demo with dedicated runner (separate lib). |
| dispatcher/codegen/requirements.txt | Adds Python deps for codegen and optional tooling. |
| dispatcher/codegen/minimal_test_config.json | Minimal tile/trait config for test kernel generation. |
| dispatcher/codegen/generate_test_kernels.sh | Shell script to generate a minimal set of CK Tile kernels. |
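
Several of the files listed above report TFLOPS and min/median/mean timings. As a rough sketch of what such a report involves, assuming the usual 2·M·N·K operation count for a GEMM and timings in milliseconds, the helpers below show the arithmetic. The function names are illustrative and are not taken from the PR's scripts.

```python
import statistics
from typing import Dict, Sequence


def gemm_tflops(m: int, n: int, k: int, elapsed_ms: float) -> float:
    """TFLOPS for an M x N x K GEMM, counting one multiply and one add per term."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    flops = 2.0 * m * n * k
    seconds = elapsed_ms * 1e-3
    return flops / seconds / 1e12


def summarize(times_ms: Sequence[float]) -> Dict[str, float]:
    """Min, median, and mean of a list of per-iteration timings."""
    if not times_ms:
        raise ValueError("need at least one timing")
    return {
        "min": min(times_ms),
        "median": statistics.median(times_ms),
        "mean": statistics.fmean(times_ms),
    }
```

Timings from warmup iterations would normally be dropped before calling `summarize`, so that one-time costs do not skew the mean.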

dispatcher/examples/gemm/python/05_numpy_integration.py

dispatcher/examples/conv/python/conv_utils.py

dispatcher/examples/conv/python/10_conv3d_forward.py

dispatcher/examples/conv/python/06_benchmark.py

dispatcher/examples/gemm/python/README.md

dispatcher/examples/gemm/cpp/README.md

dispatcher/examples/conv/python/README.md

dispatcher/examples/conv/python/conv_utils.py

tenpercent · 2025-12-01T20:13:56Z

Thanks! A few notes while I'm just starting to look -

All the CMake flags seem to be necessary; without them the build quietly completes without building the examples, which is a bit surprising.
Maybe use Ninja instead of GNU Make, if we want to encourage that.
There is a fair number of legitimate warnings when you build the C++ example. Consider adding -Werror to the default clang flags.
Make sure the binaries and Python scripts correctly process the --help flag and output useful info.
Make the conv example consistent with the GEMM one.
For functionality you don't want broken by changes in CK APIs, add building and testing to the CI.
At the very least, all instructions in the READMEs need to be manually verified.
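
On the `--help` point, a minimal sketch of how a Python example script could get useful help output from the standard `argparse` module is shown below. The script name, flags, and defaults are hypothetical and are not the options of the PR's actual examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface for a hypothetical GEMM example script."""
    parser = argparse.ArgumentParser(
        prog="example_gemm.py",  # hypothetical script name
        description="Run a single GEMM through the dispatcher (illustrative).",
    )
    parser.add_argument("--m", type=int, default=1024, help="rows of A and C")
    parser.add_argument("--n", type=int, default=1024, help="columns of B and C")
    parser.add_argument("--k", type=int, default=1024, help="shared dimension")
    parser.add_argument("--arch", default="gfx942",
                        help="target GPU architecture (default: %(default)s)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"GEMM {args.m}x{args.n}x{args.k} on {args.arch}")
```

`argparse` adds `-h`/`--help` automatically, prints the description plus each argument's `help` string, and exits with status 0, so the main work is writing accurate help strings.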

tenpercent · 2025-12-01T20:25:21Z

dispatcher/README.md

+| CMake | 3.16+ | `cmake --version` |
+| Python | 3.8+ | `python3 --version` |
+| NumPy | Any | `pip show numpy` |
+| hipcc | (from ROCm) | `/opt/rocm/bin/hipcc --version` |


I think sometime around ROCm 6.4 we were encouraged to switch from hipcc to the clang bundled with the ROCm distribution.
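
The version minimums in the prerequisite table quoted above could also be checked by a small script rather than by eye. The sketch below is one possible way to do that, assuming the tools print a dotted version number somewhere in their `--version` output; `parse_version`, `meets_minimum`, and `MINIMUMS` are illustrative names, not part of the PR.

```python
import platform
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version number from a tool's version output."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(text: str, minimum: Tuple[int, ...]) -> bool:
    """True when the version found in `text` is at least `minimum`."""
    return parse_version(text) >= minimum


# Minimums copied from the prerequisite table; the checker itself is illustrative.
MINIMUMS = {"cmake": (3, 16), "python": (3, 8)}

if __name__ == "__main__":
    ok = meets_minimum(platform.python_version(), MINIMUMS["python"])
    print("python ok:", ok)
```

The same helper works on the text printed by `cmake --version`, which could be captured with `subprocess.run` once the tool is known to be on the PATH.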

tenpercent · 2025-12-01T20:26:23Z

dispatcher/README.md

+- **gfx942** - MI300X, MI300A (Instinct MI300 series) ← Recommended
+- **gfx950** - MI350 series
+- **gfx90a** - MI200 series (MI250, MI250X)
+- **gfx1201** - RDNA4 series


Make sure this is consistent with the examples. I think the printed messages currently refer to gfx11.

added gfx11
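
One way to keep a README list and the messages printed by examples from drifting apart is to generate both from a single table. The sketch below assumes that approach; it is not how the PR implements it. The four entries are copied from the README excerpt quoted above, the gfx11 entry mentioned in the reply is left out because its exact target names are not given here, and `SUPPORTED_ARCHS`, `readme_lines`, and `describe` are hypothetical names.

```python
from typing import Dict, List

# Single source of truth, copied from the README excerpt quoted above.
SUPPORTED_ARCHS: Dict[str, str] = {
    "gfx942": "MI300X, MI300A (Instinct MI300 series)",
    "gfx950": "MI350 series",
    "gfx90a": "MI200 series (MI250, MI250X)",
    "gfx1201": "RDNA4 series",
}
RECOMMENDED = "gfx942"


def readme_lines() -> List[str]:
    """Render the bullet list exactly as it would appear in the README."""
    lines = []
    for arch, products in SUPPORTED_ARCHS.items():
        suffix = " ← Recommended" if arch == RECOMMENDED else ""
        lines.append(f"- **{arch}** - {products}{suffix}")
    return lines


def describe(arch: str) -> str:
    """Message an example could print for the detected architecture."""
    if arch not in SUPPORTED_ARCHS:
        known = ", ".join(SUPPORTED_ARCHS)
        return f"{arch} is not in the supported list ({known})"
    return f"{arch}: {SUPPORTED_ARCHS[arch]}"
```

With this layout, adding a new target means editing one dictionary, and both the documentation and the runtime messages pick up the change.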

tenpercent · 2025-12-01T20:27:21Z

dispatcher/README.md

+
+```bash
+# Install NumPy (required for Python examples)
+pip install numpy


One more good practice to encourage: use `uv venv` to create virtual environments and `uv pip` to install packages.

vidyasagar-amd requested review from a team, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, ddembeckAMD, geyyer, illsilin, poyenc, qianfengz, shumway and tenpercent as code owners November 26, 2025 05:00

spolifroni-amd previously approved these changes Nov 26, 2025

afagaj reviewed Nov 26, 2025

vidyasagar-amd dismissed spolifroni-amd’s stale review via 34c5579 November 28, 2025 19:16

vidyasagar-amd requested a review from Copilot December 1, 2025 17:46

Copilot AI reviewed Dec 1, 2025

tenpercent reviewed Dec 1, 2025

tenpercent reviewed Dec 1, 2025

tenpercent reviewed Dec 1, 2025

tenpercent reviewed Dec 1, 2025

vidyasagar-amd added 28 commits January 12, 2026 18:44

Cleaning up code

67f4f05

Improving dispatcher support for different arch

4938c89

Fixing typos

Fix formatting errors

eeb1289

Cleaning up examples

7cc69e1

Improving codegeneration

da18876

Improving and fixing C++ examples

9e62753

Adding conv functionality (fwd,bwd,bwdw) and examples.

3f70fb3

Fixes based on feedback.

8817a1c

Further fixes based on feedback.

064e056

Adding stress test for autogeneration and autocorrection, and fixing preshuffle bug.

c2c80d6

Another round of improvements based on feedback.

9cf9844

Trimming out unnecessary code.

fd617cf

Fixing the multi-D implementation.

93b66b1

Using gpu verification for gemms and fixing convolutions tflops calculation.

6041807

Fix counter usage issue and arch filtering per ops.

af839ac

Adding changelog and other fixes.

8b8f9f8

Improve examples and resolve critical bugs.

d8a30a4

Reduce build time for python examples.

d89f06c

Fixing minor bug.

9bb2366

Fix compilation error.

446624c

Improve installation instructions for dispatcher.

85a6bcd

Add docker based installation instructions for dispatcher.

94cc05a

Fixing arch-based filtering to match tile engine.

e090841

Remove dead code and fix arch filtering.

148565a

Minor bugfix.

8497dab

Updates after rebase.

bae640e

Trimming code.

61c6826

Fix copyright headers.

53d61e0

vidyasagar-amd force-pushed the builder-dispatch-tile-gemm branch from 1ec1d0d to 53d61e0 January 12, 2026 18:45

Consolidate examples, cut down code.

7a95bb6

