[xegpu] Add matmul cost model and tile size selector #156

Open
tkarna wants to merge 4 commits into llvm:main from tkarna:xegpu-costmodel

@tkarna tkarna commented May 20, 2026

Extends the XeGPU MLP/matmul schedule with a cost model that can be used to generate valid tile sizes for a given (M, N, K) matmul shape.

  • Adds XeGPUSpecs: an object that contains the GPU specifications required by the cost model.
  • Adds XeGPUParameterSelector: the parameter selector is now a class that uses XeGPUSpecs and can generate valid tile size configurations if the (M, N, K) case is not found in the existing parameter JSON file.
  • mlp_schedule still takes a list of param dicts, one for each layer. However, only the "m", "n", and "k" entries are required; if any other parameter is missing, XeGPUParameterSelector is called to populate the tile sizes.
  • Adds matmul cost model routines:
    • Given a matmul shape (M, N, K), the cost model routine generate_configs generates valid workgroup (WG), subgroup (SG), and K tile size configurations and estimates their performance with a simple roofline model. It returns the configs sorted by estimated performance.
    • generate_prefetch_tiles generates all valid thread-cooperative prefetch strategies, sorted by the number of cooperating threads. No performance estimate is provided.
    • A simple heuristic generates tile sizes when they are not given: take the best WG, SG, K configuration according to the cost model estimate, take one of the prefetch configurations, and use the DPAS instruction shape for the A and B load tiles.
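To make the roofline idea above concrete, here is a minimal, self-contained sketch of how valid tile configurations could be enumerated and ranked. It is not the code from this PR: `GpuSpecs`, its placeholder numbers, the candidate tile sizes, the dictionary keys, and the traffic model (each workgroup streams its A and B panels from memory once, with no cache reuse) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GpuSpecs:
    """Hypothetical stand-in for XeGPUSpecs; all numbers are placeholders."""
    peak_flops: float = 400e12     # peak FLOP/s (placeholder value)
    mem_bandwidth: float = 1.0e12  # bytes/s (placeholder value)
    dpas_m: int = 8                # assumed DPAS tile shape (m, n, k)
    dpas_n: int = 16
    dpas_k: int = 16
    max_subgroups: int = 32        # assumed subgroups-per-workgroup limit


def estimate_time(M, N, K, wg_m, wg_n, specs, ab_bytes=2, c_bytes=4):
    """Roofline estimate: the slower of the compute and memory bounds."""
    flops = 2.0 * M * N * K
    n_wg = (M // wg_m) * (N // wg_n)
    # Assumption: each workgroup reads a (wg_m x K) panel of A and a
    # (K x wg_n) panel of B once, and writes its (wg_m x wg_n) C tile.
    bytes_moved = n_wg * ((wg_m + wg_n) * K * ab_bytes + wg_m * wg_n * c_bytes)
    return max(flops / specs.peak_flops, bytes_moved / specs.mem_bandwidth)


def generate_configs(M, N, K, specs=GpuSpecs()):
    """Enumerate valid (WG, SG, K) tilings, sorted by estimated time."""
    sizes = (16, 32, 64, 128, 256)
    k_sizes = (16, 32, 64)
    configs = []
    for wg_m, wg_n, sg_m, sg_n, k_tile in product(sizes, sizes, sizes, sizes, k_sizes):
        # Tiles must evenly divide the problem and the enclosing tile.
        if M % wg_m or N % wg_n or K % k_tile:
            continue
        if wg_m % sg_m or wg_n % sg_n:
            continue
        # Subgroup tiles must be multiples of the DPAS instruction shape.
        if sg_m % specs.dpas_m or sg_n % specs.dpas_n or k_tile % specs.dpas_k:
            continue
        if (wg_m // sg_m) * (wg_n // sg_n) > specs.max_subgroups:
            continue
        configs.append({
            "wg_m": wg_m, "wg_n": wg_n, "sg_m": sg_m, "sg_n": sg_n,
            "k_tile": k_tile,
            "est_time": estimate_time(M, N, K, wg_m, wg_n, specs),
        })
    return sorted(configs, key=lambda c: c["est_time"])


if __name__ == "__main__":
    best = generate_configs(512, 8192, 128)[0]
    print(best)
```

In this simplified model only the workgroup tile affects the estimate, so configurations that differ only in subgroup or K tile size tie with each other; a real cost model would presumably also account for register pressure, local memory, and occupancy.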

Currently data types are assumed to be float16 and float32 for A/B and C, respectively. To be generalized later.

We can now execute any nicely-shaped matrix multiplication without the need to define tile sizes. If the matmul is compute-bound, performance should be decent.

python matmul.py --sizes 512 8192 128 --check-result -v


Copilot AI left a comment

Pull request overview

Adds a XeGPU matmul tile-parameter cost model plus device-spec plumbing, and updates the XeGPU matmul/MLP scheduling path (and examples) to auto-populate missing tiling parameters when a shape is not present in the JSON parameter DB.

Changes:

  • Introduces XeGPUSpecs (device specs DB) and a matmul_costmodel grid-search/roofline estimator to rank valid tiling configs.
  • Replaces the old function-based parameter selector with XeGPUParameterSelector, and wires mlp_schedule to generate/fill missing tiling parameters per layer.
  • Refactors examples to pass only required (m,n,k) (plus optional --target) and rely on the schedule to complete parameters; reuses centralized constraint checks.
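The lookup-then-fallback behavior summarized above could look roughly like the sketch below. The class and function names (`ParameterSelector`, `complete_layer_params`), the tile key names, and the choice to keep user-supplied values while filling only the gaps are hypothetical; the actual XeGPUParameterSelector and mlp_schedule interfaces may differ.

```python
REQUIRED_KEYS = ("m", "n", "k")
# Hypothetical tile parameter names, for illustration only.
TILE_KEYS = ("wg_m", "wg_n", "sg_m", "sg_n", "k_tile")


class ParameterSelector:
    """Look up a shape in a parameter DB, else ask a fallback generator."""

    def __init__(self, db, fallback):
        self.db = db              # maps (m, n, k) -> dict of tile parameters
        self.fallback = fallback  # callable (m, n, k) -> dict of tile parameters

    def select(self, m, n, k):
        known = self.db.get((m, n, k))
        if known is not None:
            return dict(known)    # copy so callers cannot mutate the DB
        return self.fallback(m, n, k)


def complete_layer_params(layers, selector):
    """Return a copy of each layer dict with missing tile parameters filled in."""
    completed = []
    for layer in layers:
        missing = [key for key in REQUIRED_KEYS if key not in layer]
        if missing:
            raise ValueError(f"layer is missing required entries: {missing}")
        params = dict(layer)
        if any(key not in params for key in TILE_KEYS):
            selected = selector.select(params["m"], params["n"], params["k"])
            # Assumption: values given by the caller take precedence.
            for key, value in selected.items():
                params.setdefault(key, value)
        completed.append(params)
    return completed
```

With a selector built this way, a caller passes only `{"m": ..., "n": ..., "k": ...}` per layer and receives fully populated dictionaries, which matches how the refactored examples are described as relying on the schedule for completion.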

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| lighthouse/schedule/xegpu/xegpu_specs.py | Adds device-spec DB and XeGPUSpecs used by the cost model/selector. |
| lighthouse/schedule/xegpu/xegpu_parameter_selector.py | Implements class-based param selection with JSON DB lookup + cost-model fallback. |
| lighthouse/schedule/xegpu/mlp_schedule.py | Adds pre-processing to auto-fill missing layer tile parameters via selector; moves constants to shared constraints module. |
| lighthouse/schedule/xegpu/matmul_costmodel.py | Adds config generation + simple roofline-based performance estimation. |
| lighthouse/schedule/xegpu/matmul_constraints.py | Centralizes tiling/prefetch validity checks and shared constants. |
| lighthouse/schedule/xegpu/__init__.py | Exposes new selector/specs/constraint helper via package exports. |
| examples/xegpu/tune_matmul_gridsearch.py | Switches to shared check_constraints and adds GPU target selection. |
| examples/xegpu/torch_matmul.py | Simplifies parameter init to (m,n,k) + optional target; removes legacy selector usage. |
| examples/xegpu/mlp.py | Passes per-layer (m,n,k) (and optional target) and relies on schedule for completion. |
| examples/xegpu/matmul.py | Passes (m,n,k) (and optional target) and relies on schedule for completion. |

Comment thread lighthouse/schedule/xegpu/xegpu_specs.py Outdated
Comment thread lighthouse/schedule/xegpu/xegpu_parameter_selector.py Outdated
Comment thread lighthouse/schedule/xegpu/mlp_schedule.py
Comment thread lighthouse/schedule/xegpu/mlp_schedule.py
Comment thread lighthouse/schedule/xegpu/matmul_costmodel.py
Comment thread lighthouse/schedule/xegpu/matmul_constraints.py Outdated
Comment thread examples/xegpu/tune_matmul_gridsearch.py Outdated
Comment thread examples/xegpu/torch_matmul.py
@tkarna tkarna force-pushed the xegpu-costmodel branch from e09cd97 to 62898c1 on May 20, 2026 17:24
@tkarna tkarna force-pushed the xegpu-costmodel branch from 62898c1 to 7fd5c21 on May 20, 2026 17:32