PTX Backend by WillTrojak · Pull Request #18 · PyFR/GiMMiK

WillTrojak · 2026-05-15T12:23:17Z

This adds a PTX backend to GiMMiK. The key features are:

Mild optimisation of existing CUDA algorithms.
Optional async loads for some sparse kernels.
Added dense generation for Hopper and above.

Optimisations have focused on FP64; FP32 is future work.

FreddieWitherden · 2026-05-15T13:44:24Z

+            yield (tpl, args, meta)
+
+        # Warp-specialised dense DMMA
+        if cc >= (10, 0):


Does this gate consumer cards with less shared memory?

Not sure what the best way to handle this is. I've added a DENSE_SMEM_MAX, but we could instead set this via the ini file or the driver?

If consumer cards can pass the check, they need to work. I'm not sure there is a clear mapping from CC to max smem; if there isn't, have the caller pass in additional info about the max shared memory.
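One way to follow the suggestion of having the caller supply the limit is sketched below. It assumes a simple, hypothetical shared-memory model (the full m x k block of A plus a number of buffered k x tile_n tiles of B); the names `dense_smem_bytes`, `dense_kernel_allowed`, `max_smem`, `tile_n` and `stages` are illustrative and not part of GiMMiK. The idea is that the compute-capability check is kept but combined with a per-device shared-memory check, so a consumer card that passes the CC test is still rejected if it cannot fit the kernel.

```python
def dense_smem_bytes(m, k, tile_n, itemsize, stages=2):
    """Shared memory for a hypothetical tiling: the full m x k block of A
    plus `stages` buffered k x tile_n tiles of B."""
    return itemsize*(m*k + stages*k*tile_n)


def dense_kernel_allowed(cc, max_smem, m, k, tile_n, itemsize, stages=2):
    """Gate the warp-specialised dense kernel on both the architecture and
    the per-block shared memory reported by the caller (e.g. PyFR)."""
    # Architecture check, as in the diff
    if cc < (10, 0):
        return False

    # Device check: parts with the same CC may have less shared memory
    return dense_smem_bytes(m, k, tile_n, itemsize, stages) <= max_smem
```

For example, with FP64 (itemsize 8), m = k = 64 and tile_n = 128, this model needs 163,840 bytes, so a device reporting 227 KiB would pass while one reporting 99 KiB would not. These limits are illustrative, not quoted device specifications.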

FreddieWitherden · 2026-05-15T13:54:35Z

@@ -0,0 +1,276 @@
+# -*- coding: utf-8 -*-
+
+import struct


FreddieWitherden · 2026-05-15T18:31:49Z

I know this is an utter pain, but for FP32/FP64 can you confirm correctness for all relevant PyFR matrices at a suite of N values, for all instances where a kernel is expected to work (on A100/H100/B100)?
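A sweep like this could be driven by a small harness that compares each generated kernel against a NumPy reference over a range of n. The sketch assumes a hypothetical `run_kernel(A, B)` callable that launches the GiMMiK kernel for a fixed operator matrix A (with alpha = 1, beta = 0) and returns C = A @ B; the tolerances are illustrative defaults, not values taken from GiMMiK or PyFR.

```python
import numpy as np


def check_kernel(run_kernel, A, ns, dtype=np.float64, rtol=None, seed=0):
    """Return the values of n for which run_kernel(A, B) disagrees with A @ B.

    run_kernel is a hypothetical wrapper around a compiled GiMMiK kernel
    (alpha = 1, beta = 0 assumed) taking A (m x k) and B (k x n).
    """
    dtype = np.dtype(dtype)
    A = np.asarray(A, dtype=dtype)
    if rtol is None:
        # Illustrative tolerances; FP32 accumulates much more rounding error
        rtol = 1e-4 if dtype == np.float32 else 1e-10

    rng = np.random.default_rng(seed)
    failures = []
    for n in ns:
        B = rng.standard_normal((A.shape[1], n)).astype(dtype)
        ref = A @ B
        out = np.asarray(run_kernel(A, B))

        # Scale the absolute tolerance by the largest reference magnitude
        atol = rtol*np.abs(ref).max(initial=0)
        if out.shape != ref.shape or not np.allclose(out, ref, rtol=rtol,
                                                     atol=atol):
            failures.append(n)

    return failures
```

Running this per matrix, per precision and per kernel variant on each GPU gives a list of failing n values rather than a single pass/fail, which makes it easier to spot size-dependent bugs (e.g. tail handling when n is not a multiple of the block size).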

FreddieWitherden · 2026-05-15T18:33:25Z

+                         .param .u64 _c)
+{
+% endif
+    .reg .u32 n, id, tid_x, tid_y;


Ensure we throw higher up if n is too big.

Checking here

We don't handle n being too large in any of the other backends.

https://github.com/PyFR/GiMMiK/blob/master/gimmik/kernels/cuda/cstream.mako#L20: in the embedded case we do (the argument case doesn't, but that is not currently used for CUDA).
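Since the PTX kernel holds n in a `.reg .u32`, a check on the Python side, before generation or launch, could reject values that cannot be represented. This is a minimal sketch that assumes the 32-bit register is the only constraint; `check_n` is a hypothetical helper, and the real bound may be tighter (for example, from grid dimensions or intermediate index arithmetic).

```python
# Largest value representable by the kernel's `.reg .u32 n`
U32_MAX = 2**32 - 1


def check_n(n):
    """Hypothetical up-front validation of n before generating/launching.

    Assumes the 32-bit register is the binding constraint; grid-size or
    index-arithmetic limits could make the real bound smaller.
    """
    n = int(n)
    if not 0 <= n <= U32_MAX:
        raise ValueError(f'n = {n} is outside the range of a 32-bit '
                         'unsigned register')
    return n
```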

FreddieWitherden · 2026-05-15T18:36:01Z

+%       if afix[row_j] == -1:
+% if beta == 0:
+    {
+    .reg .${pftype} _tmp;


Can this be factored up, as it appears in both branches?

FreddieWitherden · 2026-05-21T13:29:40Z

+        nnz = np.count_nonzero(arr)
+        nuq = len(np.unique(np.abs(arr)))
+        density = nnz / arr.size
+        return (nuq <= 28) or (density <= 0.15)


Check whether these thresholds could do with tuning.
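To make the thresholds easier to tune, they could be lifted into parameters whose defaults match the values in the diff (28 unique magnitudes, 15% density), and a simple grid search run over benchmark timings. The heuristic body is copied from the diff; `use_sparse`, `sweep`, `time_sparse` and `time_dense` are hypothetical names, and treating a True result as "use the sparse kernel" is an assumption.

```python
import numpy as np


def use_sparse(arr, max_unique=28, max_density=0.15):
    """Heuristic from the diff, with its thresholds exposed as parameters.

    Defaults reproduce the hard-coded values; the True branch is assumed
    (hypothetically) to select the sparse kernel.
    """
    nnz = np.count_nonzero(arr)
    nuq = len(np.unique(np.abs(arr)))
    density = nnz / arr.size
    return (nuq <= max_unique) or (density <= max_density)


def sweep(mats, time_sparse, time_dense, uniques, densities):
    """Grid-search the thresholds against benchmark timings.

    time_sparse/time_dense are hypothetical callables returning the measured
    runtime of each kernel type for a matrix; returns (total, mu, md).
    """
    best = None
    for mu in uniques:
        for md in densities:
            total = sum(time_sparse(a) if use_sparse(a, mu, md)
                        else time_dense(a) for a in mats)
            if best is None or total < best[0]:
                best = (total, mu, md)

    return best
```

Feeding `sweep` the full set of PyFR operator matrices with measured timings on each target GPU would show whether 28 and 0.15 sit near the optimum or should differ per architecture or precision.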

FreddieWitherden · 2026-05-22T15:28:41Z

+%   for idx, kx in enumerate(bchunks[bb]):
+    ld.shared.${pftype} bv, [bsub_thread + ${bsub_off(buf_cur, idx)}];
+%    for j, row_j in enumerate(mcx):
+<%    jx = A[row_j, kx] %>


See if NumPy indexing (A[mcx, kx]) can be used in the for loop.
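NumPy advanced indexing should handle this directly: with `mcx` a sequence of row indices and `kx` a scalar column index, `A[mcx, kx]` gathers `A[row_j, kx]` for every `row_j` in `mcx` in one call. A small standalone check, using an illustrative matrix (not a PyFR operator):

```python
import numpy as np

# Illustrative operator matrix and indices
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [4.0, 0.0, 5.0]])
mcx = [0, 2]  # row chunk
kx = 2        # column index

# Loop form, mirroring the template
loop_vals = [A[row_j, kx] for row_j in mcx]

# Vectorised gather via advanced indexing
vec_vals = A[mcx, kx]  # -> array([2., 5.])

# Rows paired with their gathered values, as the template loop would use them
pairs = [(j, row_j, jx) for j, (row_j, jx) in enumerate(zip(mcx, vec_vals))]
```

In the template, the gathered column could then be computed once per `kx` in a `<% %>` block and zipped with `mcx`, assuming `mcx` is a plain sequence of integer indices.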

FreddieWitherden · 2026-05-22T15:34:04Z

+    .reg .pred pm_${mt};
+    {
+        .reg .u32 crow;
+        add.u32 crow, r_div4, ${mt * 8};


Try to put the constant first, i.e. 8*mt.

Will Trojak and others added 6 commits December 2, 2025 22:13

[wip] added ptx generator for bstream

0cd7485

Addtional sparse and dense work

626c2f5

Dense and sparse optimisation

bbbb8ef

Added warp specialised dense kernel

393b409

Performance tuning and cleanup

67d1beb

Whitespace

e2a818b

WillTrojak mentioned this pull request May 15, 2026

Support for GiMMiK PTX Provider PyFR/PyFR#556

Open

FreddieWitherden reviewed May 15, 2026

Comment thread gimmik/kernels/ptx/bstream-msplit.mako Outdated

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

@@ -0,0 +1,276 @@

# -*- coding: utf-8 -*-

import struct

FreddieWitherden May 15, 2026

PEP8

FreddieWitherden reviewed May 15, 2026

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/kernels/ptx/base.mako Outdated

Comment thread gimmik/kernels/ptx/bstream-msplit.mako Outdated

Comment thread gimmik/kernels/ptx/bstream-msplit.mako Outdated

Comment thread gimmik/kernels/ptx/cstream-ksplit.mako Outdated

Comment thread gimmik/kernels/ptx/bstream.mako

Cleanups, formating and addressign comments

7d7299a

FreddieWitherden reviewed May 19, 2026

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

General cleanups and moved smem to pyfr

1d405c3

FreddieWitherden reviewed May 21, 2026

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

Comment thread gimmik/ptx.py Outdated

WillTrojak added 3 commits May 21, 2026 09:26

Fixed missing import

0e86053

Fixed additional args

1f62b5f

Cleanup and added PTX Version to handle older drivers.

79f41cb

FreddieWitherden reviewed May 22, 2026

Comment thread gimmik/kernels/ptx/bstream-msplit.mako

WillTrojak commented May 15, 2026

FreddieWitherden commented May 15, 2026

2 participants