
Conversation

@William-An (Contributor) commented Jul 13, 2025

William-An marked this pull request as draft July 13, 2025 16:41
@tgrogers (Contributor) left a comment:

This all looks good, but should we put it in the uBench folder?

William-An marked this pull request as ready for review January 8, 2026 18:34
@William-An (Contributor, Author) commented Jan 8, 2026

The CI job may take a long time, since it builds all of the GEMM kernels for GMMA. Perhaps we should instead build a single kernel from the test source files (i.e., lat_gmma_test.cu and MaxFlops_gmma_test.cu)?

Or should we let the cluster build this?
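
If we go the single-kernel route, the driver could be as small as the sketch below. The function name and body are hypothetical stand-ins for one of the configurations declared in lat_gmma.h; this only illustrates the idea, it is not the actual test source.

```cuda
#include <cstdio>

// Stand-in for one of the test entry points declared in lat_gmma.h;
// the name and body are hypothetical, present only so the sketch links.
int lat_gmma_f32_single_config() { return 0; }

int main() {
    // Build and run a single configuration rather than every kernel,
    // and propagate the result through the exit code for CI.
    if (lat_gmma_f32_single_config() != 0) {
        printf("GMMA latency test failed\n");
        return 1;
    }
    printf("GMMA latency test passed\n");
    return 0;  // 0 on success
}
```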

William-An requested review from Copilot and tgrogers January 8, 2026 18:46
Copilot AI left a comment:

Pull request overview

This PR adds new GPU microbenchmarking capabilities for TMA and GMMA operations while fixing exit codes and enabling dynamic CUDA runtime linking. The changes modernize the build system to support parallel compilation and add comprehensive test coverage for Hopper architecture (SM90) matrix operations.

Key Changes:

  • Corrected return values from 1 to 0 for successful execution across all microbenchmark programs
  • Added GMMA (General Matrix Multiply-Accumulate) latency and throughput microbenchmarks with comprehensive test coverage for multiple data types (a sketch of the latency-measurement pattern follows this list)
  • Enabled parallel NVCC compilation while preserving embedded PTX code generation
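
For context on how such latency microbenchmarks are typically structured, here is a minimal sketch of the clock64()-timed dependent-chain pattern. A plain FMA chain stands in for the GMMA operation (which on SM90 would be issued through wgmma instructions), so the kernel, names, and iteration count are illustrative rather than the PR's actual code; the sketch also shows the corrected convention of returning 0 on success.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Dependent-chain latency kernel: every iteration consumes the previous
// result, so elapsed cycles / ITERS approximates the per-operation latency.
__global__ void lat_kernel(float *out, long long *cycles) {
    const int ITERS = 1024;
    float acc = threadIdx.x * 1.0f;
    long long start = clock64();
    for (int i = 0; i < ITERS; ++i)
        acc = fmaf(acc, 1.000001f, 0.5f);  // serially dependent FMA chain
    long long stop = clock64();
    out[threadIdx.x] = acc;                // keep the result live
    if (threadIdx.x == 0) *cycles = stop - start;
}

int main() {
    float *d_out;
    long long *d_cycles, h_cycles;
    cudaMalloc(&d_out, 32 * sizeof(float));
    cudaMalloc(&d_cycles, sizeof(long long));
    lat_kernel<<<1, 32>>>(d_out, d_cycles);
    cudaMemcpy(&h_cycles, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost);
    printf("latency: %.2f cycles per operation\n", h_cycles / 1024.0);
    cudaFree(d_out);
    cudaFree(d_cycles);
    return 0;  // 0 on success, matching the corrected exit-code convention
}
```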

Reviewed changes

Copilot reviewed 299 out of 331 changed files in this pull request and generated 1 comment.

Summary per file:

File | Description
Multiple *.cu files | Fixed incorrect return value from 1 to 0 for successful program completion
lat_gmma/ directory | Added comprehensive GMMA latency microbenchmark infrastructure with support for F32, F16, and INT32 accumulators
MaxFlops_gmma/ directory | Added GMMA throughput microbenchmark kernels for various data type combinations
lat_gmma_common.h | Introduced shared kernel templates and helper macros for GMMA latency testing (see the sketch after this table)
lat_gmma.h | Declared functions for 385 different GMMA test configurations
lat_gmma/Makefile | Configured the build system for the SM90a architecture with C++17 and parallel compilation
.gitignore | Added ignore patterns for build artifacts (.a and .ptx files)
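
To make the shared-infrastructure idea concrete, here is a minimal sketch of what a kernel template plus a test-generating macro might look like. All names, shapes, and the kernel body are illustrative assumptions, not the actual contents of lat_gmma_common.h; the real kernels would issue the GMMA (wgmma) operation where the placeholder add appears.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel template parameterized on the accumulator type.
template <typename AccumT>
__global__ void gmma_lat_kernel(AccumT *out) {
    AccumT acc = static_cast<AccumT>(threadIdx.x);
    // Placeholder dependent chain; the real kernels would issue wgmma here.
    for (int i = 0; i < 256; ++i)
        acc = acc + static_cast<AccumT>(1);
    out[threadIdx.x] = acc;
}

// Hypothetical macro that stamps out one host-callable test per configuration.
#define DEFINE_GMMA_LAT_TEST(NAME, ACCUM_T)              \
    int NAME() {                                         \
        ACCUM_T *d_out;                                   \
        cudaMalloc(&d_out, 128 * sizeof(ACCUM_T));        \
        gmma_lat_kernel<ACCUM_T><<<1, 128>>>(d_out);      \
        cudaError_t err = cudaDeviceSynchronize();        \
        cudaFree(d_out);                                  \
        return err == cudaSuccess ? 0 : 1;                \
    }

// One macro invocation per configuration; lat_gmma.h would then declare
// the generated functions for the test drivers to call.
DEFINE_GMMA_LAT_TEST(lat_gmma_f32_example, float)
DEFINE_GMMA_LAT_TEST(lat_gmma_int32_example, int)

int main() {
    return lat_gmma_f32_example();  // 0 on success
}
```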
Comments suppressed due to low confidence (1)

src/cuda/GPU_Microbenchmark/ubench/core/lat_gmma/lat_gmma_common.h:1

  • Corrected grammar from 'Simple a test kernel' to 'Simple test kernel'.


@William-An (Contributor, Author) commented:

CI is failing due to insufficient space on the runner.


Development

Successfully merging this pull request may close these issues.

Error while generating traces for GPU_Microbenchmark
