@brunomazzottiamd brunomazzottiamd commented Dec 18, 2025

Motivation

The Triton test step of the AITER CI takes too long and has become a bottleneck for merging PRs. We should programmatically select which Triton tests to run based on the diff content of a given PR. The full test suite can then be executed periodically, on the main branch only.

This PR proposes a Python script that, given

  • a source branch with changes to be merged
  • a target branch to merge into
  • the AITER codebase

automatically selects which Triton tests validate the changes to be merged.

Technical Details

Script help text:

$ python .github/scripts/select_triton_tests.py --help
usage: select_triton_tests.py [-h] -s SOURCE [-t TARGET] [-l {critical,error,warning,info,debug,off}]

select which Triton tests to run based on git diff

options:
  -h, --help            show this help message and exit
  -s SOURCE, --source SOURCE
                        source branch
  -t TARGET, --target TARGET
                        target branch, defaults to main
  -l {critical,error,warning,info,debug,off}, --log-level {critical,error,warning,info,debug,off}
                        log level to enable (default: info)
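
For reviewers who want to see how this interface maps to code, here is a minimal argparse sketch that would produce a similar command line. It is reconstructed from the help text above only; the helper name `build_parser` is hypothetical and the script's real parser may be organized differently.

```python
import argparse

# Log level names copied from the help text above.
LOG_LEVELS = ["critical", "error", "warning", "info", "debug", "off"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="select_triton_tests.py",
        description="select which Triton tests to run based on git diff",
    )
    # The source branch is the only mandatory option.
    parser.add_argument("-s", "--source", required=True, help="source branch")
    parser.add_argument(
        "-t", "--target", default="main", help="target branch, defaults to main"
    )
    parser.add_argument(
        "-l",
        "--log-level",
        choices=LOG_LEVELS,
        default="info",
        help="log level to enable (default: info)",
    )
    return parser


args = build_parser().parse_args(["-s", "my-feature-branch"])
print(args.source, args.target, args.log_level)
```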

Implementation details:

  • Uses the Python subprocess module to run git commands and find out what has changed between the source and target branches.
  • Uses the Python pathlib module to list all Triton source files in the AITER codebase (kernels, kernel configurations, unit tests, benchmark scripts).
  • Uses the Python ast module to recursively parse the Triton source files, tracking all dependency relations. A dependency relation can link two source files, or a kernel configuration file and a kernel source file.
  • The dependency relations are encoded in a networkx directed graph. The graph is then traversed starting from the files in the diff until unit tests are reached. All reachable unit tests should be executed to validate the proposed changes.
  • No code is executed. The entire dependency graph is built through static analysis.
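
The idea behind the last three points can be sketched as follows. This is an illustration under assumptions, not the script's code: it works on in-memory source strings with made-up module names, handles only absolute imports, ignores the configuration-file edges, and uses a plain dict with a breadth-first search in place of the networkx graph so that it runs with the standard library alone.

```python
import ast
from collections import defaultdict, deque


def imported_modules(source: str) -> set[str]:
    """Return the module names imported by `source`, without executing it."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
    return modules


def build_dependents(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each known module to the known modules that import it."""
    dependents = defaultdict(set)
    for name, source in sources.items():
        for dependency in imported_modules(source):
            if dependency in sources:  # ignore imports outside the codebase
                dependents[dependency].add(name)
    return dependents


def reachable_tests(changed, dependents, is_test) -> set[str]:
    """Breadth-first search from the changed modules to every dependent test."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return {module for module in seen if is_test(module)}


# Hypothetical miniature codebase: module name -> source text.
sources = {
    "kernels.rope": "x = 1\n",
    "ops.rope": "from kernels.rope import x\n",
    "tests.test_rope": "import ops.rope\n",
    "tests.test_gmm": "y = 2\n",
}
dependents = build_dependents(sources)
selected = reachable_tests(
    {"kernels.rope"}, dependents, lambda name: name.startswith("tests.")
)
print(sorted(selected))  # ['tests.test_rope']
```

With networkx, the same traversal would typically be a call such as `nx.descendants(graph, changed_file)` on a graph whose edges point from a dependency to its dependents.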

Test Plan

Tested locally with two scenarios:

  • Test case A - changed the following Triton source files:
    • aiter/ops/triton/_triton_kernels/mha.py → MHA forward kernel
    • aiter/ops/triton/_triton_kernels/mha_onekernel_bwd.py → MHA backward kernel
    • aiter/ops/triton/_triton_kernels/rope.py → RoPE kernel
    • aiter/ops/triton/configs/gfx942-GMM.json → GMM kernel configuration
    • op_tests/op_benchmarks/triton/bench_topk.py → Top-k benchmark script
  • Test case B - changed files other than Triton sources.

Test Result

Summarized script output:

Test case A:

2025-12-18 11:00:40,474|INFO|There are 5 Triton source files in the diff:
2025-12-18 11:00:40,474|INFO|* aiter\ops\triton\_triton_kernels\mha.py
2025-12-18 11:00:40,474|INFO|* aiter\ops\triton\_triton_kernels\mha_onekernel_bwd.py
2025-12-18 11:00:40,474|INFO|* aiter\ops\triton\_triton_kernels\rope.py
2025-12-18 11:00:40,474|INFO|* aiter\ops\triton\configs\gfx942-GMM.json
2025-12-18 11:00:40,474|INFO|* op_tests\op_benchmarks\triton\bench_topk.py
2025-12-18 11:00:47,450|INFO|There are 7 tests reachable from the Triton diff:
2025-12-18 11:00:47,450|INFO|* op_tests\triton_tests\attention\test_mha.py
2025-12-18 11:00:47,450|INFO|* op_tests\triton_tests\fusions\test_fused_kv_cache.py
2025-12-18 11:00:47,450|INFO|* op_tests\triton_tests\fusions\test_fused_qk_concat.py
2025-12-18 11:00:47,450|INFO|* op_tests\triton_tests\rope\test_fused_qkv_split_qk_rope.py
2025-12-18 11:00:47,450|INFO|* op_tests\triton_tests\rope\test_rope.py
2025-12-18 11:00:47,450|INFO|* op_tests\triton_tests\test_gmm.py
2025-12-18 11:00:47,450|INFO|* op_tests\triton_tests\test_topk.py
2025-12-18 11:00:47,450|INFO|Finished, execution took 7.41 seconds.

Test case B:

2025-12-18 11:15:11,598|INFO|There are no Triton source files in diff, there's no need to run Triton tests.

TODO before merging

  • Address all comments made by Bruno Mazzotti, the PR author.
  • Find a way to integrate the script with the CI infrastructure and GitHub Actions. I think the AITER CI team (Xin Huang + Leonid Drozdov) can help with this.

IMPORTANT NOTICE!

  • The sole goal of this PR is to run the script in CI on every subsequent PR.
  • Under no circumstances should a script execution error be considered a CI failure.
  • For now, we should not trust the script output, just monitor it and fix possible issues over time. Only when we are confident in the result should it be used to select which tests to run.
  • Another must-have condition for using the script to drive test selection is a periodic run of the full Triton test suite on the main branch.
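
One possible way to honor the "never fail CI" requirement is a thin advisory wrapper around the script. The sketch below is an assumption about how that could look, not part of this PR; `run_advisory` is a hypothetical helper, and the timeout value is arbitrary.

```python
import logging
import subprocess
import sys


def run_advisory(command: list[str], timeout_s: float = 600.0) -> int:
    """Run `command`, log what happened, and always return 0.

    Hypothetical helper: any failure of the wrapped command is reported
    as a warning so that the CI step itself cannot fail.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s, check=False
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Covers a missing executable as well as a hung run.
        logging.warning("test selection could not run: %s", exc)
        return 0
    if result.returncode != 0:
        logging.warning(
            "test selection exited with code %d: %s",
            result.returncode,
            result.stderr.strip(),
        )
    else:
        logging.info("test selection output:\n%s", result.stdout.strip())
    return 0


if __name__ == "__main__":
    # Example: a command that fails on purpose still yields exit code 0.
    sys.exit(run_advisory([sys.executable, "-c", "raise SystemExit(3)"]))
```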

Submission Checklist

@brunomazzottiamd brunomazzottiamd self-assigned this Dec 18, 2025
@brunomazzottiamd brunomazzottiamd added the enhancement (New feature or request) and triton labels Dec 18, 2025

@brunomazzottiamd brunomazzottiamd left a comment

All of my comments need to be resolved before merging this PR.

from cpp_extension import _jit_compile, get_hip_version
from file_baton import FileBaton
from torch_guard import torch_compile_guard # noqa: E402
from .utils.chip_info import get_gfx, get_gfx_list

FIXME: Changes in aiter/jit/core.py broke AITER compilation,

from pathlib import Path

# Third party libraries.
import networkx as nx

networkx is a third-party dependency. How can we be sure to have it installed before running the script?
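
One option, sketched here as an assumption and not as a decision for this PR, is to check for the dependency up front and report it clearly instead of crashing on import. `missing_modules` is a hypothetical helper and only handles top-level module names. Installing networkx in the CI environment before the step runs would be the complementary fix.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "networkx" is the only third-party dependency mentioned in this PR.
missing = missing_modules(["networkx"])
if missing:
    print("Skipping Triton test selection, missing modules:", ", ".join(missing))
```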



def git_filename_diff(source_branch: str, target_branch: str) -> set[Path]:
    # FIXME: The statement below only works if the script is invoked in AITER

Resolve this FIXME comment before merging.
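
A possible direction for this FIXME, assuming the limitation is about the current working directory, is to resolve the repository root from the script's own location and pass it to git with `-C`. The sketch below is hypothetical: `repo_root` and `diff_command` are illustrative names, and the three-dot diff range is an assumption about which comparison the script wants.

```python
import subprocess
from pathlib import Path


def repo_root(start: Path) -> Path:
    """Ask git for the top-level directory of the repository containing `start`."""
    result = subprocess.run(
        ["git", "-C", str(start), "rev-parse", "--show-toplevel"],
        capture_output=True,
        text=True,
        check=True,
    )
    return Path(result.stdout.strip())


def diff_command(root: Path, source_branch: str, target_branch: str) -> list[str]:
    """Build a git command that lists changed files regardless of the current directory."""
    # "target...source" compares source against its merge base with target.
    return [
        "git",
        "-C",
        str(root),
        "diff",
        "--name-only",
        f"{target_branch}...{source_branch}",
    ]


# Inside the script, the root could be resolved with something like:
#   root = repo_root(Path(__file__).resolve().parent)
print(diff_command(Path("/path/to/aiter"), "my-feature-branch", "main"))
```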

@brunomazzottiamd brunomazzottiamd force-pushed the bmazzott/run-triton-tests-selectively branch from 242d8f4 to c26fc58 Compare January 5, 2026 20:16