Suggestion Description
RFC: Test tiering and multi-backend CI matrix
Summary
This RFC defines a layered testing model for FlyDSL as the project supports multiple compile backends and (per RFC: Device and runtime layer (aligned with pluggable compile backends)) paired device runtimes. The goals are:
- Clear dependencies: which tests run without MLIR, which need target dialects but no GPU, and which need real hardware.
- Downstream-friendly CI: contributors without vendor-specific MLIR builds (e.g. dialects not yet in upstream LLVM) can still run a large, meaningful subset of tests.
- Single configuration surface: tests select behavior via environment variables and pytest options that match the existing FlyDSL env modules (community source of truth).
This document is the testing counterpart to the compile-backend RFC (#253) and device/runtime RFC (#255); it does not redefine those mechanisms.
Motivation
- RFC: Pluggable GPU compile backend (Python layer) introduced FLYDSL_COMPILE_BACKEND, get_backend(), and register_backend(). Tests today largely assume ROCm/ROCDL implicitly and do not declare a backend or tier.
- Vendor backends (e.g. Iluvatar) may ship MLIR dialects (e.g. ixdl) that cannot be imported in a generic LLVM/FlyDSL CI image. Tests that require such dialects must be separable from tests that only need the Fly dialect and upstream/common dialects.
- Hardware labs are expensive; compile-only and IR/FileCheck tests should run on every PR without a GPU.
Non-goals
- Specifying the full pytest directory layout migration in one step (markers and docs can land first).
- Defining concrete CUDA or ixdl backend implementations.
Test tiers
L0 — Backend-agnostic
- No assumption about compile backend id or device runtime kind.
- No requirement for vendor target dialects (rocdl, ixdl, nvvm, …).
- Typical content: pure Python helpers, AST/parser tests, registry logic that does not execute target-specific pipelines.
L1 — Compile-tier, no device execution
Tests that exercise compilation or MLIR passes but do not launch kernels on a GPU (no ExecutionEngine execution path that requires a live device for correctness).
L1 is split into L1a and L1b:
L1a — No vendor target dialect
- Must not depend on registering or importing vendor-specific target dialects not guaranteed in a minimal FlyDSL + upstream LLVM build.
- Pass pipelines and IR stay in Fly and portable dialects (exact allow-list is project-defined; e.g. arith, func, memref where applicable).
Rationale: environments that cannot import or register ixdl (or similar) still run L0 + L1a in CI.
L1b — Target dialect / backend-specific lowering
- May run passes or hold IR that uses target-specific representations (e.g. convert-fly-to-rocdl, #rocdl.target, or a future ixdl pipeline).
- Still no GPU execution for correctness (may use COMPILE_ONLY=1, FileCheck, or PassManager verification only).
Rationale: ROCm-focused CI runs L1b-rocm; a partner lab runs L1b-ixdl when that build is available. Neither variant is required on every contributor machine.
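To make the L1a/L1b boundary concrete, here is a minimal sketch that scans textual MLIR for vendor dialect prefixes and suggests a tier. The helper names (vendor_dialects_in, suggested_tier) are hypothetical, and the dialect list is taken only from the examples in this RFC. A regex scan is a rough heuristic, not a replacement for a maintained allow-list.

```python
import re

# Assumption: the vendor target dialects to flag are the examples named in
# this RFC; a real list would be maintained by the project.
VENDOR_DIALECTS = ("rocdl", "ixdl", "nvvm")

# Matches a vendor dialect used as an op prefix (rocdl.barrier) or as an
# attribute prefix (#rocdl.target). The lookbehind skips names that merely
# end in a dialect name, such as "my_rocdl.op".
_VENDOR_RE = re.compile(r"(?<![\w.$])#?(" + "|".join(VENDOR_DIALECTS) + r")\.")


def vendor_dialects_in(ir_text: str) -> set:
    """Return the vendor dialects that appear in a piece of textual IR."""
    return set(_VENDOR_RE.findall(ir_text))


def suggested_tier(ir_text: str) -> str:
    """Suggest L1b when any vendor dialect appears, otherwise L1a."""
    return "L1b" if vendor_dialects_in(ir_text) else "L1a"
```

A check like this could run over FileCheck inputs to catch tests labelled L1a that have started to depend on a target dialect.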
L2 — Device-tier
- Requires GPU + driver + runtime stack (and often PyTorch) for launch, numerical checks, or performance harnesses.
- Should be tagged with the runtime or hardware family when useful (e.g. ROCm HIP).
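Relationship to prior RFCs
This RFC builds on get_backend() and FLYDSL_COMPILE_BACKEND from the compile-backend RFC (#253), and on the FLYDSL_RUNTIME_* variables as in env.runtime; it does not redefine them.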
Environment variables (community source of truth)
Test harnesses and documentation must use the same names as python/flydsl/utils/env.py. Do not introduce parallel spellings in docs or CI.
| Purpose | Variable | Notes |
|---|---|---|
| Compile backend id | FLYDSL_COMPILE_BACKEND | Default rocm; drives get_backend(). |
| Override target arch for compile | ARCH | Maps to env.compile.arch (not FLYDSL_COMPILE_ARCH). |
| Compile without execution | COMPILE_ONLY | Maps to env.compile.compile_only (not FLYDSL_COMPILE_ONLY). |
| JIT cache directory | FLYDSL_RUNTIME_CACHE_DIR | |
| Enable/disable JIT cache | FLYDSL_RUNTIME_ENABLE_CACHE | Use 0/false to disable (there is no FLYDSL_NO_CACHE in env). |
| IR dump | FLYDSL_DUMP_IR, FLYDSL_DUMP_DIR | Debug options. |
| ROCm arch hints (detection) | FLYDSL_GPU_ARCH, HSA_OVERRIDE_GFX_VERSION | Used by runtime/device.py / ROCm backend detection; distinct from ARCH. |
Note: Issue #253 body historically mentioned FLYDSL_COMPILE_ARCH; the implemented community env uses ARCH for compile arch override. This RFC follows code as source of truth.
Session-level pytest options (if implemented) should set these variables at session start, e.g.:
- --flydsl-compile-backend → FLYDSL_COMPILE_BACKEND
- --flydsl-compile-arch → ARCH
Default behavior when unset remains community defaults (backend rocm).
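One possible shape for these session-level options in tests/conftest.py is sketched below. The option names and target variables come from this RFC; the helper apply_session_options and the OPTION_TO_ENV table are hypothetical names used only for illustration. The sketch assumes FlyDSL reads these variables after pytest_configure has run.

```python
import os

# Option name -> environment variable, as listed in this RFC.
OPTION_TO_ENV = {
    "--flydsl-compile-backend": "FLYDSL_COMPILE_BACKEND",
    "--flydsl-compile-arch": "ARCH",
}


def apply_session_options(values, environ=os.environ):
    """Copy explicitly passed option values into the environment.

    `values` maps an option name to its value, or to None when the option
    was not passed. Options left unset do not touch the environment, so the
    community defaults still apply.
    """
    for option, env_name in OPTION_TO_ENV.items():
        value = values.get(option)
        if value is not None:
            environ[env_name] = str(value)
    return environ


def pytest_addoption(parser):
    # Standard pytest hook: register one string option per variable.
    for option, env_name in OPTION_TO_ENV.items():
        parser.addoption(option, action="store", default=None,
                         help=f"Sets {env_name} for the test session")


def pytest_configure(config):
    # Standard pytest hook: runs once, before test collection starts.
    apply_session_options(
        {option: config.getoption(option) for option in OPTION_TO_ENV}
    )
```

Keeping the mapping in a plain function makes it testable without starting a pytest session, and leaving unset options alone preserves whatever the caller already exported.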
Pytest markers
Register in pytest.ini (names illustrative; finalize in implementation PR):
markers =
    l0_backend_agnostic: no compile/runtime backend assumption
    l1a_compile_no_target_dialect: MLIR compile path without vendor target dialects
    l1b_target_dialect: requires a target lowering stack (further narrow with dialect-specific markers)
    l2_device: requires GPU / full runtime
    rocm_lower: L1b/L2 tests that assume ROCDL lowering path
    # Future: ixdl_lower, cuda_lower, …
Rules:
- l1b_target_dialect tests should add a second marker when they assume a specific stack (rocm_lower, future ixdl_lower, …) so CI can -m "not ixdl_lower" on generic runners.
- l2_device may combine with rocm_lower (or equivalent) where relevant.
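The rules above could be checked at collection time. The following is a minimal sketch, assuming the illustrative marker names from this RFC; marker_problems is a hypothetical helper, and reporting violations as warnings rather than errors is a design choice for the migration period, not something this RFC mandates.

```python
import warnings

# Stack markers currently proposed; future ones (ixdl_lower, cuda_lower)
# would be added here once registered.
STACK_MARKERS = {"rocm_lower"}


def marker_problems(names):
    """Return human-readable problems for one test's set of marker names."""
    names = set(names)
    problems = []
    stacks = names & STACK_MARKERS
    if "l1b_target_dialect" in names and not stacks:
        problems.append(
            "l1b_target_dialect without a stack marker such as rocm_lower")
    if stacks and not names & {"l1b_target_dialect", "l2_device"}:
        problems.append(
            "stack marker used without l1b_target_dialect or l2_device")
    return problems


def pytest_collection_modifyitems(config, items):
    # Standard pytest hook; iter_markers() yields every mark on the item.
    for item in items:
        for problem in marker_problems(m.name for m in item.iter_markers()):
            warnings.warn(f"{item.nodeid}: {problem}")
```

For example, a test carrying only l1b_target_dialect would be reported, while one carrying l1b_target_dialect together with rocm_lower would pass.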
Directory layout (evolutionary)
Phase 1 (recommended): keep existing tests/kernels, tests/mlir, tests/unit, …; annotate files/classes with markers; document tier mapping in tests/README.md.
Phase 2 (optional): physical split, e.g. tests/l0/, tests/l1_compile/, tests/l2_device/, or nest tests/mlir/rocm/Conversion/ for L1b clarity.
Physical moves are not required to adopt this RFC’s semantics.
CI matrix (sketch)
| Job | Typical selection | Hardware |
|---|---|---|
| Minimal | l0_backend_agnostic or l1a_compile_no_target_dialect | CPU only |
| Standard compile | above + l1b_target_dialect and rocm_lower | CPU; MLIR build with ROCDL |
| Partner / vendor | l1b_target_dialect and ixdl_lower (example) | CPU; custom MLIR with ixdl |
| GPU | l2_device (and backend markers) | ROCm (or other) runner |
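As a sketch, the matrix rows can be written as pytest -m expressions. The job keys below are hypothetical, and the GPU row assumes a ROCm runner that narrows l2_device with rocm_lower. Parentheses make the "above + ..." row unambiguous, since `and` binds tighter than `or` in marker expressions.

```python
import shlex

MINIMAL = "l0_backend_agnostic or l1a_compile_no_target_dialect"

# Job name -> marker expression. Job names are illustrative only.
JOB_SELECTIONS = {
    "minimal": MINIMAL,
    "standard-compile": f"({MINIMAL}) or (l1b_target_dialect and rocm_lower)",
    "partner-ixdl": "l1b_target_dialect and ixdl_lower",
    # Assumption: a ROCm GPU runner selects device tests for its own stack.
    "gpu-rocm": "l2_device and rocm_lower",
}


def pytest_command(job, test_root="tests"):
    """Build the argv list a CI job would run for the given matrix row."""
    return ["pytest", "-m", JOB_SELECTIONS[job], test_root]


if __name__ == "__main__":
    # Print shell-quoted commands, e.g. for pasting into a CI config.
    for job in JOB_SELECTIONS:
        print(shlex.join(pytest_command(job)))
```

Keeping the expressions in one table means a new backend marker only needs to be added in one place when a partner job comes online.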
Migration plan
- Add markers to pytest.ini and a short tests/README.md describing L0/L1a/L1b/L2 and env vars.
- Label high-traffic directories first (kernels/ → l2_device; mlir/Conversion/ → l1b + rocm_lower; layout algebra MLIR → l1a where applicable).
- Fix docs/cute_layout_algebra_guide.md §9.2 per Appendix A (and align other user-facing tables with env.py).
- Introduce optional pytest CLI options in tests/conftest.py mapping to FLYDSL_COMPILE_BACKEND and ARCH.
- Optionally restructure directories in follow-up PRs.
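Directory-level labelling does not have to mean editing every file. A minimal sketch, assuming pytest 7 or later (for item.path) and reading "l1b" in the plan as the l1b_target_dialect marker, could attach markers by path in tests/conftest.py. PATH_RULES and markers_for are hypothetical names. The layout algebra rule is omitted because the plan gives no path for it.

```python
from pathlib import PurePosixPath

# (consecutive path components, markers to attach), from the migration plan.
PATH_RULES = [
    (("mlir", "Conversion"), ("l1b_target_dialect", "rocm_lower")),
    (("kernels",), ("l2_device",)),
]


def markers_for(path):
    """Return the marker names implied by a test file's location."""
    parts = PurePosixPath(path).parts
    for fragment, markers in PATH_RULES:
        n = len(fragment)
        # Look for the fragment as a run of consecutive path components.
        if any(parts[i:i + n] == fragment
               for i in range(len(parts) - n + 1)):
            return markers
    return ()


def pytest_collection_modifyitems(config, items):
    # Standard pytest hook; add_marker accepts a marker name as a string.
    # Worth confirming in the target pytest version that -m selection
    # sees markers added here.
    for item in items:
        for name in markers_for(item.path.as_posix()):
            item.add_marker(name)
```

Explicit per-test markers can then replace the path rules gradually, directory by directory.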
Open questions
- Exact allow-list of dialects/passes for L1a (maintainer-maintained table vs “anything not in a deny-list”).
- Whether LIT tests get pytest markers via a small wrapper or remain CMake/lit-only with a parallel tier table in tests/README.md.
- Whether to rename ARCH / COMPILE_ONLY to prefixed forms in a future env RFC (would require deprecation); until then, tests and docs follow current community names.
Appendix A — Incorrect variable names in docs/cute_layout_algebra_guide.md (§9.2)
File: docs/cute_layout_algebra_guide.md
Section: §9.2 Environment Variables (table around lines 411–418).
The following rows do not match python/flydsl/utils/env.py and should be corrected (or removed) when aligning documentation with the community implementation:
| Line (approx.) | Document currently says | Problem | Correct / canonical (env.py) |
|---|---|---|---|
| 416 | FLYDSL_COMPILE_ONLY=1 | Variable name does not exist. | COMPILE_ONLY=1; compile_only uses env_var="COMPILE_ONLY". |
| 417 | FLYDSL_NO_CACHE=1 | No such variable in env. | Disable JIT cache with FLYDSL_RUNTIME_ENABLE_CACHE=0 (or false, per OptBool parsing). |
Other rows in the same table (ARCH, FLYDSL_DUMP_IR, FLYDSL_DUMP_DIR, FLYDSL_RUNTIME_CACHE_DIR) are consistent with env.py. The table also omits FLYDSL_COMPILE_BACKEND (from RFC #253 / env.compile.backend); consider adding it for completeness.
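To keep such drift from recurring, a small documentation check could flag the non-canonical spellings named in this RFC. This is a sketch only: find_non_canonical is a hypothetical helper, and the mapping covers just the three names mentioned here. It would also flag text that cites the wrong names on purpose, such as this appendix, so such files would need to be excluded.

```python
import re

# Non-canonical spelling -> canonical form, as stated in this RFC.
NON_CANONICAL = {
    "FLYDSL_COMPILE_ONLY": "COMPILE_ONLY",
    "FLYDSL_NO_CACHE": "FLYDSL_RUNTIME_ENABLE_CACHE=0",
    "FLYDSL_COMPILE_ARCH": "ARCH",
}

# Word boundaries keep longer identifiers from matching by accident.
_PATTERN = re.compile(r"\b(" + "|".join(NON_CANONICAL) + r")\b")


def find_non_canonical(text):
    """Return (line number, found name, canonical form) for each hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, NON_CANONICAL[name]))
    return hits
```

Run over a documentation file's text, it returns an empty list when only canonical names are used, which makes it easy to wire into a docs CI step.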
Operating System
No response
GPU
No response
ROCm Component
No response