[Feature]: RFC: Test tiering and multi-backend CI matrix #275

@Peter9606

Suggestion Description

RFC: Test tiering and multi-backend CI matrix

Status: Draft
Author: Peter Han
Created: 2026-03-24
Related: RFC: Pluggable GPU compile backend (Python layer); RFC: Device and runtime layer (aligned with pluggable compile backends)

Summary

This RFC defines a layered testing model for FlyDSL now that the project supports multiple compile backends and, per RFC: Device and runtime layer (aligned with pluggable compile backends), paired device runtimes. The goals are:

  1. Clear dependencies — which tests run without MLIR, which need target dialects but no GPU, which need real hardware.
  2. Downstream-friendly CI — contributors without vendor-specific MLIR builds (e.g. dialects not yet in upstream LLVM) can still run a large, meaningful subset of tests.
  3. Single configuration surface — tests select behavior via environment variables and pytest options that match existing FlyDSL env modules (community source of truth).

This document is the testing counterpart to the compile-backend RFC (#253) and device/runtime RFC (#255); it does not redefine those mechanisms.

Motivation

  • RFC: Pluggable GPU compile backend (Python layer) introduced FLYDSL_COMPILE_BACKEND, get_backend(), and register_backend(). Today's tests largely assume ROCm/ROCDL implicitly and declare neither a backend nor a tier.
  • Vendor backends (e.g. Iluvatar) may ship MLIR dialects (e.g. ixdl) that cannot be imported in a generic LLVM/FlyDSL CI image. Tests that require such dialects must be separable from tests that only need the Fly dialect and upstream/common dialects.
  • Hardware labs are expensive; compile-only and IR/FileCheck tests should run on every PR without a GPU.

Non-goals

  • Specifying the full pytest directory layout migration in one step (markers and docs can land first).
  • Defining concrete CUDA or ixdl backend implementations.

Test tiers

L0 — Backend-agnostic

  • No assumption about compile backend id or device runtime kind.
  • No requirement for vendor target dialects (rocdl, ixdl, nvvm, …).
  • Typical content: pure Python helpers, AST/parser tests, registry logic that does not execute target-specific pipelines.

L1 — Compile-tier, no device execution

Tests that exercise compilation or MLIR passes but do not launch kernels on a GPU (that is, correctness never depends on an ExecutionEngine path that needs a live device).

L1 is split into L1a and L1b:

L1a — No vendor target dialect

  • Must not depend on registering or importing vendor-specific target dialects not guaranteed in a minimal FlyDSL + upstream LLVM build.
  • Pass pipelines and IR stay in Fly and portable dialects (exact allow-list is project-defined; e.g. arith, func, memref where applicable).

Rationale: environments that cannot import or register ixdl (or similar) still run L0 + L1a in CI.

L1b — Target dialect / backend-specific lowering

  • May run passes or hold IR that uses target-specific representations (e.g. convert-fly-to-rocdl, #rocdl.target, or a future ixdl pipeline).
  • Still no GPU execution for correctness (may use COMPILE_ONLY=1, FileCheck, or PassManager verification only).

Rationale: ROCm-focused CI runs L1b-rocm; a partner lab runs L1b-ixdl when that build is available. Neither tier is required for every contributor machine.
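
One way a harness could decide whether a runner is able to execute L1b tests for a given stack is to probe for the dialect's Python bindings before collection. The sketch below is illustrative only: `dialect_available` is a hypothetical helper, and this RFC does not state that vendor dialects are exposed as importable Python modules.

```python
import importlib.util


def dialect_available(module_name: str) -> bool:
    """Return True if the import system can locate `module_name`.

    Hypothetical helper. Parent packages are imported during the lookup,
    but the named module itself is not executed.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package is
        # missing, and ValueError when an already-loaded module has no spec.
        return False
```

A conftest.py could call this with the (project-defined) module path of a vendor dialect and skip the matching stack-specific tests when it returns False, so that L0 + L1a still run on a generic image.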

L2 — Device-tier

  • Requires GPU + driver + runtime stack (and often PyTorch) for launch, numerical checks, or performance harnesses.
  • Should be tagged with the runtime or hardware family when useful (e.g. ROCm HIP).
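
The tier definitions above amount to a small decision procedure: pick the most demanding thing a test needs. A minimal sketch, assuming a hypothetical `Needs` descriptor (not part of FlyDSL) and reusing the illustrative marker names proposed in this RFC as enum values:

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    L0 = "l0_backend_agnostic"
    L1A = "l1a_compile_no_target_dialect"
    L1B = "l1b_target_dialect"
    L2 = "l2_device"


@dataclass(frozen=True)
class Needs:
    """What a test requires from its environment (hypothetical descriptor)."""

    compiles_mlir: bool = False
    uses_target_dialect: bool = False
    executes_on_device: bool = False


def classify(needs: Needs) -> Tier:
    # Check the most demanding requirement first, so that a test lands in
    # the highest tier it actually needs.
    if needs.executes_on_device:
        return Tier.L2
    if needs.uses_target_dialect:
        return Tier.L1B
    if needs.compiles_mlir:
        return Tier.L1A
    return Tier.L0
```

For example, `classify(Needs(compiles_mlir=True))` yields `Tier.L1A`, while adding `uses_target_dialect=True` moves the test to `Tier.L1B`.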

Relationship to prior RFCs

| Concern | Owner |
| --- | --- |
| Which compile backend is active | RFC: Pluggable GPU compile backend (Python layer): get_backend(), FLYDSL_COMPILE_BACKEND |
| Compile/runtime pairing | RFC: Device and runtime layer (aligned with pluggable compile backends); when implemented, FLYDSL_RUNTIME_* as in env.runtime |
| Which tests run in a given job | This RFC: markers + env + optional pytest CLI |

Environment variables (community source of truth)

Test harnesses and documentation must use the same names as python/flydsl/utils/env.py. Do not introduce parallel spellings in docs or CI.

| Purpose | Variable | Notes |
| --- | --- | --- |
| Compile backend id | FLYDSL_COMPILE_BACKEND | Default rocm; drives get_backend(). |
| Override target arch for compile | ARCH | Maps to env.compile.arch (not FLYDSL_COMPILE_ARCH). |
| Compile without execution | COMPILE_ONLY | Maps to env.compile.compile_only (not FLYDSL_COMPILE_ONLY). |
| JIT cache directory | FLYDSL_RUNTIME_CACHE_DIR | |
| Enable/disable JIT cache | FLYDSL_RUNTIME_ENABLE_CACHE | Use 0/false to disable (there is no FLYDSL_NO_CACHE in env). |
| IR dump | FLYDSL_DUMP_IR, FLYDSL_DUMP_DIR | Debug options. |
| ROCm arch hints (detection) | FLYDSL_GPU_ARCH, HSA_OVERRIDE_GFX_VERSION | Used by runtime/device.py / ROCm backend detection; distinct from ARCH. |

Note: The body of issue #253 historically mentioned FLYDSL_COMPILE_ARCH; the implemented community env uses ARCH for the compile arch override. This RFC treats the code as the source of truth.
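
To make the naming rule concrete, here is a hedged sketch of a test-side reader that uses only the canonical spellings. `HarnessEnv` and `read_harness_env` are hypothetical and do not reproduce env.py; the false spellings and the defaults for COMPILE_ONLY (off) and FLYDSL_RUNTIME_ENABLE_CACHE (on) are assumptions, while the rocm backend default comes from this RFC.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: env.py's OptBool parsing may accept more spellings than these.
_FALSE_SPELLINGS = {"0", "false"}


@dataclass(frozen=True)
class HarnessEnv:
    compile_backend: str
    arch: Optional[str]
    compile_only: bool
    cache_enabled: bool


def _flag(environ: Mapping[str, str], name: str, default: bool) -> bool:
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() not in _FALSE_SPELLINGS


def read_harness_env(environ: Mapping[str, str] = os.environ) -> HarnessEnv:
    """Read only canonical names; FLYDSL_COMPILE_ONLY etc. are ignored."""
    return HarnessEnv(
        compile_backend=environ.get("FLYDSL_COMPILE_BACKEND", "rocm"),
        arch=environ.get("ARCH"),
        compile_only=_flag(environ, "COMPILE_ONLY", default=False),
        cache_enabled=_flag(environ, "FLYDSL_RUNTIME_ENABLE_CACHE", default=True),
    )
```

Passing a plain dict instead of `os.environ` keeps the reader easy to unit-test without mutating the process environment.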

Session-level pytest options (if implemented) should set these variables at session start, e.g.:

  • --flydsl-compile-backend → FLYDSL_COMPILE_BACKEND
  • --flydsl-compile-arch → ARCH

When these options are unset, the community defaults apply (backend rocm).
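
A minimal sketch of how such options could be wired in tests/conftest.py, assuming the option names listed above are adopted as written. It uses the standard `pytest_addoption` and `pytest_configure` hooks; choosing `pytest_configure` (rather than a later hook) is an assumption made here so the variables are set before test modules are imported during collection.

```python
import os

# Option name -> environment variable, using the names proposed in this RFC.
_OPTION_TO_ENV = {
    "--flydsl-compile-backend": "FLYDSL_COMPILE_BACKEND",
    "--flydsl-compile-arch": "ARCH",
}


def pytest_addoption(parser):
    """Register the session-level options (standard pytest hook)."""
    for option, env_name in _OPTION_TO_ENV.items():
        parser.addoption(
            option,
            action="store",
            default=None,
            help=f"Sets {env_name} for the test session.",
        )


def pytest_configure(config):
    """Export given options to the environment before collection starts."""
    for option, env_name in _OPTION_TO_ENV.items():
        value = config.getoption(option)
        if value is not None:
            # Only override when the option was passed, so that the
            # community defaults still apply otherwise.
            os.environ[env_name] = value
```

With this in place, an invocation such as `pytest --flydsl-compile-backend rocm` would behave like exporting FLYDSL_COMPILE_BACKEND=rocm before the run.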

Pytest markers

Register in pytest.ini (names illustrative; finalize in implementation PR):

markers =
  l0_backend_agnostic: no compile/runtime backend assumption
  l1a_compile_no_target_dialect: MLIR compile path without vendor target dialects
  l1b_target_dialect: requires a target lowering stack (further narrow with dialect-specific markers)
  l2_device: requires GPU / full runtime
  rocm_lower: L1b/L2 tests that assume ROCDL lowering path
  # Future: ixdl_lower, cuda_lower, …

Rules:

  • l1b_target_dialect tests should add a second marker when they assume a specific stack (rocm_lower, future ixdl_lower, …) so CI can -m "not ixdl_lower" on generic runners.
  • l2_device may combine with rocm_lower (or equivalent) where relevant.
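
The first rule can be checked softly at collection time. The following is a sketch, not a proposed implementation: `lacks_stack_marker` is a hypothetical helper, the marker names are the illustrative ones from this RFC, and the hook only warns, because an L1b test without a stack marker is a candidate for review rather than necessarily an error.

```python
import warnings

TIER_MARKER = "l1b_target_dialect"
# Extend with ixdl_lower, cuda_lower, ... once those markers exist.
STACK_MARKERS = frozenset({"rocm_lower"})


def lacks_stack_marker(marker_names):
    """True when a test is tagged L1b but names no specific lowering stack."""
    names = set(marker_names)
    return TIER_MARKER in names and not (names & STACK_MARKERS)


def pytest_collection_modifyitems(config, items):
    """Warn about L1b tests without a stack marker (standard pytest hook)."""
    for item in items:
        names = [mark.name for mark in item.iter_markers()]
        if lacks_stack_marker(names):
            warnings.warn(f"{item.nodeid}: {TIER_MARKER} without a stack marker")
```

A ROCDL conversion test would then carry both `@pytest.mark.l1b_target_dialect` and `@pytest.mark.rocm_lower`, and pass this check silently.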

Directory layout (evolutionary)

Phase 1 (recommended): keep existing tests/kernels, tests/mlir, tests/unit, …; annotate files/classes with markers; document tier mapping in tests/README.md.

Phase 2 (optional): physical split, e.g. tests/l0/, tests/l1_compile/, tests/l2_device/, or nest tests/mlir/rocm/Conversion/ for L1b clarity.

Physical moves are not required to adopt this RFC’s semantics.

CI matrix (sketch)

| Job | Typical selection | Hardware |
| --- | --- | --- |
| Minimal | l0_backend_agnostic or l1a_compile_no_target_dialect | CPU only |
| Standard compile | above + l1b_target_dialect and rocm_lower | CPU; MLIR build with ROCDL |
| Partner / vendor | l1b_target_dialect and ixdl_lower (example) | CPU; custom MLIR with ixdl |
| GPU | l2_device (and backend markers) | ROCm (or other) runner |
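
As a rough illustration, the matrix can be expressed as pytest `-m` expressions keyed by job. The job keys below are hypothetical, the marker names are the illustrative ones from this RFC, and the GPU entry is left un-narrowed because the backend markers to combine with l2_device depend on the runner.

```python
import shlex

MINIMAL = "l0_backend_agnostic or l1a_compile_no_target_dialect"

# Hypothetical job keys mapped to marker expressions from the matrix.
JOB_SELECTION = {
    "minimal": MINIMAL,
    "standard-compile": f"({MINIMAL}) or (l1b_target_dialect and rocm_lower)",
    "partner-ixdl": "l1b_target_dialect and ixdl_lower",
    # A runner may narrow this further with a backend marker.
    "gpu": "l2_device",
}


def pytest_command(job: str, test_root: str = "tests") -> str:
    """Build a shell-quoted pytest invocation for one CI job."""
    return shlex.join(["pytest", "-m", JOB_SELECTION[job], test_root])
```

Keeping the expressions in one table makes it harder for CI jobs to drift apart; for instance, `pytest_command("minimal")` produces a command that selects only L0 and L1a tests under tests/.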

Migration plan

  1. Add markers to pytest.ini and a short tests/README.md describing L0/L1a/L1b/L2 and env vars.
  2. Label high-traffic directories first (kernels/ → l2_device; mlir/Conversion/ → l1b + rocm_lower; layout algebra MLIR → l1a where applicable).
  3. Fix docs/cute_layout_algebra_guide.md §9.2 per Appendix A (and align other user-facing tables with env.py).
  4. Introduce optional pytest CLI in tests/conftest.py mapping to FLYDSL_COMPILE_BACKEND and ARCH.
  5. Optionally restructure directories in follow-up PRs.

Open questions

  1. Exact allow-list of dialects/passes for L1a (maintainer-maintained table vs “anything not in a deny-list”).
  2. Whether LIT tests get pytest markers via a small wrapper or remain CMake/lit-only with a parallel tier table in tests/README.md.
  3. Whether to rename ARCH / COMPILE_ONLY to prefixed forms in a future env RFC (would require deprecation); until then, tests and docs follow current community names.

Appendix A — Incorrect variable names in docs/cute_layout_algebra_guide.md (§9.2)

File: docs/cute_layout_algebra_guide.md
Section: §9.2 Environment Variables (table around lines 411–418).

The following rows do not match python/flydsl/utils/env.py and should be corrected (or removed) when aligning documentation with the community implementation:

| Line (approx.) | Document currently says | Problem | Correct / canonical (env.py) |
| --- | --- | --- | --- |
| 416 | FLYDSL_COMPILE_ONLY=1 | Variable name does not exist. | COMPILE_ONLY=1; compile_only uses env_var="COMPILE_ONLY". |
| 417 | FLYDSL_NO_CACHE=1 | No such variable in env. | Disable JIT cache with FLYDSL_RUNTIME_ENABLE_CACHE=0 (or false, per OptBool parsing). |

Other rows in the same table (ARCH, FLYDSL_DUMP_IR, FLYDSL_DUMP_DIR, FLYDSL_RUNTIME_CACHE_DIR) are consistent with env.py. The table also omits FLYDSL_COMPILE_BACKEND (from RFC #253 / env.compile.backend); consider adding it for completeness.
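
Mismatches like these could be caught mechanically. The sketch below is a hypothetical documentation lint (not an existing FlyDSL tool) that scans text for the non-canonical spellings named in this RFC and reports the canonical replacement for each.

```python
import re

# Non-canonical spelling -> canonical replacement, per this RFC.
STALE_NAMES = {
    "FLYDSL_COMPILE_ONLY": "COMPILE_ONLY",
    "FLYDSL_NO_CACHE": "FLYDSL_RUNTIME_ENABLE_CACHE=0",
    "FLYDSL_COMPILE_ARCH": "ARCH",
}

# Word boundaries keep COMPILE_ONLY itself from matching.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, STALE_NAMES)) + r")\b")


def find_stale_names(text: str):
    """Return (line_number, stale_name, suggestion) for each occurrence."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, STALE_NAMES[name]))
    return hits
```

Such a check is meant for user-facing docs; files that mention the stale names on purpose (for example, a migration note) would need to be excluded.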
