[Docs] Onboarding notebooks (1/n): expr foundation #584
Interactive, bottom-up onboarding notebooks for the flydsl.expr foundation,
bridging a newcomer to the existing examples/. This first set (1/n) covers:
- 00_hello_flydsl - the @flyc.kernel / @flyc.jit trace model; reading dumped IR
- 01_numeric_types - the scalar type system (ints, floats, bf16/fp8), casts,
promotion, and Constexpr vs runtime values
- 02_struct - @fx.struct aggregate value types and their memory layout
- 03_universal_ops - the target-agnostic Universal* atoms, with a fully-universal
vector-add capstone validated against torch
Layout algebra (make_layout / logical_divide / tiled copy / MMA) is intentionally
deferred to a follow-up series. All cells were run-verified on an MI350X (gfx950)
and committed with outputs cleared. A short README indexes the series.
Refs ROCm#573.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rity)
Sharpen the series for agent consumers (fast ramp, fewer source lookups):
- README: add a flydsl.expr API cheat-sheet (kernel/jit/launch, scalars, structs,
copy atoms + register tensors) and the three printf/wurlitzer/Constexpr gotchas,
so the whole foundation is reachable in one place.
- 03_universal_ops:
- explain the host->device tensor handoff (raw torch tensor vs from_dlpack +
mark_layout_dynamic) and annotate the jit C param as fx.Tensor for consistency;
- bridge nb01's '+' operator to nb03's register-tensor
memref_load_vec / arith.addf / memref_store_vec compute;
- stop calling UniversalFMA 'the MMA atom' (it has no real usage; MMA lowers via
rocdl.MFMA) in the atom-family list and the Recap;
- fix the pass name convert_fly_to_rocdl -> convert-fly-to-rocdl.
Re-ran all four notebooks on a current gfx950 build: 00/01/02/03 clean, vadd matches
torch, IR shows universal_copy<32>. Outputs committed cleared.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderfeli ready for review when you have a moment. PR 1/n of the expr-foundation onboarding series (#573): four notebooks (hello / numeric types / struct / universal ops + a vadd capstone), all re-run clean on gfx950 with outputs committed cleared. Since the series increasingly doubles as agent-onboarding material, this last push adds a one-stop |
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a set of onboarding Jupyter notebooks and a companion README to introduce the flydsl.expr foundations and how to inspect generated IR.
Changes:
- Add examples/notebooks/README.md index + API cheat-sheet + run instructions.
- Add four onboarding notebooks covering kernel/jit basics, numeric types, @fx.struct, and Universal* atoms with a vector-add capstone.
- Include IR-dumping walkthroughs via FLYDSL_DUMP_IR / FLYDSL_DUMP_DIR.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| examples/notebooks/README.md | Adds onboarding index, cheat-sheet, and execution instructions for the notebook series |
| examples/notebooks/00_hello_flydsl.ipynb | Introduces @flyc.kernel / @flyc.jit, GPU printf capture, and IR dumping |
| examples/notebooks/01_numeric_types.ipynb | Documents scalar type families, arithmetic, casts/promotions, and Constexpr |
| examples/notebooks/02_struct.ipynb | Introduces @fx.struct, layout queries, .replace(), and shared-memory struct use |
| examples/notebooks/03_universal_ops.ipynb | Explains universal atoms and builds a universal vector-add example + IR inspection |
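The PR summary describes the `@fx.struct` layout in 02_struct as C-style. For readers who want a concrete picture of what that usually implies (natural alignment plus padding), here is a minimal sketch using Python's standard `ctypes` module. It does not use flydsl at all; the `Sample` struct and its field names are hypothetical, and whether `@fx.struct` applies exactly the same padding rules is an assumption to check against the notebook.

```python
import ctypes

# Hypothetical struct for illustration only; this is ctypes, not @fx.struct.
class Sample(ctypes.Structure):
    _fields_ = [
        ("flag", ctypes.c_uint8),   # 1 byte, followed by 3 bytes of padding
        ("count", ctypes.c_int32),  # 4 bytes, aligned to a 4-byte boundary
        ("scale", ctypes.c_float),  # 4 bytes, aligned to a 4-byte boundary
    ]

# Collect each field's byte offset within the struct.
offsets = {name: getattr(Sample, name).offset for name, _ in Sample._fields_}
total_size = ctypes.sizeof(Sample)
struct_alignment = ctypes.alignment(Sample)

print(offsets, total_size, struct_alignment)
```

The fields add up to 9 bytes of payload, but the struct occupies more because `count` must start on a 4-byte boundary. This is the kind of behaviour a layout query is meant to expose.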
| # | Notebook | Topic |
|---|----------|-------|
| 00 | [`00_hello_flydsl.ipynb`](00_hello_flydsl.ipynb) | the `@flyc.kernel` / `@flyc.jit` model; reading dumped IR |
| 01 | [`01_numeric_types.ipynb`](01_numeric_types.ipynb) | scalar types: ints, floats, `bf16`/`fp8`, casts, `Constexpr` |
| 02 | [`02_struct.ipynb`](02_struct.ipynb) | `@fx.struct` aggregate value types and their memory layout |
| 03 | [`03_universal_ops.ipynb`](03_universal_ops.ipynb) | target-agnostic `Universal*` atoms + a vector-add capstone |
# Kernel + launch (00)
@flyc.kernel # device kernel; the body is traced to MLIR
@flyc.jit # host launch wrapper
kernel(args).launch(grid=(gx, 1, 1), block=[bx, 1, 1], stream=stream)
"@flyc.jit\n",
"def vadd(A: fx.Tensor, B: fx.Tensor, C: fx.Tensor, n: fx.Int32, stream: fx.Stream = fx.Stream(None)):\n",
"    block_dim = 64\n",
"    grid_x = (n + block_dim - 1) // block_dim\n",
"    vadd_kernel(A, B, C, block_dim).launch(grid=(grid_x, 1, 1), block=[block_dim, 1, 1], stream=stream)\n",
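The `grid_x` expression in the launcher above is a ceiling division: it rounds the block count up so that every one of the `n` elements is covered by a thread. A minimal host-side sketch with plain Python integers follows. `ceil_div` is a hypothetical helper name, and plain `int` stands in for the dynamic `fx.Int32` value the real launcher receives.

```python
def ceil_div(n: int, block_dim: int) -> int:
    """Smallest number of blocks of size block_dim that covers n elements."""
    return (n + block_dim - 1) // block_dim

block_dim = 64
for n in (1, 64, 65, 1000):
    grid_x = ceil_div(n, block_dim)
    # Threads in the last block that fall past the end of the data.
    surplus_threads = grid_x * block_dim - n
    print(f"n={n:5d} grid_x={grid_x:3d} surplus_threads={surplus_threads}")
```

When `n` is not a multiple of `block_dim`, the last block has surplus threads, so a kernel launched this way generally needs a bounds check on the element index. Whether `vadd_kernel` does so is not visible in this snippet.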
"dump_dir = tempfile.mkdtemp(prefix=\"flydsl_ir_\")\n",
"os.environ[\"FLYDSL_DUMP_IR\"] = \"1\"\n",
"os.environ[\"FLYDSL_DUMP_DIR\"] = dump_dir\n",
"    add_one(fx.Int32(41), stream=torch.cuda.Stream())\n",
"    torch.cuda.synchronize()\n",
"\n",
"os.environ.pop(\"FLYDSL_DUMP_IR\", None)  # stop dumping for the rest of the notebook\n",
"origin = sorted(glob.glob(os.path.join(dump_dir, \"*\", \"00_origin.mlir\")))[0]\n",
"atom_lines = [ln.strip() for ln in open(origin).read().splitlines() if \"copy_atom\" in ln]\n",
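The dump-and-read pattern in the cells above can be sketched with the standard library alone, so it runs without a GPU or flydsl installed. Here the kernel launch is replaced by writing a placeholder file; the sub-directory name `add_one_0` and the placeholder text are assumptions for illustration, not real flydsl output. The sketch unsets both environment variables and reads the dump inside a `with` block.

```python
import glob
import os
import tempfile

dump_dir = tempfile.mkdtemp(prefix="flydsl_ir_")
os.environ["FLYDSL_DUMP_IR"] = "1"
os.environ["FLYDSL_DUMP_DIR"] = dump_dir

# Stand-in for the traced kernel launch: write a placeholder dump file.
# Directory name and file content are hypothetical, not real IR.
case_dir = os.path.join(dump_dir, "add_one_0")
os.makedirs(case_dir)
with open(os.path.join(case_dir, "00_origin.mlir"), "w") as f:
    f.write("// placeholder line, not real IR\n")
    f.write("// placeholder mentioning copy_atom and universal_copy<32>\n")

# Unset both variables so later cells do not keep dumping.
os.environ.pop("FLYDSL_DUMP_IR", None)
os.environ.pop("FLYDSL_DUMP_DIR", None)

# Read the first dump, if any, and close the file deterministically.
matches = sorted(glob.glob(os.path.join(dump_dir, "*", "00_origin.mlir")))
atom_lines = []
if matches:
    with open(matches[0]) as f:
        atom_lines = [ln.strip() for ln in f if "copy_atom" in ln]

print(atom_lines)
```

Checking `matches` before indexing avoids an `IndexError` when nothing was dumped, which is a small departure from the `[0]` indexing in the notebook cell.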
"execution": {
 "iopub.execute_input": "2026-05-28T07:39:35.911107Z",
 "iopub.status.busy": "2026-05-28T07:39:35.911005Z",
 "iopub.status.idle": "2026-05-28T07:39:36.638621Z",
 "shell.execute_reply": "2026-05-28T07:39:36.637986Z"
},
…c metadata)
Acted on the substantive Copilot comments:
- Close the IR-dump file reads with `with open(...)` in nb00 and nb03 (C6).
- Pop both FLYDSL_DUMP_IR and FLYDSL_DUMP_DIR after the dump cell so the env we set doesn't linger for later cells (C4/C5) -- without the suggested try/finally, which would be defensive noise for a teaching cell.
- Strip per-cell `metadata.execution` timestamps from all four notebooks so re-running doesn't churn the diff (C7); matches the outputs-cleared convention.
Did not act on the false positives: the README table uses single-pipe rows (not '||'); `block=[...]` and a dynamic `n: fx.Int32` launcher arg both match the canonical examples/01-vectorAdd.py and run clean on gfx950.
Re-ran all four on a current build: 00/01/02/03 clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
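Stripping the per-cell `metadata.execution` timestamps, as described above, can be sketched with the standard `json` module. This is not necessarily how the PR did it; `strip_execution_metadata` is a hypothetical helper, and the in-memory notebook below is a made-up minimal example in the nbformat v4 cell shape.

```python
import json

def strip_execution_metadata(notebook: dict) -> dict:
    """Remove the per-cell 'execution' timestamp block, leaving other metadata alone."""
    for cell in notebook.get("cells", []):
        cell.get("metadata", {}).pop("execution", None)
    return notebook

# Minimal made-up notebook with one timestamped code cell.
nb = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {
                "execution": {"iopub.status.busy": "2026-05-28T07:39:35.911005Z"},
                "tags": ["keep-me"],
            },
            "source": ["print('hi')\n"],
            "outputs": [],
            "execution_count": None,
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_execution_metadata(nb)
serialized = json.dumps(cleaned, indent=1)
print("execution" in cleaned["cells"][0]["metadata"])
```

For real files, the same function would sit between `json.load` and `json.dump` on each `.ipynb`. Only the `execution` key is removed, so tags and other cell metadata survive.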
Thanks @copilot — went through all 7. Pushed
The other three I'm leaving as-is — they don't reproduce:
Summary
First PR (1/n) toward the onboarding notebook series requested in #573. Rather than jumping straight to vector-add + Layout 101, this set builds the flydsl.expr foundation bottom-up and stops before layout algebra (deferred to a follow-up series), so the later layout material rests on solid primitives.
Four notebooks in examples/notebooks/:
| Notebook | Topic |
|---|---|
| 00_hello_flydsl | the @flyc.kernel / @flyc.jit trace model; reading dumped IR (FLYDSL_DUMP_IR) |
| 01_numeric_types | the scalar type system (ints, floats, bf16/fp8), casts, promotion, Constexpr vs runtime |
| 02_struct | @fx.struct aggregate value types and their C-style memory layout |
| 03_universal_ops | target-agnostic Universal* atoms + a fully-universal vector-add capstone (validated vs torch) |
Emphasis throughout is arch-neutrality: the capstone moves data with UniversalCopy32b (no rocdl/CDNA-specific atoms), and the IR peek shows the !fly.universal_copy<32> op before it specializes in convert-fly-to-rocdl.
Notes
- … wurlitzer (pip install jupyter wurlitzer) to show GPU printf inline; Jupyter doesn't capture device stdout on its own. See examples/notebooks/README.md.
- … nbsphinx docs rendering, an nbmake execute-check CI job, and the layout/MMA notebook series.
Test plan
- … torch (torch.allclose)
- … examples/notebooks/) and whether to wire nbmake CI now or in the follow-up
Refs #573.
🤖 Generated with Claude Code