[Docs] Onboarding notebooks (1/n): expr foundation by jhinpan · Pull Request #584 · ROCm/FlyDSL

jhinpan · 2026-05-28T07:42:15Z

Summary

First PR (1/n) toward the onboarding notebook series requested in #573. Rather than jumping straight to vector-add + Layout 101, this set builds the flydsl.expr foundation bottom-up and stops before layout algebra (deferred to a follow-up series), so the later layout material rests on solid primitives.

Four notebooks in examples/notebooks/:

| # | Notebook | Topic |
|---|----------|-------|
| 00 | `00_hello_flydsl` | the `@flyc.kernel` / `@flyc.jit` trace model; reading dumped IR (`FLYDSL_DUMP_IR`) |
| 01 | `01_numeric_types` | scalar type system (ints, floats, `bf16`/`fp8`), casts, promotion, `Constexpr` vs runtime |
| 02 | `02_struct` | `@fx.struct` aggregate value types and their C-style memory layout |
| 03 | `03_universal_ops` | target-agnostic `Universal*` atoms + a fully-universal vector-add capstone (validated vs torch) |

Emphasis throughout is arch-neutrality: the capstone moves data with `UniversalCopy32b` (no rocdl/CDNA-specific atoms), and the IR peek shows the `!fly.universal_copy<32>` op before it is specialized by the `convert-fly-to-rocdl` pass.
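Notebook 02 describes `@fx.struct` as using a C-style memory layout. As a FlyDSL-independent illustration of what that usually means (each field placed at a multiple of its natural alignment, with padding between fields and at the end), Python's `ctypes` computes the same kind of layout for a host-side struct. This is a sketch assuming "C-style" means the usual natural-alignment rules, which the PR does not spell out; `Example` and `describe_layout` are hypothetical names, and no FlyDSL API is used.

```python
import ctypes


# Hypothetical struct used only to illustrate C-style layout rules; not a FlyDSL type.
class Example(ctypes.Structure):
    _fields_ = [
        ("a", ctypes.c_int8),   # 1 byte at offset 0
        ("b", ctypes.c_int32),  # 4-byte alignment, so 3 padding bytes precede it
        ("c", ctypes.c_int16),  # 2 bytes directly after b
    ]


def describe_layout(struct_type):
    """Return [(name, offset, size), ...], total size, and alignment of a ctypes struct."""
    fields = [
        (name, getattr(struct_type, name).offset, getattr(struct_type, name).size)
        for name, _ in struct_type._fields_
    ]
    return fields, ctypes.sizeof(struct_type), ctypes.alignment(struct_type)


if __name__ == "__main__":
    fields, total, align = describe_layout(Example)
    for name, offset, size in fields:
        print(f"{name}: offset={offset} size={size}")
    # The total size is rounded up to a multiple of the largest field alignment.
    print(f"sizeof={total} alignment={align}")
```

Reordering the fields (for example `b`, `c`, `a`) shrinks the trailing padding, which is the usual reason to care about declaration order in C-style aggregates.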

Notes

Run-verified end-to-end on an MI350X (gfx950); committed with outputs cleared for clean diffs (re-run to populate).
Notebooks need wurlitzer (pip install jupyter wurlitzer) to show GPU printf inline — Jupyter doesn't capture device stdout on its own. See examples/notebooks/README.md.
Deferred to follow-ups to keep this PR small/reviewable: nbsphinx docs rendering, an nbmake execute-check CI job, and the layout/MMA notebook series.
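The PR notes that Jupyter does not show device printf output on its own. wurlitzer addresses this by redirecting the process-level file descriptors rather than Python's `sys.stdout`. A minimal stdlib sketch of that fd-level redirection, for illustration only (not wurlitzer's implementation; `capture_fd_stdout` is a hypothetical helper, and `os.write(1, ...)` stands in for native output such as GPU printf):

```python
import os
import sys
import tempfile


def capture_fd_stdout(fn):
    """Run fn() and return the bytes it wrote to file descriptor 1.

    Replacing sys.stdout only affects Python-level writes; bytes written
    straight to fd 1 (e.g. by native code) bypass it. Redirecting fd 1
    itself catches both.
    """
    sys.stdout.flush()                 # keep pending Python output out of the capture
    saved_fd = os.dup(1)               # remember the real stdout
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 1)       # fd 1 now points at the temp file
        try:
            fn()
        finally:
            os.dup2(saved_fd, 1)       # restore the original stdout
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read()


if __name__ == "__main__":
    out = capture_fd_stdout(lambda: os.write(1, b"hello from fd 1\n"))
    print("captured:", out)
```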

Test plan

All four notebooks execute top-to-bottom with no errors on gfx950
Capstone matches torch (torch.allclose)
Reviewer: confirm location (examples/notebooks/) and whether to wire nbmake CI now or in the follow-up
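The capstone check uses `torch.allclose`, whose acceptance rule is elementwise |actual − expected| ≤ atol + rtol·|expected| (defaults `rtol=1e-5`, `atol=1e-8`). A dependency-free sketch of the same criterion against a host-side vector-add reference; the data and the `allclose` helper here are hypothetical stand-ins, not the notebook's code:

```python
def allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """Elementwise |actual - expected| <= atol + rtol * |expected|.

    Same rule and defaults as torch.allclose (without broadcasting);
    NaNs never compare close.
    """
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


if __name__ == "__main__":
    a = [float(i) for i in range(8)]
    b = [2.0 * i for i in range(8)]
    expected = [x + y for x, y in zip(a, b)]        # host reference for C = A + B
    from_gpu = [v * (1 + 1e-7) for v in expected]   # stand-in result with tiny rounding error
    print(allclose(from_gpu, expected))
```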

Refs #573.

🤖 Generated with Claude Code


Interactive, bottom-up onboarding notebooks for the flydsl.expr foundation, bridging a newcomer to the existing examples/.

This first set (1/n) covers:
- 00_hello_flydsl - the @flyc.kernel / @flyc.jit trace model; reading dumped IR
- 01_numeric_types - the scalar type system (ints, floats, bf16/fp8), casts, promotion, and Constexpr vs runtime values
- 02_struct - @fx.struct aggregate value types and their memory layout
- 03_universal_ops - the target-agnostic Universal* atoms, with a fully-universal vector-add capstone validated against torch

Layout algebra (make_layout / logical_divide / tiled copy / MMA) is intentionally deferred to a follow-up series.

All cells were run-verified on an MI350X (gfx950) and committed with outputs cleared. A short README indexes the series.

Refs ROCm#573.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rity)

Sharpen the series for agent consumers (fast ramp, fewer source lookups):

- README: add a flydsl.expr API cheat-sheet (kernel/jit/launch, scalars, structs, copy atoms + register tensors) and the three printf/wurlitzer/Constexpr gotchas, so the whole foundation is reachable in one place.
- 03_universal_ops:
  - explain the host->device tensor handoff (raw torch tensor vs from_dlpack + mark_layout_dynamic) and annotate the jit C param as fx.Tensor for consistency;
  - bridge nb01's '+' operator to nb03's register-tensor memref_load_vec / arith.addf / memref_store_vec compute;
  - stop calling UniversalFMA 'the MMA atom' (it has no real usage; MMA lowers via rocdl.MFMA) in the atom-family list and the Recap;
  - fix the pass name convert_fly_to_rocdl -> convert-fly-to-rocdl.

Re-ran all four notebooks on a current gfx950 build: 00/01/02/03 clean, vadd matches torch, IR shows universal_copy<32>. Outputs committed cleared.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jhinpan · 2026-05-31T04:30:22Z

@coderfeli ready for review when you have a moment. PR 1/n of the expr-foundation onboarding series (#573): four notebooks (hello / numeric types / struct / universal ops + a vadd capstone), all re-run clean on gfx950 with outputs committed cleared.

Since the series increasingly doubles as agent-onboarding material, this last push adds a one-stop flydsl.expr API cheat-sheet to the README and tightens nb03 — the host→device tensor handoff (from_dlpack/mark_layout_dynamic), the +→memref_load_vec/arith.addf register-tensor bridge, dropping the UniversalFMA-as-MMA framing in favor of rocdl.MFMA, and the convert-fly-to-rocdl pass name. Happy to adjust.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a set of onboarding Jupyter notebooks and a companion README to introduce the flydsl.expr foundations and how to inspect generated IR.

Changes:

Add examples/notebooks/README.md index + API cheat-sheet + run instructions.
Add four onboarding notebooks covering kernel/jit basics, numeric types, @fx.struct, and Universal* atoms with a vector-add capstone.
Include IR-dumping walkthroughs via FLYDSL_DUMP_IR / FLYDSL_DUMP_DIR.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.


| File | Description |
|------|-------------|
| examples/notebooks/README.md | Adds onboarding index, cheat-sheet, and execution instructions for the notebook series |
| examples/notebooks/00_hello_flydsl.ipynb | Introduces `@flyc.kernel` / `@flyc.jit`, GPU `printf` capture, and IR dumping |
| examples/notebooks/01_numeric_types.ipynb | Documents scalar type families, arithmetic, casts/promotions, and `Constexpr` |
| examples/notebooks/02_struct.ipynb | Introduces `@fx.struct`, layout queries, `.replace()`, and shared-memory struct use |
| examples/notebooks/03_universal_ops.ipynb | Explains universal atoms and builds a universal vector-add example + IR inspection |


+| # | Notebook | Topic |
+|---|----------|-------|
+| 00 | [`00_hello_flydsl.ipynb`](00_hello_flydsl.ipynb) | the `@flyc.kernel` / `@flyc.jit` model; reading dumped IR |
+| 01 | [`01_numeric_types.ipynb`](01_numeric_types.ipynb) | scalar types: ints, floats, `bf16`/`fp8`, casts, `Constexpr` |
+| 02 | [`02_struct.ipynb`](02_struct.ipynb) | `@fx.struct` aggregate value types and their memory layout |
+| 03 | [`03_universal_ops.ipynb`](03_universal_ops.ipynb) | target-agnostic `Universal*` atoms + a vector-add capstone |


+# Kernel + launch (00)
+@flyc.kernel                       # device kernel; the body is traced to MLIR
+@flyc.jit                          # host launch wrapper
+kernel(args).launch(grid=(gx, 1, 1), block=[bx, 1, 1], stream=stream)


+    "@flyc.jit\n",
+    "def vadd(A: fx.Tensor, B: fx.Tensor, C: fx.Tensor, n: fx.Int32, stream: fx.Stream = fx.Stream(None)):\n",
+    "    block_dim = 64\n",
+    "    grid_x = (n + block_dim - 1) // block_dim\n",
+    "    vadd_kernel(A, B, C, block_dim).launch(grid=(grid_x, 1, 1), block=[block_dim, 1, 1], stream=stream)\n",
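In `vadd`, `grid_x = (n + block_dim - 1) // block_dim` is a ceiling division: the grid always covers all `n` elements, and the last block can hold up to `block_dim - 1` threads with no element to process (whether `vadd_kernel` guards those threads is not visible in this hunk). The same host-side arithmetic on plain Python ints, with hypothetical helper names:

```python
def ceil_div(n: int, block: int) -> int:
    """Number of blocks of size `block` needed to cover n elements."""
    return (n + block - 1) // block


def launch_shape(n: int, block_dim: int = 64):
    """Mirror the launcher's 1-D grid/block choice and count idle tail threads."""
    grid_x = ceil_div(n, block_dim)
    idle = grid_x * block_dim - n  # threads in the last block past the end of the data
    return (grid_x, 1, 1), [block_dim, 1, 1], idle


if __name__ == "__main__":
    for n in (1, 64, 65, 1000):
        grid, block, idle = launch_shape(n)
        print(n, grid, block, idle)
```

In the notebook `n` is an `fx.Int32` launcher argument, so the same expression is evaluated on a traced value rather than a Python int; the arithmetic is identical.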


+    "dump_dir = tempfile.mkdtemp(prefix=\"flydsl_ir_\")\n",
+    "os.environ[\"FLYDSL_DUMP_IR\"] = \"1\"\n",
+    "os.environ[\"FLYDSL_DUMP_DIR\"] = dump_dir\n",


+    "    add_one(fx.Int32(41), stream=torch.cuda.Stream())\n",
+    "    torch.cuda.synchronize()\n",
+    "\n",
+    "os.environ.pop(\"FLYDSL_DUMP_IR\", None)  # stop dumping for the rest of the notebook\n",
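The follow-up commit clears both dump variables, not only `FLYDSL_DUMP_IR`. The set-then-clear pattern in plain Python, with the kernel trace and launch elided because it needs FlyDSL and a GPU:

```python
import os
import tempfile

DUMP_VARS = ("FLYDSL_DUMP_IR", "FLYDSL_DUMP_DIR")

# Enable IR dumping into a fresh temporary directory.
dump_dir = tempfile.mkdtemp(prefix="flydsl_ir_")
os.environ["FLYDSL_DUMP_IR"] = "1"
os.environ["FLYDSL_DUMP_DIR"] = dump_dir

# ... trace and launch the kernel here (omitted) ...

# Clear both variables so later cells run without dumping.
for var in DUMP_VARS:
    os.environ.pop(var, None)
```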


+    "origin = sorted(glob.glob(os.path.join(dump_dir, \"*\", \"00_origin.mlir\")))[0]\n",
+    "atom_lines = [ln.strip() for ln in open(origin).read().splitlines() if \"copy_atom\" in ln]\n",
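Per the follow-up commit, the dump-file read should close its handle via `with open(...)`. A sketch of the same lookup as a small helper; `find_ir_lines` is hypothetical, while the `*/00_origin.mlir` path pattern and the `copy_atom` filter come from the cell above:

```python
import glob
import os


def find_ir_lines(dump_dir, needle="copy_atom", stage="00_origin.mlir"):
    """Return stripped lines containing `needle` from the first dumped `stage` file.

    Same lookup as the notebook cell, but the file is closed by the `with` block.
    """
    matches = sorted(glob.glob(os.path.join(dump_dir, "*", stage)))
    if not matches:
        raise FileNotFoundError(
            f"no {stage} under {dump_dir}; was FLYDSL_DUMP_IR set before tracing?"
        )
    with open(matches[0]) as f:
        return [ln.strip() for ln in f if needle in ln]
```

Raising a descriptive error instead of indexing `[0]` on an empty list turns the most common mistake (dumping not enabled before the first trace) into a readable message.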


+    "execution": {
+     "iopub.execute_input": "2026-05-28T07:39:35.911107Z",
+     "iopub.status.busy": "2026-05-28T07:39:35.911005Z",
+     "iopub.status.idle": "2026-05-28T07:39:36.638621Z",
+     "shell.execute_reply": "2026-05-28T07:39:36.637986Z"
+    },


…c metadata)

Acted on the substantive Copilot comments:

- Close the IR-dump file reads with `with open(...)` in nb00 and nb03 (C6).
- Pop both FLYDSL_DUMP_IR and FLYDSL_DUMP_DIR after the dump cell so the env we set doesn't linger for later cells (C4/C5) -- without the suggested try/finally, which would be defensive noise for a teaching cell.
- Strip per-cell `metadata.execution` timestamps from all four notebooks so re-running doesn't churn the diff (C7); matches the outputs-cleared convention.

Did not act on the false positives: the README table uses single-pipe rows (not '||'); `block=[...]` and a dynamic `n: fx.Int32` launcher arg both match the canonical examples/01-vectorAdd.py and run clean on gfx950.

Re-ran all four on a current build: 00/01/02/03 clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
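The diff churn comes from the per-cell `metadata.execution` timestamps. A minimal stdlib sketch of stripping them (and clearing outputs, matching the outputs-cleared convention), assuming nbformat-4 JSON; `strip_notebook` is a hypothetical helper, and a pre-commit tool such as nbstripout is the more durable option:

```python
import json


def strip_notebook(path):
    """Drop per-cell metadata.execution and clear code-cell outputs, in place.

    Operates on raw nbformat-4 JSON, so no Jupyter packages are needed.
    """
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        cell.get("metadata", {}).pop("execution", None)  # iopub/shell timestamps
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    with open(path, "w", encoding="utf-8") as f:
        # indent=1 / sorted keys / trailing newline roughly match Jupyter's on-disk format
        json.dump(nb, f, indent=1, sort_keys=True, ensure_ascii=False)
        f.write("\n")
```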

jhinpan · 2026-05-31T15:03:15Z

Thanks @copilot — went through all 7. Pushed c8754c70 with the substantive ones:

Close file handles (nb00, nb03 IR reads) → with open(...). ✅
Dump-dir env leak → pop both FLYDSL_DUMP_IR and FLYDSL_DUMP_DIR. ✅ (Skipped the try/finally — it's defensive noise for a teaching cell, and the leak was benign anyway since dumping is gated on DUMP_IR.)
Execution-timestamp metadata → stripped from all four notebooks so re-runs don't churn the diff. ✅ (Matches the outputs-cleared convention; a repo-wide nbstripout pre-commit hook would be the durable fix — happy to do that as a separate infra PR.)

The other three I'm leaving as-is — they don't reproduce:

README || table: the rows are single-pipe (| 00 | … |); renders fine.
block=[bx, 1, 1] / dynamic n: fx.Int32: both mirror examples/01-vectorAdd.py (grid=(…), block=[…], n: fx.Int32 then grid_x = (n + block_dim - 1) // block_dim) — .launch takes a list, and the dynamic-i32 launcher arg is the canonical idiom. All four notebooks re-run clean on gfx950.

jhinpan and others added 3 commits May 28, 2026 07:41

Merge branch 'main' into docs/notebooks-573-expr-foundation

35eb36f

jhinpan marked this pull request as ready for review May 31, 2026 04:29

Copilot AI review requested due to automatic review settings May 31, 2026 04:29

Copilot AI reviewed May 31, 2026

jhinpan wants to merge 4 commits into ROCm:main from jhinpan:docs/notebooks-573-expr-foundation
