Atomic memory operations #43
Open · Labels: enhancement (New feature or request)
Summary
Add atomic memory operation words for safe concurrent updates from multiple threads.
Words to implement
Integer atomics
| Word | Stack effect | MLIR op | Description |
|---|---|---|---|
| `ATOMIC+` | `( n addr -- )` | `memref.atomic_rmw addi` | Atomic add (i64) |
| `ATOMIC-MAX` | `( n addr -- )` | `memref.atomic_rmw maxs` | Atomic signed max (i64) |
| `ATOMIC-MIN` | `( n addr -- )` | `memref.atomic_rmw mins` | Atomic signed min (i64) |
| `ATOMIC-AND` | `( n addr -- )` | `memref.atomic_rmw andi` | Atomic bitwise AND |
| `ATOMIC-OR` | `( n addr -- )` | `memref.atomic_rmw ori` | Atomic bitwise OR |
| `ATOMIC-XOR` | `( n addr -- )` | `memref.atomic_rmw xori` | Atomic bitwise XOR |
| `ATOMIC-XCHG` | `( n addr -- old )` | `memref.atomic_rmw assign` | Atomic exchange; returns the old value |
| `ATOMIC-CAS` | `( expected new addr -- old )` | `memref.generic_atomic_rmw` | Compare-and-swap; returns the old value |
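To pin down the intended semantics of the integer words, here is a minimal host-side C++ reference model, assuming a `std::atomic<std::int64_t>` can stand in for the i64 cell at `addr`. It is a sketch for checking expectations (e.g. in tests), not the warpforth lowering; the `forth_atomic_*` names and `parallel_count` are hypothetical. `ATOMIC-MAX`/`ATOMIC-MIN` use a compare-and-swap loop because `std::atomic` has no `fetch_max`/`fetch_min` before C++26.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Host-side reference model of the proposed integer atomics (hypothetical
// names). A std::atomic<int64_t> stands in for the i64 cell at `addr`.
using Cell = std::atomic<std::int64_t>;

void forth_atomic_add(std::int64_t n, Cell& addr) { addr.fetch_add(n); }  // ATOMIC+
void forth_atomic_and(std::int64_t n, Cell& addr) { addr.fetch_and(n); }  // ATOMIC-AND
void forth_atomic_or(std::int64_t n, Cell& addr) { addr.fetch_or(n); }    // ATOMIC-OR
void forth_atomic_xor(std::int64_t n, Cell& addr) { addr.fetch_xor(n); }  // ATOMIC-XOR

// ATOMIC-XCHG ( n addr -- old )
std::int64_t forth_atomic_xchg(std::int64_t n, Cell& addr) { return addr.exchange(n); }

// ATOMIC-MAX: CAS loop. On failure compare_exchange_weak reloads `cur`,
// and the loop stops as soon as the stored value is already >= n.
void forth_atomic_max(std::int64_t n, Cell& addr) {
  std::int64_t cur = addr.load();
  while (cur < n && !addr.compare_exchange_weak(cur, n)) {
  }
}

// ATOMIC-MIN: same loop with the comparison flipped.
void forth_atomic_min(std::int64_t n, Cell& addr) {
  std::int64_t cur = addr.load();
  while (cur > n && !addr.compare_exchange_weak(cur, n)) {
  }
}

// ATOMIC-CAS ( expected new addr -- old ): after the call `expected` holds
// the value that was in memory, whether or not the swap happened.
std::int64_t forth_atomic_cas(std::int64_t expected, std::int64_t desired, Cell& addr) {
  addr.compare_exchange_strong(expected, desired);
  return expected;
}

// Usage: 8 threads each add 1 a thousand times; no update is lost.
std::int64_t parallel_count() {
  Cell counter{0};
  std::vector<std::thread> threads;
  for (int t = 0; t < 8; ++t) {
    threads.emplace_back([&counter] {
      for (int i = 0; i < 1000; ++i) forth_atomic_add(1, counter);
    });
  }
  for (auto& th : threads) th.join();
  return counter.load();  // 8000
}
```

A caller of `ATOMIC-CAS` detects success by comparing the returned old value with `expected`, which is what lock-free retry loops build on.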
Float atomics
| Word | Stack effect | MLIR op | Description |
|---|---|---|---|
| `ATOMIC-F+` | `( f addr -- )` | `memref.atomic_rmw addf` | Atomic float add |
| `ATOMIC-FMAX` | `( f addr -- )` | `memref.atomic_rmw maximumf` | Atomic float max |
| `ATOMIC-FMIN` | `( f addr -- )` | `memref.atomic_rmw minimumf` | Atomic float min |
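Because the float words receive f64 values as i64 bit patterns (see the implementation notes), their semantics can be modeled on the host as an integer compare-and-swap loop that bitcasts around the arithmetic. This is a sketch under that assumption: on the GPU, `memref.atomic_rmw addf` is expected to handle this directly, and a CAS loop like this is only one plausible fallback. `atomic_f64_rmw`, `forth_atomic_fadd`, and `forth_atomic_fmax` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Host-side model: stack values are f64 stored as raw i64 bit patterns, so
// the float words bitcast around an integer compare-and-swap loop.
using Cell = std::atomic<std::int64_t>;

double bits_to_f64(std::int64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);  // bitcast i64 -> f64
  return d;
}

std::int64_t f64_to_bits(double d) {
  std::int64_t bits;
  std::memcpy(&bits, &d, sizeof bits);  // bitcast f64 -> i64
  return bits;
}

// Load, combine as f64, publish the new bit pattern; retry if another thread
// changed the cell in between (compare_exchange_weak then reloads old_bits).
template <typename Combine>
void atomic_f64_rmw(std::int64_t f_bits, Cell& addr, Combine combine) {
  const double f = bits_to_f64(f_bits);
  std::int64_t old_bits = addr.load();
  std::int64_t new_bits;
  do {
    new_bits = f64_to_bits(combine(bits_to_f64(old_bits), f));
  } while (!addr.compare_exchange_weak(old_bits, new_bits));
}

// ATOMIC-F+ ( f addr -- )
void forth_atomic_fadd(std::int64_t f_bits, Cell& addr) {
  atomic_f64_rmw(f_bits, addr, [](double cur, double v) { return cur + v; });
}

// ATOMIC-FMAX ( f addr -- ): NaN propagates like `maximumf`; this sketch
// does not order -0.0 below +0.0.
void forth_atomic_fmax(std::int64_t f_bits, Cell& addr) {
  atomic_f64_rmw(f_bits, addr, [](double cur, double v) {
    if (std::isnan(cur) || std::isnan(v)) return std::numeric_limits<double>::quiet_NaN();
    return cur < v ? v : cur;
  });
}
```

Comparing raw bit patterns in the CAS (rather than f64 values) keeps the loop well-defined even when the cell holds a NaN. `ATOMIC-FMIN` would follow the same loop with the comparison flipped.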
Motivation
- Multi-block reductions: When a reduction spans more than one thread block, the output must be accumulated atomically (e.g., `ATOMIC-F+` for partial sums, `ATOMIC-FMAX` for a global max).
- Histogram / scatter patterns: Common GPU patterns where multiple threads update the same output location.
- Lock-free data structures: `ATOMIC-CAS` enables lock-free algorithms.
- Flash attention: Multi-block flash attention variants need atomic output accumulation.
Implementation notes
- Integer atomics: straightforward mapping to `memref.atomic_rmw` with the appropriate `arith::AtomicRMWKind`.
- Float atomics: values are i64 bit patterns on the stack, so bitcast to f64 before the atomic op. The address computation follows the same pattern as `!`/`F!`.
- `ATOMIC-CAS` is more complex: it needs `memref.generic_atomic_rmw` with a comparison body, or a direct lowering to an LLVM `cmpxchg`.
- NVVM has native support for all of these via PTX `atom.*` instructions.
- Consider starting with just `ATOMIC+` and `ATOMIC-F+` as the minimum viable set.
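The `memref.generic_atomic_rmw` route for `ATOMIC-CAS` means expressing compare-and-swap as a generic read-modify-write whose body compares the current value with `expected`. The C++ model below sketches that idea on the host. `generic_atomic_rmw` here is a hypothetical helper, not the MLIR API: it passes the current value to a body, stores whatever the body returns (retrying with a compare-exchange loop), and returns the old value, which is what `ATOMIC-CAS` pushes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using Cell = std::atomic<std::int64_t>;

// Hypothetical model of a generic atomic RMW: `body` receives the value
// currently in memory and returns the value to store. Returns the old value.
template <typename Body>
std::int64_t generic_atomic_rmw(Cell& addr, Body body) {
  std::int64_t old_val = addr.load();
  // Retry until no other thread changed the cell between load and store;
  // on failure compare_exchange_weak refreshes old_val and body re-runs.
  while (!addr.compare_exchange_weak(old_val, body(old_val))) {
  }
  return old_val;
}

// ATOMIC-CAS ( expected new addr -- old ) as a generic RMW with a comparison
// body: store `desired` only if the current value equals `expected`,
// otherwise write the current value back unchanged.
std::int64_t forth_atomic_cas_generic(std::int64_t expected, std::int64_t desired,
                                      Cell& addr) {
  return generic_atomic_rmw(addr, [=](std::int64_t cur) {
    return cur == expected ? desired : cur;
  });
}
```

In this model a mismatch still performs a store (the unchanged value is written back), whereas an LLVM `cmpxchg` leaves memory untouched when the comparison fails. Either way, the caller detects success by checking whether the returned old value equals `expected`.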
Files to modify
- `include/warpforth/Dialect/Forth/ForthOps.td`: Define new ops
- `lib/Translation/ForthToMLIR/ForthToMLIR.cpp`: Parse words
- `lib/Conversion/ForthToMemRef/ForthToMemRef.cpp`: Add conversion patterns
- `test/Translation/Forth/`: Parser tests
- `test/Conversion/ForthToMemRef/`: Conversion tests
Priority
Medium — needed for multi-block reductions and scatter patterns. Not required for single-block kernels.
Related
- #42 Float math intrinsics: FEXP, FSQRT, FLOG, FABS, FNEG, FMAX, FMIN (FMAX/FMIN are needed alongside `ATOMIC-FMAX`/`ATOMIC-FMIN`)
- #10 Warp-level primitives: shuffle and reductions (warp reductions reduce the need for atomics)