README.md: 75 additions & 2 deletions
@@ -2,7 +2,15 @@
A GPU-accelerated superoptimizer for the Zilog Z80 processor. The compiler that **never guesses** — every optimization is provably optimal.

-## What's New (Birthday Marathon, March 26–29, 2026)
+## What's New (Birthday Marathon, March 26 – April 1, 2026)
+
+### Day 7 Highlights (April 1, 2026)
+
+- **`pkg/regalloc/ofb.go`**: public OFB API: `ComputeOFB()`, `LoadOFB()`, 15 flag constants, `OFBNames()`. No more local duplicates in consumer tools.
+- **OFB sidecars for ALL table files**: `enrich-ofb` now auto-detects ENRT and Z80T v2 formats. Sidecars generated for `merged_ix_5v.bin`, `ix_expanded_5v.bin`, etc. (~233MB each).
+- **`cmd/gen6v-ix-feed`**: fast FuncDesc JSON generator for the 6v IX-expanded GPU run. Pre-computes the 562 valid treewidth≥4 masks (out of 32,768 possible 6v graphs) in <0.5s, then iterates masks as the outer loop, giving a 200K shapes/sec feeder versus 8 shapes/sec with the CPU-bound regalloc-enum.
+- **Dual-GPU ix_expanded_6v_dense run**: 298.7M shapes split across GPU0 (masks 0–280) + GPU1 (masks 281–561), running in background, ETA ~5h. Will yield the largest IX-aware regalloc table yet.
+- **EXX zone architecture**: S1+S2 independent table lookups, IXH/IXL/IYH/IYL as zero-cost inter-zone bridges. Full pipeline: `total_cost = lookup(S1) + lookup(S2) + 4T×N_exx + 8T×N_ix_accesses`.
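The EXX-zone cost formula in the last bullet can be sketched in Go. This is a minimal illustration rather than code from the repository: `ZoneCost` and its parameter names are hypothetical, and the two per-zone costs are passed in as plain integers standing in for the `lookup(S1)` and `lookup(S2)` table results.

```go
package main

import "fmt"

// Per-event costs in T-states, taken from the formula in the bullet above.
const (
	exxCostT      = 4 // one EXX bank switch
	ixAccessCostT = 8 // one access to an IX/IY half register
)

// ZoneCost is a hypothetical helper: it adds the two independent per-zone
// table costs to the overhead of switching banks and touching IX/IY halves.
func ZoneCost(s1Cost, s2Cost, nExx, nIxAccesses int) int {
	return s1Cost + s2Cost + exxCostT*nExx + ixAccessCostT*nIxAccesses
}

func main() {
	// Example numbers are made up: zones costing 40T and 28T,
	// two EXX switches, three IX-half accesses.
	fmt.Println(ZoneCost(40, 28, 2, 3))
}
```

With those made-up inputs the total is 40 + 28 + 8 + 24 = 100 T-states.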
### Week 1 Highlights
@@ -24,6 +32,7 @@ A GPU-accelerated superoptimizer for the Zilog Z80 processor. The compiler that
These table files are currently being computed or are planned. Each will be a drop-in addition to the existing regalloc pipeline.
+
+### `data/ix_expanded_6v_dense.bin`: **in progress** (ETA ~5h from April 1, 2026)
+
+The largest regalloc table yet: 6 virtual registers with full IX/IY half-register support and treewidth≥4 interference graphs.
+
+**What it enables:**
+- IX-aware register allocation for dense 6-vreg functions; the current `merged_ix_5v.bin` table (60.9M entries) only covers up to 5 vregs with IX halves
+- Complete `pkg/regalloc` O(1) lookup for the common case of 6 live variables with pointer or EXX-zone patterns
+- Coverage of `HLH'L' u32` patterns (HL in main bank + BC or DE free as shadow) that appear in 32-bit arithmetic loops
+
+**Generation:**
+```bash
+# Currently running (background, dual-GPU on main i7):
The key algorithmic insight behind `gen6v-ix-feed`: of 32,768 possible nv=6 interference graphs, only **562 have treewidth≥4** (the ones worth exhaustive search). Pre-computing these 562 masks and iterating them as the outer loop reduces the feeder from 8 shapes/sec (CPU bottleneck) to 200K shapes/sec.
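A minimal sketch of that idea in Go, under assumptions that are not taken from the repository: a mask is treated as a 15-bit set of edges over 6 vertices in the order (0,1), (0,2), ..., (4,5), and treewidth is computed by brute force over elimination orderings. All function names are hypothetical. The filter runs once up front, and the feeder then iterates only the surviving masks in its outer loop.

```go
package main

import (
	"fmt"
	"math/bits"
)

const nv = 6 // vertices (virtual registers) per interference graph

// adjacency decodes a 15-bit edge mask into per-vertex neighbour bitsets.
// The edge order is an assumption of this sketch.
func adjacency(mask uint16) [nv]uint8 {
	var adj [nv]uint8
	bit := 0
	for i := 0; i < nv; i++ {
		for j := i + 1; j < nv; j++ {
			if mask&(1<<bit) != 0 {
				adj[i] |= 1 << j
				adj[j] |= 1 << i
			}
			bit++
		}
	}
	return adj
}

// treewidth returns the minimum, over all elimination orderings of the
// vertices in `remaining`, of the largest degree seen at elimination time.
func treewidth(adj [nv]uint8, remaining uint8) int {
	if remaining == 0 {
		return 0
	}
	best := nv
	for v := 0; v < nv; v++ {
		if remaining&(1<<v) == 0 {
			continue
		}
		rest := remaining &^ (1 << v)
		nb := adj[v] & rest
		deg := bits.OnesCount8(nb)
		if deg >= best {
			continue // cannot improve on the best ordering found so far
		}
		next := adj // arrays copy by value
		for u := 0; u < nv; u++ {
			if nb&(1<<u) != 0 {
				next[u] |= nb &^ (1 << u) // neighbours of v become a clique
			}
		}
		w := treewidth(next, rest)
		if deg > w {
			w = deg
		}
		if w < best {
			best = w
		}
	}
	return best
}

// validMasks precomputes every mask whose graph has treewidth >= minTW.
func validMasks(minTW int) []uint16 {
	var out []uint16
	for m := 0; m < 1<<15; m++ {
		if treewidth(adjacency(uint16(m)), 1<<nv-1) >= minTW {
			out = append(out, uint16(m))
		}
	}
	return out
}

func main() {
	masks := validMasks(4)
	fmt.Println("valid masks:", len(masks))
	// Outer loop over precomputed masks; a real feeder would enumerate
	// the shapes for each mask inside this loop.
	for i, m := range masks {
		if i == 3 {
			break
		}
		fmt.Printf("mask %015b\n", m)
	}
}
```

Brute force is affordable here because each graph has at most 720 elimination orderings. Under these assumptions the printed count should agree with the 562 figure quoted above.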
+
+### OFB sidecars: complete for all current `.bin` tables
+
+OFB (Op Feasibility Bag) sidecars precompute 15 per-assignment flags for O(1) lookup, aligned 1:1 with the source file:
OFB flags let the backend skip table lookups for common feasibility checks: `OFBMul8Safe` (H/L/C all free → safe to clobber for mul8), `OFBDJNZFree` (B free → DJNZ without save), `OFBHLArith` (HL assigned → ADD HL,rr native), etc.
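As an illustration of how such flags might be packed and queried, here is a toy Go sketch. Only the three flag names come from the text above; the bit positions, the `Flags` and `Reg` types, and `flagsFor` are assumptions made for this example and do not describe the actual `pkg/regalloc` API or sidecar layout.

```go
package main

import "fmt"

// Flags is a hypothetical flag word, one per table entry.
type Flags uint16

// Bit positions are assumed for this sketch; only the names are from the README.
const (
	OFBMul8Safe Flags = 1 << iota // H, L and C all free
	OFBDJNZFree                   // B free
	OFBHLArith                    // HL assigned as a pair
)

// Reg is a toy register set, one bit per 8-bit register.
type Reg uint8

const (
	RegB Reg = 1 << iota
	RegC
	RegD
	RegE
	RegH
	RegL
)

// flagsFor derives the three flags from the registers an assignment uses
// and whether it holds a 16-bit value in HL.
func flagsFor(used Reg, hlPair bool) Flags {
	var f Flags
	if used&(RegH|RegL|RegC) == 0 {
		f |= OFBMul8Safe
	}
	if used&RegB == 0 {
		f |= OFBDJNZFree
	}
	if hlPair {
		f |= OFBHLArith
	}
	return f
}

func main() {
	// The sidecar is indexed exactly like the source table (1:1 alignment),
	// so a feasibility check is a single slice read plus a bit test.
	sidecar := []Flags{
		flagsFor(RegD|RegE, false),
		flagsFor(RegB|RegH|RegL, true),
	}
	for i, f := range sidecar {
		fmt.Printf("entry %d: mul8=%v djnz=%v hlarith=%v\n",
			i, f&OFBMul8Safe != 0, f&OFBDJNZFree != 0, f&OFBHLArith != 0)
	}
}
```

Packing the flags into one word per entry keeps the sidecar compact and lets a consumer test several conditions with a single mask comparison.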