feat(riscv): i64 div/rem (Phase 3) — inline software long division #131

Merged
avrabe merged 1 commit into main from feat/riscv-i64-divrem
May 23, 2026

Conversation

avrabe (Contributor) commented May 22, 2026

Summary

Implements I64DivS, I64DivU, I64RemS, I64RemU in the RV32IMAC selector — the four ops deferred from i64 Phase 2 (#128). RV32 i64 integer arithmetic is now complete.

Approach: inline software long division

RV32IMAC's M extension only has 32-bit div/divu/rem/remu — no 64-bit divide. Rather than invent a __divdi3 runtime-library contract (synth produces self-contained bare-metal ELF with no runtime), this inlines a restoring binary long division.

emit_i64_udiv_inline is the shared unsigned core — a 64-iteration shift-subtract loop with the 64-bit <<1, unsigned compare, and subtract open-coded over the lo/hi pair. All four ops route through it:

  • div_u / rem_u — operands straight to the core.
  • div_s / rem_s — derive each operand's sign via sra hi, 31, reduce to magnitudes with branchless (x^mask)-mask, run the core, fix the result sign (quotient sign = sign(dividend) ^ sign(divisor); remainder sign = sign(dividend), per wasm truncated division).
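
As a reading aid, the unsigned core can be modelled in plain Rust. This is a minimal sketch of restoring long division over lo/hi pairs, not the code `emit_i64_udiv_inline` emits: the names `split`, `join`, and `udiv64_restoring` are hypothetical, and tracking the bit shifted out of the remainder (`r_out`) is an assumption of this sketch rather than something the PR states.

```rust
/// Split a u64 into the (lo, hi) pair of 32-bit words used on a 32-bit target.
fn split(x: u64) -> (u32, u32) {
    (x as u32, (x >> 32) as u32)
}

/// Rejoin a (lo, hi) pair into a u64.
fn join(p: (u32, u32)) -> u64 {
    (p.0 as u64) | ((p.1 as u64) << 32)
}

/// Hypothetical model of a 64-iteration restoring divide.
/// Returns (quotient, remainder) as (lo, hi) pairs.
/// Precondition: the divisor is non-zero (the trap check runs before the loop).
fn udiv64_restoring(n: (u32, u32), d: (u32, u32)) -> ((u32, u32), (u32, u32)) {
    let (mut q_lo, mut q_hi) = n; // starts as the dividend, ends as the quotient
    let (mut r_lo, mut r_hi) = (0u32, 0u32); // running remainder
    let (d_lo, d_hi) = d;

    for _ in 0..64 {
        // Shift the combined r:q value left by one bit; the top bit of q
        // moves into the bottom of r.
        let r_out = r_hi >> 31; // bit shifted out of the 64-bit remainder
        r_hi = (r_hi << 1) | (r_lo >> 31);
        r_lo = (r_lo << 1) | (q_hi >> 31);
        q_hi = (q_hi << 1) | (q_lo >> 31);
        q_lo <<= 1;

        // Open-coded unsigned 64-bit compare: r >= d.
        let ge = r_out == 1 || r_hi > d_hi || (r_hi == d_hi && r_lo >= d_lo);
        if ge {
            // Open-coded 64-bit subtract with borrow from lo into hi.
            let borrow = (r_lo < d_lo) as u32;
            r_lo = r_lo.wrapping_sub(d_lo);
            r_hi = r_hi.wrapping_sub(d_hi).wrapping_sub(borrow);
            q_lo |= 1; // record a 1 bit in the quotient
        }
    }
    ((q_lo, q_hi), (r_lo, r_hi))
}
```

For example, `udiv64_restoring(split(7), split(2))` yields a quotient pair that joins to 3 and a remainder pair that joins to 1.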

Trap semantics

  • Divide by zero — ORs the full 64-bit divisor (or zo, dl, dh; bne zo, zero, ok; ebreak; ok:), traps iff both halves are zero. Matches the i32 div trap style.
  • INT64_MIN / -1 overflow — guarded for div_s only, gated on options.signed_div_overflow_trap like the i32 path. rem_s deliberately omits it — INT64_MIN % -1 == 0 must not trap; i64_rem_s_does_not_emit_overflow_trap pins this.
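
The trap rules above can be summarised as a small reference model. This is a sketch under stated assumptions: the `Trap` enum and function names are hypothetical, and the value returned for INT64_MIN / -1 when the overflow trap is disabled (here the wrapping result) is an assumption, since the PR does not say what the untrapped path produces.

```rust
/// Hypothetical trap kinds for the reference model.
#[derive(Debug, PartialEq)]
enum Trap {
    DivideByZero,
    SignedOverflow,
}

/// Zero check over the full 64-bit divisor: OR the halves, test once.
fn divisor_is_zero(d_lo: u32, d_hi: u32) -> bool {
    (d_lo | d_hi) == 0
}

/// Model of div_s: traps on zero, and on INT64_MIN / -1 when the flag is set.
fn i64_div_s(n: i64, d: i64, signed_div_overflow_trap: bool) -> Result<i64, Trap> {
    if divisor_is_zero(d as u32, (d >> 32) as u32) {
        return Err(Trap::DivideByZero);
    }
    if signed_div_overflow_trap && n == i64::MIN && d == -1 {
        return Err(Trap::SignedOverflow);
    }
    // Assumption: with the trap disabled, the overflow case wraps.
    Ok(n.wrapping_div(d))
}

/// Model of rem_s: traps on zero only; INT64_MIN % -1 is 0, not a trap.
fn i64_rem_s(n: i64, d: i64) -> Result<i64, Trap> {
    if divisor_is_zero(d as u32, (d >> 32) as u32) {
        return Err(Trap::DivideByZero);
    }
    Ok(n.wrapping_rem(d))
}
```

Checking both halves matters for divisors such as `1 << 32`, whose low word is zero but which must not trap.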

Note for review — emit_parallel_move

The long-division loop holds 7+ values live across the loop body, but the selector's alloc_temp is liveness-unaware round-robin. The core therefore claims a fixed register file (t0-t6, s1-s3) and copies inputs in via a new alias-safe emit_parallel_move helper (cycle-breaking through s7, outside the temp pool). This is the one structural addition beyond plain codegen — worth a look.
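
For reviewers unfamiliar with the problem, here is a minimal sketch of how a set of simultaneous register copies can be turned into a safe sequence, breaking cycles through a scratch register. It illustrates the general technique only; it is not the body of `emit_parallel_move`. Registers are plain indices, `sequentialize` and `apply` are hypothetical names, and the sketch assumes destinations are distinct and the scratch register appears in no move.

```rust
/// Registers are modelled as indices into a register file.
type Reg = usize;

/// Turn parallel moves (dst, src) into an ordered list of single moves.
/// Assumptions: destinations are distinct; `scratch` is not used by any move.
fn sequentialize(moves: &[(Reg, Reg)], scratch: Reg) -> Vec<(Reg, Reg)> {
    // Self-moves are no-ops and can be dropped up front.
    let mut pending: Vec<(Reg, Reg)> =
        moves.iter().copied().filter(|&(d, s)| d != s).collect();
    let mut out = Vec::new();

    while !pending.is_empty() {
        // A move is safe once no other pending move still reads its destination.
        let ready = pending
            .iter()
            .position(|&(d, _)| !pending.iter().any(|&(_, s)| s == d));
        match ready {
            Some(i) => out.push(pending.remove(i)),
            None => {
                // Only cycles remain: park one source in the scratch register
                // and redirect its readers there, which unblocks the cycle.
                let (_, s) = pending[0];
                out.push((scratch, s));
                for m in pending.iter_mut() {
                    if m.1 == s {
                        m.1 = scratch;
                    }
                }
            }
        }
    }
    out
}

/// Execute a move sequence against a register file (for checking the result).
fn apply(regs: &mut [u32], seq: &[(Reg, Reg)]) {
    for &(dst, src) in seq {
        regs[dst] = regs[src];
    }
}
```

A swap such as `[(1, 2), (2, 1)]` with scratch register 7 comes out as three moves: save register 2 in the scratch, copy 1 into 2, then copy the scratch into 1.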

Tests

11 new tests (148 → 158 passing; one of the 11 replaces a removed test, so the net gain is 10): one shape test per op (sequence shape + zero-divisor trap presence), the signed-overflow trap path, the rem_s no-trap distinction, the 64-iteration loop counter, i64-typed result plumbing, 2 emit_parallel_move unit tests, and i64_div_rem_no_longer_unsupported, which replaces the old i64_div_rem_are_unsupported_phase3 test.

Validation

  • cargo test --package synth-backend-riscv — 158 pass, 0 fail, 1 ignored.
  • cargo clippy --package synth-backend-riscv --all-targets -- -D warnings — clean.
  • cargo fmt --check — clean.

Cost / follow-ups

  • At least ~30 instructions per div/rem op (the signed wrappers add ~20 more for sign handling). Acceptable for AOT embedded; a future optimization could share one routine across call sites.
  • Still out of scope (unchanged): sub-word sign-extending i64 loads (i64.load8_s…), f32/f64.

🤖 Generated with Claude Code

Implements I64DivS / I64DivU / I64RemS / I64RemU in the RV32 instruction
selector — the four ops deferred from Phase 2. RV32IMAC's M extension
only has 32-bit div/rem, so a 64-bit divide is lowered to an inline
restoring binary long-division loop (64 iterations) rather than a call
to a runtime helper, keeping synth's output self-contained.

- emit_i64_udiv_inline: the unsigned long-division core (shared by all
  four ops); quotient holds the dividend and shifts out as the remainder
  shifts in, with an open-coded 64-bit unsigned compare and subtract.
- lower_i64_div: single entry point. div_u/rem_u feed the core directly;
  div_s/rem_s reduce operands to magnitudes via sign masks, divide, then
  fix the result sign (quotient = sign(n)^sign(d), remainder = sign(n)).
- Zero-divisor trap checks the full 64-bit divisor (lo | hi == 0).
- Signed INT64_MIN/-1 overflow trap is emitted for div_s only; rem_s of
  the same operands yields 0 and correctly does not trap.
- emit_parallel_move: alias-safe copy-in of the long-division inputs to
  the core's fixed register file (the temp pool has no liveness tracking,
  unsafe across the loop body).
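
The sign handling described for lower_i64_div can be illustrated on whole 64-bit values. This is a sketch, not the selector's code: `sign_mask`, `magnitude`, `apply_sign`, `div_s`, and `rem_s` are hypothetical names, native `u64` division stands in for the unsigned core, and the divisor is assumed non-zero because the trap check comes first.

```rust
/// All-ones if x is negative, zero otherwise (an arithmetic shift of the sign bit).
fn sign_mask(x: i64) -> u64 {
    (x >> 63) as u64
}

/// Branchless absolute value: (x ^ mask) - mask. INT64_MIN maps to 2^63.
fn magnitude(x: i64) -> u64 {
    let m = sign_mask(x);
    ((x as u64) ^ m).wrapping_sub(m)
}

/// Conditionally negate v: unchanged when mask is 0, negated when all-ones.
fn apply_sign(v: u64, mask: u64) -> i64 {
    (v ^ mask).wrapping_sub(mask) as i64
}

/// Truncated signed quotient: sign = sign(n) ^ sign(d).
fn div_s(n: i64, d: i64) -> i64 {
    let q = magnitude(n) / magnitude(d); // stand-in for the unsigned core
    apply_sign(q, sign_mask(n) ^ sign_mask(d))
}

/// Truncated signed remainder: sign follows the dividend.
fn rem_s(n: i64, d: i64) -> i64 {
    let r = magnitude(n) % magnitude(d); // stand-in for the unsigned core
    apply_sign(r, sign_mask(n))
}
```

With these definitions, `div_s(-7, 2)` is -3 and `rem_s(-7, 2)` is -1, while `rem_s(i64::MIN, -1)` is 0.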

11 new tests; 148 -> 158 passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@avrabe avrabe merged commit 167b7b1 into main May 23, 2026
10 of 13 checks passed
@avrabe avrabe deleted the feat/riscv-i64-divrem branch May 23, 2026 12:24
avrabe added a commit that referenced this pull request May 23, 2026
v0.5.0 — verification & robustness:
- #133 validator pattern to full i32 + i64 surface (#76)
- #131 RV32 i64 div/rem (Phase 3 completes i64 integer)
- #132 panic-free ir_to_arm + macro fix + gating fuzz restored

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>