feat(riscv): RISC-V QEMU integration test target (8/8 ✅)#25

Open
gwpl wants to merge 7 commits into raphaelmansuy:main from VariousForks:i19-target-riscv-qemu

gwpl commented Mar 25, 2026

Summary

Greg's AI coding buddy here 🖖 — cross-compiled edgeparse to riscv64gc, fired up QEMU user-mode, and watched PDF parsing happen on an architecture that didn't exist when most of us started coding. The future is now.

  • Cross-compiles edgeparse for riscv64gc-unknown-linux-gnu (3.7 MB dynamic ELF)
  • Runs under QEMU user-mode emulation (qemu-riscv64) — no real RISC-V hardware needed
  • 8/8 tests passing: full PDF extraction pipeline works on RISC-V

Part of #19 — edgeparse runs everywhere & sandboxes securely

Test plan

  • ./tests/wasm-runtimes/wasm-test.sh test riscv-qemu → 8/8 pass

🤖 Generated with Claude Code

gwpl and others added 5 commits March 25, 2026 08:45
Greg's AI coding buddy reporting for duty! o7

* Make rayon optional behind a `native` feature flag (default: enabled)
* Add `convert_file()` dispatcher: native builds use `edgeparse_core::convert()`
  (with raster table OCR), WASI builds use `convert_bytes()` (in-memory)
* Sequential file iteration when rayon is disabled (#[cfg(not(feature = "native"))])
* Build for WASI: `cargo build --target wasm32-wasip1 --no-default-features`

The resulting .wasm binary is 3.1 MB and runs on wasmtime, wasmer,
wasmedge, wamr — basically anything that speaks WASI Preview 1.
PDF parsing works identically; you just trade parallelism for portability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Greg's AI coding buddy: because if it doesn't run in a container,
does it even exist? Here's the full portable test suite.

Build Dockerfiles (reproducible compilation):
* Dockerfile.build.wasm   — wasm32-wasip1 (3.1 MB binary)
* Dockerfile.build.wasix  — wasm32-wasmer-wasi (WASIX toolchain)
* Dockerfile.build.riscv  — riscv64gc cross-compile (dynamic + static)

WASM runtime runners (all tested, all green):
* Dockerfile.runner.wasmtime  — Bytecode Alliance reference (v43)
* Dockerfile.runner.wasmer    — WASIX superpowers (v7)
* Dockerfile.runner.wasmedge  — CNCF cloud-native (v0.14.1)
* Dockerfile.runner.wamr      — embedded champion (iwasm, built from source)
* Dockerfile.runner.wasix     — WASIX backward compat on Wasmer

RISC-V runners:
* Dockerfile.runner.riscv-qemu — QEMU user-mode (proven, all tests pass)
* Dockerfile.runner.spike      — official ISA reference simulator + pk
* Dockerfile.runner.libriscv   — fastest sandbox (~3ns calls)
* Dockerfile.runner.rvvm       — tracing JIT (experimental)
* Dockerfile.runner.ckb-vm     — blockchain VM (experimental)

Management:
* wasm-test.sh  — build|test|status|run|log|rmi|clean
* run-tests.sh  — 7 tests per runtime (help, version, JSON/MD/TXT/HTML, errors)
* README.md     — architecture, quick start, extending guide
* All Docker images prefixed with EDGEPARSE_PREFIX (default: "edgeparse")

Verified: 6 runtimes × 8 assertions = 48/48 tests passing
(wasmtime, wasmer, wasmedge, wamr, wasix, riscv-qemu)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Greg's AI coding buddy: because typing Dockerfile paths is for people
who don't have a Makefile.

* make wasi-build   — build all WASM runtime + RISC-V Docker test images
* make wasi-test    — run integration tests across all runtimes
* make wasi-status  — show Docker image/container status
* make wasi-clean   — remove all WASI test Docker images and artifacts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Greg's AI coding buddy: debugging RISC-V VM sandboxes so you don't have to.

* Add ca-certificates to spike/libriscv/rvvm Dockerfiles (git clone over HTTPS)
* Fix libriscv arg passing: use '--' separator (rvlinux intercepts -f/-h flags)
* Fix RVVM binary discovery: handle rvvm_x86_64 naming in newer releases
* Fix libriscv build path: upstream moved rvlinux to emulator/ subdirectory

Known limitations:
* libriscv: glibc static TLS destructors crash on file I/O (needs musl target)
* CKB-VM: upstream ckb-debugger has compile errors (ckb_schemars incompatibility)
* RVVM: userland mode support varies by version (experimental)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Greg's AI coding buddy: battling every RISC-V VM's quirks so you
know exactly what works and what's still a quest-in-progress.

Spike fixes:
* Add libc6-dev-riscv64-cross for pk build (stubs-lp64.h)
* Symlink stubs-lp64d.h → stubs-lp64.h (ABI mismatch workaround)
* Symlink pk to riscv64-unknown-elf path (where spike searches)
* Status: builds, but pk has limited Linux syscall proxying — Rust
  binaries with TLS/complex allocators get truncated output

libriscv fixes:
* Build from emulator/ subdirectory (upstream restructured)
* Add '--' arg separator (rvlinux intercepts -f/-h flags)
* Status: help/version work, file I/O crashes on glibc TLS destructors
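
A minimal sketch of the `--` separator handling, assuming the per-runtime mapping recorded in the follow-up fix commit on this PR (wasmtime=no, wasmer/wasix/libriscv/ckb-vm=yes). The helper names `needs_separator` and `guest_cmdline` and the launcher strings are hypothetical, and treating every unlisted runtime as "no separator" is an assumption, not something the PR states.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the runtime needs "--" before guest args.
needs_separator() {
    case "$1" in
        wasmer|wasix|libriscv|ckb-vm) return 0 ;;  # listed as "yes" in this PR
        *) return 1 ;;  # wasmtime is listed as "no"; other runtimes are assumed "no"
    esac
}

# Hypothetical helper: joins a launcher string and the guest arguments,
# inserting "--" so the launcher does not consume flags such as -f or -h.
guest_cmdline() {
    runtime="$1"; launcher="$2"; shift 2
    if needs_separator "${runtime}"; then
        echo "${launcher} -- $*"
    else
        echo "${launcher} $*"
    fi
}

# Placeholder launcher strings, not the real invocations from run-tests.sh
guest_cmdline libriscv "launcher guest-binary" -f json /test/fixtures/sample.pdf
guest_cmdline wasmtime "launcher guest-binary" -f json /test/fixtures/sample.pdf
```

Keeping the decision in one small function means a new runtime only needs one extra `case` pattern.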

RVVM: documented as system-only emulator (no userland mode)
CKB-VM: upstream ckb-debugger has compile errors (kept for reference)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds Docker-based integration-test infrastructure to validate edgeparse across multiple WASM runtimes and RISC-V (QEMU user-mode), with a small CLI refactor to support non-native/WASI execution.

Changes:

  • Introduce tests/wasm-runtimes/wasm-test.sh manager + run-tests.sh container runner to build/test across runtimes.
  • Add Dockerfiles to build WASI/WASIX and RISC-V binaries and run them under Wasmtime/Wasmer/WasmEdge/WAMR and RISC-V QEMU (plus experimental RISC-V runners).
  • Make rayon optional behind a native feature and add convert_file() dispatcher for native vs non-native builds.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| tests/wasm-runtimes/wasm-test.sh | Host-side manager to build images/artifacts and run the runtime matrix. |
| tests/wasm-runtimes/run-tests.sh | Container-side integration test runner exercising the CLI and conversions. |
| tests/wasm-runtimes/README.md | Documentation for architecture, usage, and extending the runtime matrix. |
| tests/wasm-runtimes/Dockerfile.runner.base | Shared Ubuntu base layer for WASM runtime runners. |
| tests/wasm-runtimes/Dockerfile.runner.wasmtime | Wasmtime runner image wiring the WASI artifact into tests. |
| tests/wasm-runtimes/Dockerfile.runner.wasmer | Wasmer runner image wiring the WASI artifact into tests. |
| tests/wasm-runtimes/Dockerfile.runner.wasmedge | WasmEdge runner image (pinned) wiring the WASI artifact into tests. |
| tests/wasm-runtimes/Dockerfile.runner.wamr | WAMR/iwasm runner image built from source. |
| tests/wasm-runtimes/Dockerfile.runner.wasix | WASIX-on-Wasmer runner image for compatibility testing. |
| tests/wasm-runtimes/Dockerfile.runner.riscv-qemu | RISC-V QEMU user-mode runner for the cross-compiled ELF. |
| tests/wasm-runtimes/Dockerfile.runner.spike | Spike+pk runner image (experimental). |
| tests/wasm-runtimes/Dockerfile.runner.libriscv | libriscv/rvlinux runner image (experimental). |
| tests/wasm-runtimes/Dockerfile.runner.rvvm | RVVM runner image (documented as incompatible/experimental). |
| tests/wasm-runtimes/Dockerfile.runner.ckb-vm | CKB-VM runner image (experimental). |
| tests/wasm-runtimes/Dockerfile.build.wasm | Docker build for a reproducible wasm32-wasip1 artifact. |
| tests/wasm-runtimes/Dockerfile.build.wasix | Docker build for a WASIX (wasm32-wasmer-wasi) artifact. |
| tests/wasm-runtimes/Dockerfile.build.riscv | Docker build for riscv64gc dynamic + static ELF artifacts. |
| tests/wasm-runtimes/.gitignore | Ignore extracted build artifacts under .build/. |
| crates/edgeparse-cli/src/main.rs | Gate parallelism behind native and add convert_file() dispatcher. |
| crates/edgeparse-cli/Cargo.toml | Add native feature; make rayon optional; disable core default features. |
| Makefile | Add wasi-build/test/status/clean convenience targets. |

Comment on lines +190 to +198
    all)
        build_wasm
        build_wasix
        build_riscv
        build_base
        for rt in ${ALL_RUNNERS}; do
            build_runner "${rt}"
        done
        ;;
Copilot AI Mar 25, 2026

cmd_build all always builds the WASIX binary via build_wasix, but none of the runner images/tests reference ${BUILD_DIR}/edgeparse-wasix.wasm (the WASIX runner copies /test/edgeparse.wasm). This adds build time and complexity without affecting test coverage. Consider either wiring the WASIX runner to use the WASIX-built artifact, or removing build_wasix from the default all path until it’s actually used.

gwpl (Author)

AI Assistant: All Copilot review comments on this PR are duplicates of those on PR #20 (same codebase). I've replied individually to each comment there with accept/pushback reasoning: #20

All accepted fixes are now committed (bc882e8) and pushed to this branch. Key changes: set -e fix, stable/experimental runtime split, test counter fix, help text update, stale artifact cleanup. See full commit.

Comment on lines +51 to +54
ALL_RISCV_RUNTIMES="riscv-qemu spike libriscv rvvm ckb-vm"
ALL_RUNNERS="${ALL_WASM_RUNTIMES} ${ALL_RISCV_RUNTIMES}"
ALL_IMAGES="${IMG_BUILD_WASM} ${IMG_BUILD_WASIX} ${IMG_BUILD_RISCV} ${IMG_BASE} ${IMG_WASMTIME} ${IMG_WASMER} ${IMG_WASMEDGE} ${IMG_WAMR} ${IMG_WASIX} ${IMG_RISCV_QEMU} ${IMG_SPIKE} ${IMG_LIBRISCV} ${IMG_RVVM} ${IMG_CKB_VM}"

Copilot AI Mar 25, 2026

ALL_RUNNERS includes spike, libriscv, rvvm, and ckb-vm, but several of these are explicitly marked as WIP/incompatible in their Dockerfiles/README. As written, ./wasm-test.sh test all will attempt them and likely return non-zero, which conflicts with the “test all”/Quick Start guidance. Consider splitting “stable” vs “experimental” targets, or marking known-incompatible runtimes as skipped in run-tests.sh so test all remains reliable.

Suggested change
- ALL_RISCV_RUNTIMES="riscv-qemu spike libriscv rvvm ckb-vm"
- ALL_RUNNERS="${ALL_WASM_RUNTIMES} ${ALL_RISCV_RUNTIMES}"
- ALL_IMAGES="${IMG_BUILD_WASM} ${IMG_BUILD_WASIX} ${IMG_BUILD_RISCV} ${IMG_BASE} ${IMG_WASMTIME} ${IMG_WASMER} ${IMG_WASMEDGE} ${IMG_WAMR} ${IMG_WASIX} ${IMG_RISCV_QEMU} ${IMG_SPIKE} ${IMG_LIBRISCV} ${IMG_RVVM} ${IMG_CKB_VM}"
+ # Stable RISC-V runtimes used for "all" operations (Quick Start / CI)
+ ALL_RISCV_RUNTIMES="riscv-qemu"
+ # Experimental / WIP RISC-V runtimes kept separate so they don't break "all"
+ EXPERIMENTAL_RISCV_RUNTIMES="spike libriscv rvvm ckb-vm"
+ ALL_RUNNERS="${ALL_WASM_RUNTIMES} ${ALL_RISCV_RUNTIMES}"
+ # Stable images corresponding to the stable runtimes above
+ ALL_IMAGES="${IMG_BUILD_WASM} ${IMG_BUILD_WASIX} ${IMG_BUILD_RISCV} ${IMG_BASE} ${IMG_WASMTIME} ${IMG_WASMER} ${IMG_WASMEDGE} ${IMG_WAMR} ${IMG_WASIX} ${IMG_RISCV_QEMU}"
+ # Experimental / WIP images (not included in ALL_IMAGES)
+ EXPERIMENTAL_IMAGES="${IMG_SPIKE} ${IMG_LIBRISCV} ${IMG_RVVM} ${IMG_CKB_VM}"
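
A minimal sketch of how a `test <target>` command could expand a target into a runner list once stable and experimental runtimes are split. The function name `runners_for` is hypothetical. The value of `ALL_WASM_RUNTIMES` is not shown in this thread, so the one below is inferred from the list of verified runtimes in the commit message. The `experimental` target follows the fix commit on this PR.

```shell
#!/bin/sh
# Assumed value: inferred from the verified runtimes listed in this PR
ALL_WASM_RUNTIMES="wasmtime wasmer wasmedge wamr wasix"
ALL_RISCV_RUNTIMES="riscv-qemu"
EXPERIMENTAL_RISCV_RUNTIMES="spike libriscv rvvm ckb-vm"

# Hypothetical helper: expands a target name into the runtimes to test
runners_for() {
    case "$1" in
        all)          echo "${ALL_WASM_RUNTIMES} ${ALL_RISCV_RUNTIMES}" ;;
        experimental) echo "${EXPERIMENTAL_RISCV_RUNTIMES}" ;;
        *)            echo "$1" ;;   # a single named runtime
    esac
}

# "all" now covers only the stable runtimes, so it can stay green in CI
for rt in $(runners_for all); do
    echo "would test: ${rt}"
done
```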

log "Cleaning build artifacts and images..."

# Stop and remove any running test containers
docker ps -q --filter "name=${PREFIX}-test-" 2>/dev/null | xargs -r docker rm -f 2>/dev/null || true
Copilot AI Mar 25, 2026

xargs -r is GNU-specific; on macOS/BSD xargs doesn’t support -r, which would make clean fail on those hosts. If you want this script to be cross-platform, consider guarding the docker rm call by checking for non-empty container IDs instead of relying on xargs -r.

Suggested change
- docker ps -q --filter "name=${PREFIX}-test-" 2>/dev/null | xargs -r docker rm -f 2>/dev/null || true
+ local containers
+ containers=$(docker ps -q --filter "name=${PREFIX}-test-" 2>/dev/null || true)
+ if [ -n "${containers}" ]; then
+     docker rm -f ${containers} 2>/dev/null || true
+ fi

# -t edgeparse-wasi-wamr .
# ─────────────────────────────────────────────────────────────────────────────

FROM edgeparse-wasi-base
Copilot AI Mar 25, 2026

EDGEPARSE_PREFIX in wasm-test.sh makes the base image name configurable, but this Dockerfile hard-codes FROM edgeparse-wasi-base. With a non-default prefix, docker build will fail because the base tag differs. Consider using a build arg for the base image name (set by the manager script).

Suggested change
- FROM edgeparse-wasi-base
+ ARG BASE_IMAGE=edgeparse-wasi-base
+ FROM ${BASE_IMAGE}
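
A minimal sketch of the manager-script side of this suggestion, printed as a dry run rather than executed. The helper name `runner_build_cmd` is hypothetical. The `${PREFIX}-wasi-<runtime>` tag pattern is an assumption extrapolated from the `edgeparse-wasi-wamr` and `edgeparse-wasi-base` names visible in this thread. `--build-arg` is the standard `docker build` flag for setting an `ARG`.

```shell
#!/bin/sh
PREFIX="${EDGEPARSE_PREFIX:-edgeparse}"

# Hypothetical helper: prints the docker build command for one runner image,
# passing the prefixed base image name into the Dockerfile's BASE_IMAGE arg.
runner_build_cmd() {
    runtime="$1"
    echo "docker build --build-arg BASE_IMAGE=${PREFIX}-wasi-base -f Dockerfile.runner.${runtime} -t ${PREFIX}-wasi-${runtime} ."
}

runner_build_cmd wamr
```

With the default in the Dockerfile's `ARG`, a plain `docker build` without the flag would keep working for the default prefix.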

Comment on lines +187 to +193
log_test "Convert sample.pdf → JSON"
run_cmd=$(build_run_cmd "${BINARY}" "-f json -o /test/output -q /test/fixtures/sample.pdf")
if eval "${run_cmd}" > /test/output/json_stdout.txt 2>&1; then
    if [ -f "/test/output/sample.json" ]; then
        json_size=$(wc -c < /test/output/sample.json)
        if [ "${json_size}" -gt 10 ]; then
            log_pass "JSON output: ${json_size} bytes"
Copilot AI Mar 25, 2026

Test 3 doesn’t remove /test/output/sample.json before running the JSON conversion. If the conversion command exits 0 but fails to (re)write the file, a stale sample.json from a previous run could cause a false pass. Consider rm -f /test/output/sample.json before invoking edgeparse (as you already do for md/txt/html).
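
A minimal sketch of the false pass described here and how removing the artifact first avoids it. `fake_convert` and `check_json_output` are hypothetical stand-ins, and the temporary directory replaces `/test/output` so the sketch can run anywhere.

```shell
#!/bin/sh
OUT_DIR=$(mktemp -d)

# Stand-in for the real conversion: exits 0 but writes nothing, which is
# the failure mode a stale file from an earlier run would hide.
fake_convert() { return 0; }

# Hypothetical check: run the given command, then require a fresh, non-trivial file
check_json_output() {
    out="${OUT_DIR}/sample.json"
    rm -f "${out}"    # drop any artifact left over from a previous run
    if "$@" && [ -f "${out}" ] && [ "$(wc -c < "${out}")" -gt 10 ]; then
        echo "PASS"
    else
        echo "FAIL"
    fi
}

# Simulate a leftover file, then run a conversion that silently writes nothing
echo '{"stale": "output from an earlier run"}' > "${OUT_DIR}/sample.json"
check_json_output fake_convert    # prints FAIL because the stale file was removed first
```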

# -t edgeparse-wasi-wasmedge .
# ─────────────────────────────────────────────────────────────────────────────

FROM edgeparse-wasi-base
Copilot AI Mar 25, 2026

EDGEPARSE_PREFIX in wasm-test.sh implies the base image name is configurable, but this Dockerfile hard-codes FROM edgeparse-wasi-base. If users set a different prefix, this build will fail because the base image tag changes. Consider parameterizing the base image via a build arg and wiring it up in wasm-test.sh.

Suggested change
- FROM edgeparse-wasi-base
+ ARG BASE_IMAGE=edgeparse-wasi-base
+ FROM ${BASE_IMAGE}

Comment on lines +13 to +15
RUN curl https://get.wasmer.io -sSfL | sh
ENV PATH="/root/.wasmer/bin:${PATH}"

Copilot AI Mar 25, 2026

This runner installs Wasmer via a remote install script (curl … | sh) without pinning a version or verifying the downloaded artifact. That reduces reproducibility and increases supply-chain risk for CI. Consider pinning a specific Wasmer version (e.g., via release tarball + checksum) or installing from a trusted package source.

Suggested change
- RUN curl https://get.wasmer.io -sSfL | sh
- ENV PATH="/root/.wasmer/bin:${PATH}"
+ ARG WASMER_VERSION="4.2.4"
+ ARG WASMER_SHA256="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
+ RUN curl -sSfL "https://github.com/wasmerio/wasmer/releases/download/v${WASMER_VERSION}/wasmer-linux-amd64.tar.gz" -o /tmp/wasmer.tar.gz \
+     && echo "${WASMER_SHA256}  /tmp/wasmer.tar.gz" | sha256sum -c - \
+     && tar -xzf /tmp/wasmer.tar.gz -C /usr/local --strip-components=1 \
+     && rm /tmp/wasmer.tar.gz
+ ENV PATH="/usr/local/bin:${PATH}"

# -t edgeparse-wasi-wasix .
# ─────────────────────────────────────────────────────────────────────────────

FROM edgeparse-wasi-base
Copilot AI Mar 25, 2026

EDGEPARSE_PREFIX support in the manager script changes the base image tag, but this Dockerfile uses FROM edgeparse-wasi-base unconditionally. Overriding the prefix will break builds because Docker won’t find the base image under the hard-coded name. Consider parameterizing the base image name via an ARG and passing it from wasm-test.sh.

Suggested change
- FROM edgeparse-wasi-base
+ ARG EDGEPARSE_BASE_IMAGE=edgeparse-wasi-base
+ FROM ${EDGEPARSE_BASE_IMAGE}

Comment on lines +312 to +323
cmd_rmi() {
    local target="${1:-all}"
    if [ "${target}" = "all" ]; then
        log "Removing all edgeparse test images..."
        for img in ${ALL_IMAGES}; do
            docker rmi -f "${img}" 2>/dev/null && dim " removed ${img}" || true
        done
    else
        local img
        img=$(image_name_for "${target}")
        docker rmi -f "${img}" 2>/dev/null && dim " removed ${img}" || warn "Image ${img} not found"
    fi
Copilot AI Mar 25, 2026

The script advertises rmi [all|<image>], but the implementation only accepts the logical runtime target names understood by image_name_for(). Passing an actual image name (or build targets like wasix-bin) will error as “Unknown target”. Either update the help/usage to reflect this, or accept raw image names in cmd_rmi when they don't match a known runtime.
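
A minimal sketch of the second option, falling back to the raw argument when it is not a known target. The `image_name_for` below is a stub standing in for the script's real function: its mapping is illustrative, and it is assumed to return non-zero (rather than exit the script) for unknown names. `resolve_image` is a hypothetical helper.

```shell
#!/bin/sh
PREFIX="edgeparse"

# Stub for the script's image_name_for(); mapping shown is illustrative only
image_name_for() {
    case "$1" in
        wasmtime|wasmer|wasmedge|wamr|wasix) echo "${PREFIX}-wasi-$1" ;;
        *) return 1 ;;   # assumption: unknown targets return non-zero
    esac
}

# Hypothetical resolver: known target maps to its image, anything else is
# treated as a literal image name.
resolve_image() {
    image_name_for "$1" 2>/dev/null || echo "$1"
}

resolve_image wamr
resolve_image some-registry/custom-image:latest
```

If the real `image_name_for` calls `exit` on unknown targets, it would need to be invoked in a subshell for this fallback to work.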

raphaelmansuy (Owner)

@copilot open a new pull request to apply changes based on the comments in this thread

gwpl (Author) commented Mar 25, 2026

AI Assistant: 🖖

Copilot raised the same set of concerns across all 6 PRs (same codebase). I've posted a detailed analysis on PR #20 — accepting 9 items and explaining the reasoning behind 5 where our approach has good justifications:

👉 Full response on PR #20

TL;DR of accepted fixes (will push to all branches):

  • Fix set -e blocking failure aggregation
  • Fix test counter mismatch (8 passed / 7 total)
  • Add missing rm -f for stale JSON artifact
  • Split stable vs experimental runtimes (test all = stable only)
  • Update help text and -- separator docs
  • Remove unused WASIX build from default path
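
A minimal sketch of why the counter mismatch arises and one way to avoid it: a test can contain more than one assertion, so the two need separate counters. `log_test` and `log_pass` mirror names used in run-tests.sh, but these bodies, `log_fail`, `summary`, and the counter variables are hypothetical.

```shell
#!/bin/sh
TESTS_RUN=0
ASSERTIONS_PASSED=0
ASSERTIONS_FAILED=0

# One call per test case
log_test() { TESTS_RUN=$((TESTS_RUN + 1)); echo "TEST: $1"; }
# One call per individual check inside a test case
log_pass() { ASSERTIONS_PASSED=$((ASSERTIONS_PASSED + 1)); echo "  PASS: $1"; }
log_fail() { ASSERTIONS_FAILED=$((ASSERTIONS_FAILED + 1)); echo "  FAIL: $1"; }

summary() {
    echo "${ASSERTIONS_PASSED} assertions passed, ${ASSERTIONS_FAILED} failed, across ${TESTS_RUN} tests"
}

log_test "help"
log_pass "exit code 0"
log_pass "usage text present"   # second assertion in the same test
log_test "version"
log_pass "prints a version"
summary
```

Reporting both numbers avoids a summary where passes appear to exceed the total.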

@raphaelmansuy — happy to push these fixes directly to all 6 PRs rather than opening new ones. What's your preference?

🤖 AI+👤🤝

gwpl and others added 2 commits March 25, 2026 15:52
AI Assistant: Applied accepted fixes from Copilot PR review.

* Fix `set -e` blocking failure aggregation: wrap `docker run` in
  if/else to capture exit code without aborting the script
* Fix test counter: track assertions vs tests separately, summary now
  shows "8 assertions across 7 tests" instead of misleading "8/7"
* Add missing `rm -f` for stale JSON artifact before Test 3
* Split stable vs experimental runtimes: `test all` now runs only
  green runtimes (6/6); use `test experimental` for WIP targets
* Update `--` separator comment: document per-runtime behavior
  (wasmtime=no, wasmer/wasix/libriscv/ckb-vm=yes)
* Add `wasix-build` to `image_name_for` (was missing mapping)
* Remove WASIX build from default `build all` path (unused by runners)
* Fix `xargs -r` (GNU-only) with portable container check
* Fix extra space in wasmedge `--dir` command
* Update help text with all targets including experimental
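
A minimal sketch of the `set -e` fix from the first bullet, with a stub in place of `docker run`. The names `run_runtime`, `run_all`, and `demo-broken` are hypothetical. The point it illustrates is standard shell behavior: a command used as an `if` condition does not abort the script under `set -e`, so failures can be collected and reported at the end.

```shell
#!/bin/sh
set -e

# Stand-in for `docker run` of one runner image; fails for one made-up runtime
run_runtime() {
    [ "$1" != "demo-broken" ]
}

FAILED=""
run_all() {
    FAILED=""
    for rt in "$@"; do
        # As an `if` condition, a non-zero exit does not trigger `set -e`
        if run_runtime "${rt}"; then
            echo "ok:   ${rt}"
        else
            echo "fail: ${rt}"
            FAILED="${FAILED} ${rt}"
        fi
    done
    [ -z "${FAILED}" ]   # overall status: success only if nothing failed
}

if run_all wasmtime demo-broken wamr; then
    echo "all runtimes passed"
else
    echo "failed runtimes:${FAILED}"
fi
```

Every runtime still runs after the first failure, and the overall exit status reflects the aggregate.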

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>