feat: canonical H2O coverage — q6/q8/q9 adapters + engine-only timing + DataFusion memtable fix by ser-vasilich · Pull Request #4 · RayforceDB/rayforce-bench

ser-vasilich · 2026-05-15T20:12:31Z

Summary

62 commits accumulating the canonical H2O (h2oai/db-benchmark) coverage on rayforce-bench: engine-only timing for SQL adapters, rayforce wrappers for q6/q8/q9, fairness fixes across adapters, and dashboard polish.

Headline changes

Engine-only timing across SQL adapters (`20f915a`)

Replace fetchall() / IPC-materialization with server-side draining or CREATE TEMPORARY TABLE patterns so each adapter is timed on engine work only, not Arrow IPC / Python conversion. Affects DuckDB, chDB, DataFusion, QuestDB, TimescaleDB.
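A minimal sketch of the timing difference described above. The stdlib `sqlite3` module stands in for the real adapters, and the helper names and the `_bench_out` table are hypothetical, not taken from the rayforce-bench code. The first helper times the query plus conversion of every row to Python objects; the second keeps the result inside the engine by writing it to a temporary table.

```python
import sqlite3
import time


def time_fetchall(conn, sql):
    """Time the query including conversion of every row to Python tuples."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, len(rows)


def time_engine_only(conn, sql):
    """Time the query while the result stays inside the engine."""
    # Hypothetical scratch table name; dropped outside the timed region.
    conn.execute("DROP TABLE IF EXISTS _bench_out")
    start = time.perf_counter()
    conn.execute(f"CREATE TEMPORARY TABLE _bench_out AS {sql}")
    elapsed = time.perf_counter() - start
    # Row count is read after the clock stops, only to sanity-check the result.
    (n_rows,) = conn.execute("SELECT COUNT(*) FROM _bench_out").fetchone()
    return elapsed, n_rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id1 TEXT, v1 INTEGER)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?)",
    [(f"id{i % 100:03d}", i % 5) for i in range(10_000)],
)
conn.commit()

q1 = "SELECT id1, SUM(v1) AS v1 FROM x GROUP BY id1"
t_fetch, n_fetch = time_fetchall(conn, q1)
t_engine, n_engine = time_engine_only(conn, q1)
```

Both helpers see the same number of result rows; only the second excludes the client-side conversion from the measured interval.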

Rayforce q6 / q8 / q9 adapters (`a50ab48`, `611bcb3`, `626cd34`, `99ae025`)

q6 (median + stddev by id4,id5): native Column.median() + Column.std() via new engine OP_MEDIAN and existing stddev
q8 (largest 2 v3 by id6): Column.top(2) via engine OP_TOP_N then OP_GROUP_TOPK_ROWFORM (row-form emit, no LIST intermediate)
q9 (pearson² by id2,id4): two-stage adapter, Column.pearson_corr(...) followed by an arithmetic squaring step; needed because applying ** 2 at the top of the aggregate expression would block the DAG hash-agg lowering
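The two-stage shape of q9 can be illustrated in plain Python. This is only a sketch of the idea (aggregate first, square afterwards); the function names are hypothetical and nothing here uses the rayforce API.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def q9_two_stage(rows):
    """rows: iterable of (id2, id4, v1, v2). Returns {(id2, id4): r squared}."""
    groups = defaultdict(lambda: ([], []))
    for id2, id4, v1, v2 in rows:
        xs, ys = groups[(id2, id4)]
        xs.append(v1)
        ys.append(v2)
    # Stage 1: a plain grouped aggregate, with no arithmetic wrapped around it.
    r = {key: pearson(xs, ys) for key, (xs, ys) in groups.items()}
    # Stage 2: element-wise squaring over the (much smaller) aggregated result.
    return {key: value ** 2 for key, value in r.items()}


rows = [
    ("a", 1, 1.0, 2.0), ("a", 1, 2.0, 4.0), ("a", 1, 3.0, 6.0),
    ("b", 2, 1.0, 3.0), ("b", 2, 2.0, 2.0), ("b", 2, 3.0, 1.0),
]
r2 = q9_two_stage(rows)
```

Group `("a", 1)` is perfectly correlated and group `("b", 2)` perfectly anti-correlated, so both square to 1.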

Engine-side explode for q8 (raze + indexed gather) keeps the timed query in row form (200k rows) — matches DuckDB's ROW_NUMBER OVER PARTITION shape and SQL adapters' default materialization.
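For readers unfamiliar with the two result shapes, here is a small pure-Python illustration of "list form" versus "row form" for a top-2-per-group query. It is a sketch with hypothetical function names, not the engine implementation.

```python
from collections import defaultdict
from heapq import nlargest


def top2_list_form(rows):
    """rows: iterable of (id6, v3). One output cell per group, holding up to 2 values."""
    groups = defaultdict(list)
    for id6, v3 in rows:
        if v3 is not None:  # nulls do not take part in the ranking
            groups[id6].append(v3)
    return {key: nlargest(2, values) for key, values in groups.items()}


def top2_row_form(rows):
    """Same result, but one output row per (group, value) pair."""
    return [(key, v) for key, cell in top2_list_form(rows).items() for v in cell]


rows = [(1, 5.0), (1, 9.0), (1, 7.0), (2, 1.0), (2, 4.0), (2, None)]
as_lists = top2_list_form(rows)
as_rows = top2_row_form(rows)
```

With two groups the list form has 2 cells and the row form has 4 rows, the same 1:2 ratio as the 100k versus 200k row counts discussed in this PR.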

DataFusion memtable fix (`eae3261`)

register_csv produced a listing table that re-parsed CSV on every timed query (page cache avoided disk, but parse cost remained). Replaced with register_record_batches after one-shot collect(). Apples-to-apples vs duckdb/chdb/polars/pandas/rayforce which all hold native columnar storage. q4 154→17 ms, q6 312→148 ms, q8 367→262 ms.
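The cost pattern behind this fix can be reproduced with the standard library alone. The sketch below does not use DataFusion; `ListingTable` and `MemTable` are hypothetical stand-ins that mimic a table re-parsed on every scan versus one parsed once and cached as columns.

```python
import csv
import io


class ListingTable:
    """Stand-in for a listing table: re-parses the CSV text on every scan."""

    def __init__(self, csv_text):
        self.csv_text = csv_text
        self.parse_count = 0

    def scan(self):
        self.parse_count += 1
        reader = csv.DictReader(io.StringIO(self.csv_text))
        columns = {name: [] for name in reader.fieldnames}
        for row in reader:
            for name, value in row.items():
                columns[name].append(value)
        return columns


class MemTable:
    """Stand-in for an in-memory table: parses once, then serves cached columns."""

    def __init__(self, listing):
        self.columns = listing.scan()  # the one-shot collect

    def scan(self):
        return self.columns


def sum_v1_by_id4(table):
    """A toy grouped sum, used here as the 'timed query'."""
    columns = table.scan()
    totals = {}
    for key, value in zip(columns["id4"], columns["v1"]):
        totals[key] = totals.get(key, 0) + int(value)
    return totals


csv_text = "id4,v1\n1,10\n2,20\n1,5\n"

listing = ListingTable(csv_text)
listing_results = [sum_v1_by_id4(listing) for _ in range(3)]

source = ListingTable(csv_text)
mem = MemTable(source)
mem_results = [sum_v1_by_id4(mem) for _ in range(3)]
```

Three queries against the listing stand-in parse the text three times; against the cached stand-in the text is parsed once, before any query runs.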

Dashboard / framework polish (multiple)

Canonical H2O suite (groupby q1..q10 + canonical-join q1..q5 + sort_single/sort_multi)
Bonus suite (3-key joins, full-row sorts) under separate bench-bonus target
Per-adapter QUERY_STRINGS shown on the compare panel
Scaling sweep with operations panel split into groupby/join/sort
Histogram split fast/heavy
make check — cross-adapter result equivalence at all sizes 10..10m

Bench snapshots

d354496 — refresh after OP_GROUP_TOPK_ROWFORM (PR rayforce#203 merged)
03d1cf4 — refresh after q6 + q10 bypass operators (PR rayforce#204)

Perf snapshot (10M rows, k=100 cardinality, engine-only timing)

| query | rayforce | duckdb | datafusion | polars |
|---|---|---|---|---|
| q1 | 7 ms | 38 ms | 19 ms | 30 ms |
| q2 | 14 ms | 56 ms | 41 ms | — |
| q3 | 38 ms | 117 ms | 167 ms | — |
| q4 | 13 ms | 9 ms | 18 ms | 29 ms |
| q5 | 63 ms | 115 ms | 137 ms | — |
| q6 | 75 ms | 188 ms | 148 ms | 236 ms |
| q7 | 58 ms | 105 ms | 145 ms | — |
| q8 | 45 ms | 162 ms | 264 ms | 503 ms |
| q9 | 66 ms | 80 ms | 75 ms | 405 ms |
| q10 | 170 ms | 390 ms | 420 ms | 2018 ms |

Rayforce is fastest on 9 of 10 queries. The exception is q4, where it trails duckdb by ~4 ms (a small-group mean-by-id4 shape where shared-path dispatch overhead dominates).

Related

rayforce-py#12: Python column wrappers (top, bot, pearson_corr, std, median, __pow__)
rayforce#202 (merged): engine groundwork (top/bot/pearson_corr/pow primitives + 4 fixes)
rayforce#203 (merged): engine perf opcodes (OP_MEDIAN, OP_PEARSON_CORR, OP_TOP_N, OP_GROUP_TOPK_ROWFORM)
rayforce#204: 4 bypass operators for q6/q7/q9/q10 (median+std, max+min, pearson, sum+count rowform)

Test plan

make check LOCAL=1 → pass — 665/665 comparisons matched polars, 0 NYI (rtol=1e-06, atol=1e-09) across all 7 sizes × all 19 ops × all 6 adapters
make bench LOCAL=1 reproduces the perf numbers above
Reviewer: build with companion branches (RAYFORCE_LOCAL_PATH pointing at rayforce#204 checkout) and re-run make check + make bench
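For reference, a tolerance check with the rtol/atol values quoted above could look like the sketch below. It assumes the common `|actual - expected| <= atol + rtol * |expected|` rule (the one NumPy's `isclose` uses); whether `make check` applies exactly this formula is an assumption, and `values_match` is a hypothetical name.

```python
import math


def values_match(actual, expected, rtol=1e-06, atol=1e-09):
    """Return True if `actual` is within tolerance of the reference `expected`."""
    # Treat two NaNs as equal so that null-like results compare as matching.
    if math.isnan(actual) and math.isnan(expected):
        return True
    return abs(actual - expected) <= atol + rtol * abs(expected)


close_enough = values_match(1.0000005, 1.0)   # relative difference of about 5e-7
too_far = values_match(1.00001, 1.0)          # relative difference of about 1e-5
near_zero = values_match(0.0, 5e-10)          # covered by the absolute term
```

The absolute term matters only for references near zero, where a purely relative test would reject any nonzero difference.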

register_csv produces a listing table that re-parses CSV on every timed query. register_record_batches with the collected batches caches the columnar layout in memory. q4 154→17ms, q6 312→148ms, q8 367→262ms — DataFusion now apples-to-apples with adapters that hold native columnar storage.

q8's natural rayforce shape is 100k rows with LIST<F64>[2] cells — duckdb's ROW_NUMBER() <= 2 SQL emits 200k exploded rows. Timed bench was unfair: rayforce skipped the row-materialisation cost SQL adapters pay for. Move the explode into the timed engine query via raze + indexed gather (vectorised, no per-element lambda) so both sides materialise 200k. q8 163ms (100k rows) → 215ms (200k rows) vs duckdb 198ms — ~apples-to-apples now. Bundles the q9 two-stage adapter form already in the working tree.

run_groupby_q8's fast vectorised explode assumes K=2 everywhere (true for canonical 10m k100, where every id6 group has ≥2 non-null v3). Small check sizes (10..1m) hit groups with K=1 cells; the K=2-uniform formula produces row-count mismatch. Split: timed path keeps the fast formula; materialize() reverts to a per-cell Python explode for correctness across all check sizes.
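The row-count mismatch is easy to reproduce in miniature. The sketch below is an assumption about the shape of the two code paths, written in plain Python with hypothetical function names: a "raze + indexed gather" that hard-codes two values per cell, and a per-cell explode that makes no such assumption.

```python
def explode_uniform_k2(keys, cells):
    """Fast path: assumes every cell holds exactly 2 values."""
    # Raze: flatten the list-valued column into one flat column.
    values = [v for cell in cells for v in cell]
    # Indexed gather: repeat each key twice via the index pattern 0,0,1,1,...
    gathered_keys = [keys[i // 2] for i in range(2 * len(keys))]
    return gathered_keys, values


def explode_per_cell(keys, cells):
    """Slow path: emits exactly one row per value actually present in each cell."""
    out_keys, out_values = [], []
    for key, cell in zip(keys, cells):
        for v in cell:
            out_keys.append(key)
            out_values.append(v)
    return out_keys, out_values


keys = [1, 2, 3]
uniform_cells = [[9.0, 7.0], [4.0, 1.0], [6.0, 5.0]]
ragged_cells = [[9.0, 7.0], [4.0], [6.0, 5.0]]  # group 2 has only one value

fast_ok = explode_uniform_k2(keys, uniform_cells)
slow_ok = explode_per_cell(keys, uniform_cells)
fast_bad_keys, fast_bad_values = explode_uniform_k2(keys, ragged_cells)
slow_keys, slow_values = explode_per_cell(keys, ragged_cells)
```

With uniform cells the two paths agree. With a single one-value cell the fast path yields 6 keys but only 5 values, while the per-cell path yields 5 of each.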

ser-vasilich and others added 7 commits May 10, 2026 19:30

prototype: rayforce q6/q8/q9 implementations + bonus-join wrapper wor…

a50ab48

…karounds Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bench(rayforce): q8 uses chained API — engine emits row form natively

99ae025

bench(snapshot): refresh after OP_GROUP_TOPK_ROWFORM

d354496

bench(snapshot): refresh after q6 + q10 bypass operators

03d1cf4

ser-vasilich mentioned this pull request May 17, 2026

perf: canonical H2O q6/q7/q9/q10 — 4 bypass operators (median+std, max+min, pearson, sum+count) RayforceDB/rayforce#204

Merged

bench(questdb): periodic ILP flush + 300s commit timeout

f90ee11

singaraiona merged commit 7326193 into RayforceDB:master May 18, 2026

singaraiona merged 8 commits into RayforceDB:master from ser-vasilich:prototype
