perf(cubeorchestrator): Migrate QueryResult to columnar storage#10829
perf(cubeorchestrator): Migrate QueryResult to columnar storage#10829ovr wants to merge 5 commits into
Conversation
|
Claude finished @ovr's task in 2m 16s —— View job Code Review: PR #10829 — Migrate QueryResult to columnar storage
Overall AssessmentClean, well-motivated performance refactor. The column-major layout eliminates the per-call re-pivot in the Columnar transform path, and the benchmark numbers speak for themselves (up to -74.6% on columnar transforms). The code is mechanically correct, the wire format is unchanged, and the new benchmarks add good coverage for the ingestion paths. A few observations below, none blocking. 1. Unchecked indexing in the JS bridgeFile: let value = &result.data[col_idx][row_idx];This is the only call-site that uses direct indexing into the columnar data. All transform code consistently uses the safe In practice this is safe because the two constructors ( 2.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #10829 +/- ##
==========================================
- Coverage 78.84% 78.84% -0.01%
==========================================
Files 470 470
Lines 92274 92274
Branches 3433 3433
==========================================
- Hits 72755 72750 -5
- Misses 19016 19021 +5
Partials 503 503
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
QueryResult now stores data column-major (Vec<Vec<DBResponseValue>>, one buffer per column + row_count) instead of row-major (Vec<Vec<DBResponseValue>>, one Vec per row). The Compact, Columnar, and Vanilla transform branches and the JS getCubestoreResult bridge walk the new layout directly. Wire format (TransformedData JSON) and the JS API shape are unchanged. Why: the Columnar branch had to re-pivot the row-major matrix on every call; with column-major storage that pivot disappears. Per-row Vec headers are also gone (24 B × row_count saved before any data). Adds benches for QueryResult::from_cubestore_fb and QueryResult::from_js_raw_data alongside the existing transform benches. Median time delta vs master, lower = faster: `TransformedData::transform/columnar` | cols / rows | 1 000 | 10 000 | 50 000 | 100 000 | | --- | --- | --- | --- | --- | | 8 | -18.0% | -23.0% | -47.9% | -66.3% | | 16 | -17.9% | -20.6% | -72.0% | -69.6% | | 32 | -15.7% | -42.0% | -68.7% | -65.4% | | 64 | -14.4% | -59.6% | -74.6% | -61.7% | `QueryResult::from_cubestore_fb` | cols / rows | 1 000 | 10 000 | 50 000 | 100 000 | | --- | --- | --- | --- | --- | | 8 | -9.3% | -8.4% | -11.0% | -9.2% | | 16 | -7.9% | -12.3% | -11.7% | -8.7% | | 32 | -9.4% | -4.2% | -3.2% | +9.4% | | 64 | -2.0% | -3.1% | +3.0% | +8.0% | `QueryResult::from_js_raw_data` | cols / rows | 1 000 | 10 000 | 50 000 | 100 000 | | --- | --- | --- | --- | --- | | 8 | -10.1% | -9.0% | -7.3% | -11.0% | | 16 | -2.2% | +3.8% | -2.3% | -0.8% | | 32 | -1.8% | -1.3% | +4.7% | +13.6% | | 64 | -1.1% | -4.1% | -0.8% | +17.3% |
74c0774 to
e79fc32
Compare
doing `db_data.data.get(col_idx).and_then(|c| c.get(row_idx))` per cell — two bounds-checks where the row-major version did one. Resolve `db_data.data[col_idx]` once per request inside `build_compact_plan` and `build_vanilla_plan` so plan entries hold `col: &[DBResponseValue]` directly. The per-cell hot path drops to a single `col.get(row_idx)`, matching the row-major cost.
QueryResult now stores data column-major (Vec<Vec>, one buffer per column + row_count) instead of row-major (Vec<Vec>, one Vec per row). The Compact, Columnar, and Vanilla transform branches and the JS getCubestoreResult bridge walk the new layout directly. Wire format (TransformedData JSON) and the JS API shape are unchanged.
Why: the Columnar branch had to re-pivot the row-major matrix on every call; with column-major storage that pivot disappears. Per-row Vec headers are also gone (24 B × row_count saved before any data).
Adds benches for QueryResult::from_cubestore_fb and QueryResult::from_js_raw_data alongside the existing transform benches.
Median time delta vs master, lower = faster:
TransformedData::transform/columnar| cols / rows | 1 000 | 10 000 | 50 000 | 100 000 | | --- | --- | --- | --- | --- |
| 8 | -18.0% | -23.0% | -47.9% | -66.3% |
| 16 | -17.9% | -20.6% | -72.0% | -69.6% |
| 32 | -15.7% | -42.0% | -68.7% | -65.4% |
| 64 | -14.4% | -59.6% | -74.6% | -61.7% |
QueryResult::from_cubestore_fb| cols / rows | 1 000 | 10 000 | 50 000 | 100 000 | | --- | --- | --- | --- | --- |
| 8 | -9.3% | -8.4% | -11.0% | -9.2% |
| 16 | -7.9% | -12.3% | -11.7% | -8.7% |
| 32 | -9.4% | -4.2% | -3.2% | +9.4% |
| 64 | -2.0% | -3.1% | +3.0% | +8.0% |
QueryResult::from_js_raw_data| cols / rows | 1 000 | 10 000 | 50 000 | 100 000 | | --- | --- | --- | --- | --- |
| 8 | -10.1% | -9.0% | -7.3% | -11.0% |
| 16 | -2.2% | +3.8% | -2.3% | -0.8% |
| 32 | -1.8% | -1.3% | +4.7% | +13.6% |
| 64 | -1.1% | -4.1% | -0.8% | +17.3% |
-->