Columnar spilling merge batcher #741

Open
frankmcsherry wants to merge 7 commits into TimelyDataflow:master-next from frankmcsherry:columnar_merger
@frankmcsherry frankmcsherry commented May 7, 2026

A v0 of a spilling merge batcher for columnar data. Nothing here is specific to columnar, except that it serializes well and happens to be off to the side, where we can specialize an implementation. It follows the idioms of timely's pager, with copied traits that abstract the stashing and fetching of data. There is probably some deduplication to do between the two, but we went with a second implementation here to see whether they end up looking the same, without forcing it. Roughly!
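For readers unfamiliar with the idiom, here is a minimal sketch of what a trait that abstracts stashing and fetching could look like. It is not the trait from this PR or from timely's pager: the names `Stash`, `Ticket`, and `MemoryStash` are hypothetical, and the in-memory backend only stands in for something that would write to disk.

```rust
use std::collections::HashMap;

/// Hypothetical handle naming a stashed chunk (not a type from this PR).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Ticket(u64);

/// Hypothetical abstraction over where spilled bytes live.
pub trait Stash {
    /// Hand serialized bytes to the backend and get a handle back.
    fn stash(&mut self, bytes: Vec<u8>) -> Ticket;
    /// Retrieve and release the bytes for a handle, if it is still held.
    fn fetch(&mut self, ticket: Ticket) -> Option<Vec<u8>>;
}

/// In-memory stand-in; a real backend would write chunks to disk.
#[derive(Default)]
pub struct MemoryStash {
    next: u64,
    chunks: HashMap<Ticket, Vec<u8>>,
}

impl Stash for MemoryStash {
    fn stash(&mut self, bytes: Vec<u8>) -> Ticket {
        let ticket = Ticket(self.next);
        self.next += 1;
        self.chunks.insert(ticket, bytes);
        ticket
    }

    fn fetch(&mut self, ticket: Ticket) -> Option<Vec<u8>> {
        // Fetching removes the chunk, so each ticket is redeemable once.
        self.chunks.remove(&ticket)
    }
}
```

A batcher written against such a trait does not need to know whether a chunk is in memory or on disk, which is what would make the two implementations comparable.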

The merging is lower throughput than you might like because it uses binary merges, which it will need to move beyond. There's also the potential to use compression on the columnar layouts: at least in the columnar_spill example, two of the columns compress pretty well (and macOS's compressed memory is quite competitive there).
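To illustrate what moving beyond binary merges could mean: merging k sorted runs pairwise in a balanced tree moves each element about log2(k) times, whereas a k-way merge moves it once. The sketch below is a generic heap-based k-way merge over plain `u64` runs, not code from this PR. The function name `kway_merge` is hypothetical, and a real merge batcher would also consolidate updates rather than just interleave them.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge any number of sorted runs in a single pass (illustrative only).
pub fn kway_merge(runs: Vec<Vec<u64>>) -> Vec<u64> {
    let total: usize = runs.iter().map(|run| run.len()).sum();
    let mut output = Vec::with_capacity(total);
    let mut iters: Vec<_> = runs.into_iter().map(|run| run.into_iter()).collect();

    // Min-heap of (next value, index of the run it came from).
    let mut heap = BinaryHeap::new();
    for (index, iter) in iters.iter_mut().enumerate() {
        if let Some(value) = iter.next() {
            heap.push(Reverse((value, index)));
        }
    }

    // Repeatedly emit the smallest head and refill from the same run.
    while let Some(Reverse((value, index))) = heap.pop() {
        output.push(value);
        if let Some(next) = iters[index].next() {
            heap.push(Reverse((next, index)));
        }
    }
    output
}
```

The heap costs O(log k) comparisons per element, so the saving is in data movement rather than comparisons. That matters most when each merge pass reads and writes spilled data.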

An example that saturates the CPU (rather than disk) on my laptop, moving ~50GB through the system:

./target/release/examples/columnar_spill --mode spill --workers 4 --times 1 --keys 775000000 --head 10000000 --thresh 50000000 --sample-secs 30
