adapter: Optimize COPY FROM STDIN, parallelize it, use constant memory by def- · Pull Request #35036 · MaterializeInc/materialize

def- · 2026-02-16T22:40:09Z

Tested with 100 million rows in https://github.com/def-/ClickBench/tree/pr-materialize/materialize. Before this PR the ingestion took 101 min, ran out of memory (OoM) unless the files were split, and hit query errors because the results were too large. With this PR the 100-million-row ingestion runs in 4:38 min on my dev server (8 cores) and should scale roughly linearly with the number of cores. For reference, COPY WITH FREEZE in PostgreSQL takes 20 min.

Example test run of the new benchmark in CI with 10 million rows: https://buildkite.com/materialize/release-qualification/builds/1089#019c68b4-4b3f-424f-b3c4-05518e95f1a0

NAME                                | TYPE            |      THIS       |      OTHER      |  UNIT  | THRESHOLD  |  Regression?  | 'THIS' is
--------------------------------------------------------------------------------------------------------------------------------------------------------
CopyFromStdin                       | wallclock       |           3.599 |         101.877 |   s    |    10%     |      no       | better: 28.3 times faster
CopyFromStdin                       | memory_mz       |        1310.349 |        1197.815 |   MB   |    20%     |      no       | worse:   9.4% more
CopyFromStdin                       | memory_clusterd |          28.458 |          28.534 |   MB   |    50%     |      no       | better:  0.3% less

The run in the environmentd spec sheet scales well locally but not in Cloud (should be investigated!):

cores    local    cloud
    1  178.20s  378.87s
    2   89.73s  170.18s
    4   44.79s  144.72s
    8   25.05s  158.65s
   16   20.72s  119.59s
   32      N/A  117.71s
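
To make the scaling gap easier to read, here is a small sketch that turns the timings above into speedup and parallel efficiency relative to the 1-core run. The helper names `speedup` and `efficiency` are only for illustration and are not part of this PR; the numbers are copied from the table.

```rust
/// Speedup relative to the single-core run: t1 / tn.
fn speedup(t1: f64, tn: f64) -> f64 {
    t1 / tn
}

/// Parallel efficiency: speedup divided by core count (1.0 means perfectly linear).
fn efficiency(t1: f64, tn: f64, cores: u32) -> f64 {
    speedup(t1, tn) / f64::from(cores)
}

fn main() {
    // (cores, local seconds, cloud seconds), copied from the table above.
    // The 32-core local run is N/A, hence the Option.
    let runs: [(u32, Option<f64>, f64); 6] = [
        (1, Some(178.20), 378.87),
        (2, Some(89.73), 170.18),
        (4, Some(44.79), 144.72),
        (8, Some(25.05), 158.65),
        (16, Some(20.72), 119.59),
        (32, None, 117.71),
    ];
    let (local_t1, cloud_t1) = (178.20, 378.87);

    for (cores, local, cloud) in runs {
        let cloud_eff = efficiency(cloud_t1, cloud, cores) * 100.0;
        match local {
            Some(t) => {
                let local_eff = efficiency(local_t1, t, cores) * 100.0;
                println!("{cores:>2} cores: local {local_eff:.0}%, cloud {cloud_eff:.0}%");
            }
            None => println!("{cores:>2} cores: local N/A, cloud {cloud_eff:.0}%"),
        }
    }
}
```

On these numbers, local efficiency is still about 89% at 8 cores, while the cloud run is at roughly 30% for the same core count.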

Fixes: https://github.com/MaterializeInc/database-issues/issues/7674

Motivation:
COPY FROM STDIN has been slow for workload replay and for testing with large amounts of data in general. It has also been a pain point in https://github.com/MaterializeInc/database-issues/issues/7674 and, more recently, in https://materializeinc.slack.com/archives/C08A62E0751/p1770835109967349

github-actions · 2026-02-16T22:40:17Z

Pre-merge checklist

The PR title is descriptive and will make sense in the git log.
This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

session variables: Allow max_copy_from_size > 4 GB

b344016

Previously failed:

    materialize=> alter system set max_copy_from_size=15000000000;
    ERROR: parameter "max_copy_from_size" requires a "unsigned integer" value
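
A minimal sketch of why the statement above was rejected, assuming the setting was previously parsed as a 32-bit unsigned integer. That is an assumption based on the commit title ("> 4 GB") and the error message, not something taken from the diff. The helpers `fits_u32` and `fits_u64` are hypothetical and exist only for this illustration.

```rust
/// Returns true if the decimal string fits in a 32-bit unsigned integer.
/// (Hypothetical helper; assumes the old setting was u32-backed.)
fn fits_u32(value: &str) -> bool {
    value.parse::<u32>().is_ok()
}

/// Returns true if the decimal string fits in a 64-bit unsigned integer.
fn fits_u64(value: &str) -> bool {
    value.parse::<u64>().is_ok()
}

fn main() {
    // Value from the failing ALTER SYSTEM statement above (15 GB).
    let requested = "15000000000";

    println!("u32::MAX = {}", u32::MAX);
    println!("fits in u32: {}", fits_u32(requested));
    println!("fits in u64: {}", fits_u64(requested));
}
```

Anything above `u32::MAX` (just under 4 GiB when read as a byte count) overflows a 32-bit parse, which matches the limit named in the commit title.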

def- force-pushed the pr-optimize-copy-from-stdin branch 8 times, most recently from 4e1af8e to 76a31a3 on February 17, 2026 11:55

def- changed the title ~~adapter: Optimize COPY FROM STDIN to use constant memory & parallelize it~~ adapter: Optimize COPY FROM STDIN, parallelize it, use constant memory Feb 17, 2026

def- force-pushed the pr-optimize-copy-from-stdin branch 2 times, most recently from 86fc97d to d5b109c on February 17, 2026 13:28

def- added 5 commits February 17, 2026 14:14

adapter: Optimize COPY FROM STDIN to use constant memory and parallelize

1a08345

feature-benchmark: Add COPY FROM STDIN scenario

3be4ae8

cluster-spec-sheet: Add copy-from-stdin scenario

0f1517c

testdrive: Fix fivetran-destination.td

a3a811a

fix unit tests for copy from stdin

d42125b

def- force-pushed the pr-optimize-copy-from-stdin branch from d5b109c to 0dd54e9 on February 17, 2026 14:29

def- marked this pull request as ready for review February 17, 2026 14:34

def- requested review from a team as code owners February 17, 2026 14:34

def- requested a review from aljoscha February 17, 2026 14:34

cluster-spec-sheet: Run against current version

862818c

Noticed in https://buildkite.com/materialize/release-qualification/builds/1097#019c6b5c-3f94-4d17-b102-c78c9ce9d323

def- force-pushed the pr-optimize-copy-from-stdin branch from 0dd54e9 to 862818c on February 17, 2026 14:47

def- wants to merge 7 commits into MaterializeInc:main from def-:pr-optimize-copy-from-stdin
