adapter: Optimize COPY FROM STDIN, parallelize it, use constant memory#35036

Open
def- wants to merge 7 commits into MaterializeInc:main from def-:pr-optimize-copy-from-stdin

def- commented Feb 16, 2026

Tested with 100 million rows using https://github.com/def-/ClickBench/tree/pr-materialize/materialize. Before this PR, ingestion took 101 min, ran out of memory unless the input files were split, and hit query errors because the results were too large. With this PR, the 100-million-row ingestion runs in 4:38 min on my dev server (8 cores) and should scale roughly linearly with the number of cores. For reference, COPY WITH FREEZE in PostgreSQL takes 20 min.
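To put the timings above side by side, here is a small back-of-the-envelope calculation. It only uses the numbers quoted in this description (100 million rows, 101 min, 4:38 min, 20 min); the helper names are made up for illustration and are not part of the PR.

```python
# Wall-clock figures quoted in the PR description, converted to seconds.
ROWS = 100_000_000
BEFORE_S = 101 * 60        # 101 min before this PR
AFTER_S = 4 * 60 + 38      # 4:38 min with this PR on 8 cores
POSTGRES_S = 20 * 60       # 20 min for the PostgreSQL reference run


def rows_per_second(rows: int, seconds: float) -> float:
    """Average ingestion throughput over the whole run."""
    return rows / seconds


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


print(f"before:  {rows_per_second(ROWS, BEFORE_S):,.0f} rows/s")
print(f"after:   {rows_per_second(ROWS, AFTER_S):,.0f} rows/s")
print(f"vs. before this PR: {speedup(BEFORE_S, AFTER_S):.1f}x")
print(f"vs. PostgreSQL:     {speedup(POSTGRES_S, AFTER_S):.1f}x")
```

By this arithmetic the new code path is roughly 21.8 times faster than before on the 100-million-row load, and roughly 4.3 times faster than the quoted PostgreSQL run.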

Example test run of the new benchmark in CI with 10 million rows: https://buildkite.com/materialize/release-qualification/builds/1089#019c68b4-4b3f-424f-b3c4-05518e95f1a0

NAME                                | TYPE            |      THIS       |      OTHER      |  UNIT  | THRESHOLD  |  Regression?  | 'THIS' is
--------------------------------------------------------------------------------------------------------------------------------------------------------
CopyFromStdin                       | wallclock       |           3.599 |         101.877 |   s    |    10%     |      no       | better: 28.3 times faster
CopyFromStdin                       | memory_mz       |        1310.349 |        1197.815 |   MB   |    20%     |      no       | worse:   9.4% more
CopyFromStdin                       | memory_clusterd |          28.458 |          28.534 |   MB   |    50%     |      no       | better:  0.3% less

When run in the environmentd spec sheet, this doesn't scale well in Cloud (should be investigated!), but it scales well locally:

cores    local    cloud
    1  178.20s  378.87s
    2   89.73s  170.18s
    4   44.79s  144.72s
    8   25.05s  158.65s
   16   20.72s  119.59s
   32      N/A  117.71s
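The scaling claim can be checked by turning the table above into speedup and parallel efficiency relative to the 1-core run. This is only a sketch over the numbers in the table; the `scaling` helper is hypothetical and the 32-core local entry is left out because the table lists it as N/A.

```python
# Wall-clock seconds per core count, copied from the table above.
LOCAL = {1: 178.20, 2: 89.73, 4: 44.79, 8: 25.05, 16: 20.72}
CLOUD = {1: 378.87, 2: 170.18, 4: 144.72, 8: 158.65, 16: 119.59, 32: 117.71}


def scaling(times: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map cores -> (speedup, efficiency), both relative to the 1-core run.

    speedup    = t(1 core) / t(n cores)
    efficiency = speedup / n, so 1.0 means perfectly linear scaling
    """
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in times.items()}


for name, times in (("local", LOCAL), ("cloud", CLOUD)):
    print(name)
    for n, (s, e) in scaling(times).items():
        print(f"  {n:>2} cores: speedup {s:5.2f}x, efficiency {e:4.2f}")
```

On these numbers, local efficiency stays above roughly 0.85 through 8 cores and drops at 16, while the cloud speedup never exceeds about 3.3x even at 32 cores.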

Fixes: https://github.com/MaterializeInc/database-issues/issues/7674

Motivation:
COPY FROM STDIN has been slow for workload replay and for testing with large amounts of data in general. It has also been a nuisance in https://github.com/MaterializeInc/database-issues/issues/7674 and, more recently, in https://materializeinc.slack.com/archives/C08A62E0751/p1770835109967349

Previously, this failed:

materialize=> alter system set max_copy_from_size=15000000000;
ERROR:  parameter "max_copy_from_size" requires a "unsigned integer" value
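One plausible reading of that error, stated here as an assumption rather than something this description confirms, is that the parameter was parsed as a 32-bit unsigned integer, which 15000000000 does not fit into. The check below only illustrates that arithmetic; `fits_unsigned` is a hypothetical helper, not a Materialize function.

```python
REQUESTED = 15_000_000_000  # value from the ALTER SYSTEM statement above


def fits_unsigned(value: int, bits: int) -> bool:
    """Return True if value is representable as an unsigned integer of the given width."""
    return 0 <= value <= 2**bits - 1


# Assumption: the old parameter type was 32 bits wide (max 4294967295).
print("fits in 32 bits:", fits_unsigned(REQUESTED, 32))
print("fits in 64 bits:", fits_unsigned(REQUESTED, 64))
```

Under that assumption, the requested value is about 3.5 times larger than the 32-bit maximum, which would explain the rejection.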
@github-actions

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@def- def- force-pushed the pr-optimize-copy-from-stdin branch 8 times, most recently from 4e1af8e to 76a31a3 Compare February 17, 2026 11:55
@def- def- changed the title adapter: Optimize COPY FROM STDIN to use constant memory & parallelize it adapter: Optimize COPY FROM STDIN, parallelize it, use constant memory Feb 17, 2026
@def- def- force-pushed the pr-optimize-copy-from-stdin branch 2 times, most recently from 86fc97d to d5b109c Compare February 17, 2026 13:28
@def- def- force-pushed the pr-optimize-copy-from-stdin branch from d5b109c to 0dd54e9 Compare February 17, 2026 14:29
@def- def- marked this pull request as ready for review February 17, 2026 14:34
@def- def- requested review from a team as code owners February 17, 2026 14:34
@def- def- requested a review from aljoscha February 17, 2026 14:34
@def- def- force-pushed the pr-optimize-copy-from-stdin branch from 0dd54e9 to 862818c Compare February 17, 2026 14:47