[opt](memory) lazy-allocate PrefetchBuffer backing buffer to reduce peak memory #61482
Open
sollhui wants to merge 1 commit into apache:master from
run buildall
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
TPC-H: Total hot run time: 27313 ms
TPC-DS: Total hot run time: 168599 ms
Problem
When doing a TVF scan over many small S3/HDFS files, each CsvReader::init_reader() creates a PrefetchBufferedReader, which in its constructor immediately allocates buffer_num (typically 4) PrefetchBuffer objects, each pre-allocating a s_max_pre_buffer_size (4 MB) backing buffer. This costs 16 MB per file reader at construction time, regardless of whether the reader ever performs any I/O.
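The 16 MB figure follows directly from the two constants quoted above. A minimal sketch of the arithmetic, assuming the typical values from the description; the constant and function names here are illustrative and are not the Doris definitions:

```cpp
#include <cstddef>

// Values quoted in the problem description; the names are illustrative only.
constexpr std::size_t kBufferNum = 4;                       // buffer_num (typical value)
constexpr std::size_t kMaxPreBufferSize = 4 * 1024 * 1024;  // s_max_pre_buffer_size (4 MB)

// Bytes reserved at construction time by one reader when every
// PrefetchBuffer allocates its backing buffer eagerly.
constexpr std::size_t eager_bytes_per_reader() {
    return kBufferNum * kMaxPreBufferSize;
}

// Hypothetical helper: bytes held by `readers` readers that have been
// constructed but have not performed any I/O yet.
constexpr std::size_t eager_bytes_total(std::size_t readers) {
    return readers * eager_bytes_per_reader();
}

// 4 buffers * 4 MB = 16 MB per reader, as stated in the description.
static_assert(eager_bytes_per_reader() == 16u * 1024 * 1024,
              "eager allocation reserves 16 MB per reader");
```

How many readers are alive at once depends on scan concurrency, which the description does not state, so the total for a given workload is not derived here.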
Fix
Defer the allocation of _buf from the PrefetchBuffer constructor to the first time prefetch_buffer() actually runs. This is safe because:
- _buf is only written by prefetch_buffer() (one writer).
- read_buffer() only accesses _buf after waiting on the condition variable for PREFETCHED status, which provides the required happens-before guarantee.
Impact
Comparison of total load memory (the first result is from before the optimization):
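For reference, the deferral described under Fix can be sketched as follows. This is a simplified stand-in rather than the Doris PrefetchBuffer: the class name LazyPrefetchBuffer, the RESET state, the allocated() helper, and the copy-from-memory "prefetch" are all assumptions made for illustration. It only shows the two safety arguments: a single writer allocates and fills _buf, and the reader touches _buf only after the condition-variable wait observes PREFETCHED.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <memory>
#include <mutex>

// Simplified, hypothetical stand-in for a prefetch buffer with lazy allocation.
class LazyPrefetchBuffer {
public:
    enum class Status { RESET, PREFETCHED };

    // The constructor records the capacity but allocates nothing.
    explicit LazyPrefetchBuffer(std::size_t capacity) : _capacity(capacity) {}

    // Illustration-only helper: reports whether the backing buffer exists yet.
    bool allocated() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _buf != nullptr;
    }

    // Producer side: the only writer of _buf. The backing buffer is
    // allocated on the first call instead of in the constructor.
    void prefetch_buffer(const char* src, std::size_t len) {
        const std::size_t n = len < _capacity ? len : _capacity;
        {
            std::lock_guard<std::mutex> lock(_mutex);
            if (!_buf) {
                _buf = std::make_unique<char[]>(_capacity);
            }
            std::memcpy(_buf.get(), src, n);
            _len = n;
            _status = Status::PREFETCHED;
        }
        _cv.notify_all();
    }

    // Consumer side: blocks until the status is PREFETCHED, so the
    // allocation and the copy above happen-before this read of _buf.
    std::size_t read_buffer(char* out, std::size_t out_len) {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] { return _status == Status::PREFETCHED; });
        const std::size_t n = out_len < _len ? out_len : _len;
        std::memcpy(out, _buf.get(), n);
        return n;
    }

private:
    const std::size_t _capacity;
    mutable std::mutex _mutex;
    std::condition_variable _cv;
    std::unique_ptr<char[]> _buf;  // stays null until the first prefetch
    std::size_t _len = 0;
    Status _status = Status::RESET;
};
```

In this sketch a reader that is constructed but never prefetches holds no backing buffer at all. A real implementation would typically run prefetch_buffer() on a background thread and perform the I/O outside the lock; the sketch copies under the lock only to stay short.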
