[feature](be) Add adaptive batch size for pipeline operators #62975
Open
mrhhsg wants to merge 4 commits into apache:master
Issue Number: None
Related PR: None
Problem Summary: Remove unused reader context and generic reader fields in the current staged changes so the reader path stays aligned with the current output-column and batch-size handling.
### Release note
None
### Check List (For Author)
- Test: No need to test (commit current staged tracked changes only)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Extend adaptive batch size from the scan path to the remaining pipeline operators, including join, aggregation, exchange, union, table function, and sort outputs.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Unit Test: ./run-be-ut.sh --run --filter='AggOperatorTestWithOutGroupBy.*:AggOperatorTestWithGroupBy.*:DistinctStreamingAggOperatorTest.*:ExchangeSourceOperatorXTest.*:HashJoinProbeOperatorTest.*:IntersectOperatorTest.*:ExceptOperatorTest.*:StreamingAggOperatorTest.*:TableFunctionOperatorTest.*:UnnestTest.*:UnionOperatorTest.*:FullSorterTest.*:PartitionSorterTest.*:SortMergerTest.*:MergeSorterStateTest.*'
- Behavior changed: Yes (adaptive batch sizing now applies to more pipeline operators)
- Does this need documentation: No
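To make the idea in the Problem Summary concrete, here is a minimal sketch of how an operator might turn a byte budget into a row count for its next output block. It is not code from this PR: `adaptive_batch_rows` and its parameters are hypothetical names, and the fallback to the row cap when no estimate exists is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the PR): choose how many rows to emit so the
// output block stays near a byte budget without exceeding the row cap.
std::size_t adaptive_batch_rows(std::size_t block_max_bytes,
                                std::size_t est_bytes_per_row,
                                std::size_t max_rows) {
    assert(max_rows > 0);
    if (est_bytes_per_row == 0) {
        // Assumption: with no size estimate yet, fall back to the row cap.
        return max_rows;
    }
    const std::size_t rows_in_budget = block_max_bytes / est_bytes_per_row;
    // Emit at least one row so the operator always makes progress, even when
    // a single row is larger than the whole budget.
    return std::min(max_rows, std::max<std::size_t>(rows_in_budget, 1));
}
```

In this sketch, wide rows shrink the batch and narrow rows leave it at the row cap, which is the general behavior the PR extends beyond the scan path.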
Contributor
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor
run buildall
…rators
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Address review feedback on the adaptive batch size feature (commit 908ce1d):
1. ProcessHashTableProbe::_init_probe_side: skip the build-side bytes-per-row contribution for left-semi/anti joins (which only output probe columns) and skip probe-side bytes for right-semi/anti joins. Without this filtering the first-batch row count is under-estimated and the emitted block is smaller than _block_max_bytes allows.
2. VSortedRunMerger::get_next: clarify in a comment that the cursor is intentionally left in the priority queue on the partial-slice path; the shared MergeSortBlockCursor impl ensures next() updates the queue's view in place.
3. BlockSerializer::next_serialized_block: document that _budget (target output size) and _buffer_mem_limit (back-pressure cap from Channel::set_buffer_mem_limit) intentionally coexist.
4. NestedLoopJoinProbeLocalState::_finalize_current_phase: rename the misleading 'column_size' local (which actually holds the dst column row count) to 'current_row_count'.
### Release note
None
### Check List (For Author)
- Test: No need to test (comment/rename refactor and behavior-preserving estimation tweak; existing operator/sort UTs still cover the affected code paths)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
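The first item in the commit message above, the per-side filtering of the bytes-per-row estimate, can be illustrated with a small sketch. It is not the code in ProcessHashTableProbe::_init_probe_side: the `JoinKind` enum and `estimated_output_bytes_per_row` are hypothetical names, and the sketch assumes right-semi/anti joins output only build-side columns.

```cpp
#include <cstddef>

// Hypothetical join kinds for illustration; the real Doris enum is not
// reproduced here.
enum class JoinKind { Inner, LeftSemi, LeftAnti, RightSemi, RightAnti };

// Estimate the width of one output row by counting only the side(s) that the
// join actually emits.
std::size_t estimated_output_bytes_per_row(JoinKind kind,
                                           std::size_t probe_bytes_per_row,
                                           std::size_t build_bytes_per_row) {
    switch (kind) {
    case JoinKind::LeftSemi:
    case JoinKind::LeftAnti:
        // Only probe columns are output, so build-side bytes are skipped.
        return probe_bytes_per_row;
    case JoinKind::RightSemi:
    case JoinKind::RightAnti:
        // Assumed to output only build columns, so probe-side bytes are skipped.
        return build_bytes_per_row;
    case JoinKind::Inner:
        break;
    }
    // Joins that output both sides count both contributions.
    return probe_bytes_per_row + build_bytes_per_row;
}
```

Counting a side that is never emitted inflates the per-row estimate, which in turn lowers the first-batch row count and leaves the block under its byte budget, as the commit message notes.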
### What problem does this PR solve?
Issue Number: N/A
Related PR: follow-up to commit ff7609e
Problem Summary: Add unit-test coverage for the review-driven changes to the adaptive-batch-size paths:
- HashJoinProbeOperatorTest.LeftSemiJoinWithAdaptiveBatchSize and RightSemiJoinWithAdaptiveBatchSize exercise ProcessHashTableProbe::_init_probe_side under a tight preferred block size to validate that the per-output-side filtering of the bytes-per-row pre-estimate (build-side excluded for LEFT_SEMI; probe-side excluded for RIGHT_SEMI) still yields correct results.
- BlockSerializerTest covers the dual-threshold logic in BlockSerializer::next_serialized_block: byte budget breakout, EOS forcing serialization, and no-trigger leaving the block buffered.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
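The three BlockSerializerTest cases listed above (byte budget breakout, EOS forcing serialization, no trigger) map onto a simple flush decision. The sketch below is not BlockSerializer::next_serialized_block itself: `should_serialize` is a hypothetical name, and it assumes the two thresholds are a row-count limit and a byte budget, which the commit message does not spell out.

```cpp
#include <cstddef>

// Hypothetical decision function (not the PR's code): decide whether the
// currently buffered block should be serialized now.
bool should_serialize(std::size_t buffered_rows,
                      std::size_t buffered_bytes,
                      std::size_t row_threshold,
                      std::size_t byte_budget,
                      bool eos) {
    // End of stream forces out whatever is buffered.
    if (eos) {
        return true;
    }
    // Assumed dual threshold: either limit being reached triggers a flush.
    const bool hit_rows = buffered_rows >= row_threshold;
    const bool hit_bytes = buffered_bytes >= byte_budget;
    return hit_rows || hit_bytes;
}
```

When neither threshold is reached and the stream has not ended, the function returns false and the block stays buffered, matching the "no-trigger" case in the test description.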
Member
Author
run buildall