Skip to content

[controller] Reduce IdealState footprint in LLC segment commit path#17707

Merged
xiangfu0 merged 1 commit intoapache:masterfrom
xiangfu0:codex/lazy-idealstate-llc-commit
Feb 20, 2026
Merged

[controller] Reduce IdealState footprint in LLC segment commit path#17707
xiangfu0 merged 1 commit intoapache:masterfrom
xiangfu0:codex/lazy-idealstate-llc-commit

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented Feb 15, 2026

What

  • avoid eager full IdealState fetch in commitSegmentMetadataInternal
  • move CONSUMING-state validation into the Step-3 IdealState updater (same snapshot used for mutation)
  • make partition-id fallback to IdealState lazy via Supplier so normal commits skip large IS fetch
  • add best-effort cleanup for newly-created segment metadata when Step-3 fails or does not add the segment
  • gate segment movement on whether the new segment is actually present in updated IdealState

Memory Impact Analysis (100 Concurrent Commits)

  • Segment completion group commit is enabled by default (controller.segment.completion.group.commit.enabled=true).
  • With group commit enabled, same-table commits are queued and applied by one running updater per resource queue.
  • Before this change, each commit did an eager full IdealState fetch before Step-3, then Step-3 still did a read + deep clone for update.
  • After this change, the eager prefetch is removed from the common path; Step-3 still does one read + one deep clone per batch/update attempt.
  • Net effect for 100 concurrent commits on the same table (healthy stream metadata path): transient memory amplification shifts from roughly O(concurrency * IS) to O(batches * IS).

Lower-bound example if IdealState is 160MB serialized:

  • Before: about 100 * 160MB (eager reads) + Step-3 clone/read overhead.
  • After: no per-commit eager reads; primarily Step-3 clone/read overhead.
  • Practical implication: transient heap pressure drops by order(s) of magnitude in the common path.

Caveat:

  • If stream partition-id fetch is incomplete/unavailable, Step-2 fallback still fetches IdealState to merge partition metadata, so memory gains are reduced during that fallback window.

Correctness

  • The CONSUMING-state precondition is still enforced, now inside Step-3 updater against the same IdealState snapshot being mutated.
  • Best-effort cleanup handles Step-2 metadata created but not reflected in IdealState when Step-3 fails or no-ops for new segment add.

Test

  • timeout 600 ./mvnw -pl pinot-controller -DskipTests compile
  • timeout 600 ./mvnw -pl pinot-controller -Dtest=PinotLLCRealtimeSegmentManagerTest -Dsurefire.failIfNoSpecifiedTests=false test
  • Added test: testCommitSegmentMetadataSkipsIdealStateFetchWhenPartitionIdsAvailable

Why

For very large realtime tables, loading full IdealState during each commit can amplify controller heap pressure. This change keeps correctness checks while reducing unnecessary large-object materialization on the common commit path.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 15, 2026

Codecov Report

❌ Patch coverage is 78.57143% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.23%. Comparing base (525a8d0) to head (bb667b1).

Files with missing lines Patch % Lines
.../core/realtime/PinotLLCRealtimeSegmentManager.java 78.57% 8 Missing and 7 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17707      +/-   ##
============================================
+ Coverage     63.16%   63.23%   +0.06%     
- Complexity     1499     1502       +3     
============================================
  Files          3181     3181              
  Lines        190816   190854      +38     
  Branches      29164    29172       +8     
============================================
+ Hits         120533   120683     +150     
+ Misses        60890    60787     -103     
+ Partials       9393     9384       -9     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.19% <78.57%> (+0.03%) ⬆️
java-21 63.20% <78.57%> (+29.15%) ⬆️
temurin 63.23% <78.57%> (+0.06%) ⬆️
unittests 63.23% <78.57%> (+0.06%) ⬆️
unittests1 55.61% <ø> (+0.06%) ⬆️
unittests2 34.06% <78.57%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 requested a review from Copilot February 16, 2026 03:34
@xiangfu0 xiangfu0 added the performance Related to performance optimization label Feb 16, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes memory usage in the LLC (Low-Level Consumer) segment commit path by removing an eager IdealState fetch that was performed before each commit. The optimization introduces lazy loading via Supplier pattern and moves validation into the IdealState updater to work with the same snapshot used for mutations.

Changes:

  • Removed eager IdealState fetch before Step-3, replacing it with lazy Supplier-based loading for partition-id fallback
  • Moved CONSUMING state validation from pre-commit check to inside Step-3 IdealState updater
  • Added best-effort cleanup for orphaned segment metadata when Step-3 fails or doesn't add the segment to IdealState
  • Removed pause state checks from Step-2, relying on Step-3 to handle paused tables/topics

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
PinotLLCRealtimeSegmentManager.java Core implementation changes: removed eager IdealState fetch, moved validation to Step-3, added cleanup logic, and implemented lazy partition-id loading
PinotLLCRealtimeSegmentManagerTest.java Added test to verify IdealState is not fetched when partition IDs are available from stream

Comment on lines +705 to +712
boolean newConsumingSegmentInIdealState = false;
if (newConsumingSegmentName != null) {
newConsumingSegmentInIdealState = idealState.getRecord().getMapFields().containsKey(newConsumingSegmentName);
if (!newConsumingSegmentInIdealState) {
removeSegmentZKMetadataBestEffort(realtimeTableName, newConsumingSegmentName);
newConsumingSegmentName = null;
}
}
Copy link

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the table/topic is paused, new segment metadata is created in Step-2 but not added to IdealState in Step-3 (line 1469 passes null when paused). The cleanup logic then removes this orphaned metadata. However, there's a race condition: if the RealtimeSegmentValidationManager or another component reads segment metadata between Step-2 completion and the cleanup (lines 708-709), it will see a segment with IN_PROGRESS status that's not in IdealState. This could trigger unnecessary validation/repair actions or cause confusion in monitoring. While the cleanup mitigates this, consider checking pause state before creating metadata (possibly via the lazy IdealState supplier used in getPartitionIds) to avoid creating metadata that will be immediately discarded.

Copilot uses AI. Check for mistakes.
@krishan1390
Copy link
Copy Markdown
Contributor

@xiangfu0 given group commit is already enabled by default, this PR itself doesn't reduce the memory allocation from roughly O(concurrency * IS) to O(batches * IS) right ?

This PR reduces the memory allocation by half in the happy path right ? This is also a good improvement but just confirming that the description probably needs to be updated.

Copy link
Copy Markdown
Contributor

@krishan1390 krishan1390 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We need to add more tests that verify the fallback path, the cleanup logic, and the paused-table scenario.

@xiangfu0
Copy link
Copy Markdown
Contributor Author

@xiangfu0 given group commit is already enabled by default, this PR itself doesn't reduce the memory allocation from roughly O(concurrency * IS) to O(batches * IS) right ?

This PR reduces the memory allocation by half in the happy path right ? This is also a good improvement but just confirming that the description probably needs to be updated.

The group commit logic takes the updater function, so only in the group commit thread, IS will be cloned and processed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

@xiangfu0
Copy link
Copy Markdown
Contributor Author

Formatting correction for the previous note (shell escaped function names).

Addressed follow-up comments in 899b092f90:

  • Added tests for the missing edge cases:
    • partition-id fallback path (testCommitSegmentMetadataFetchesIdealStateWhenPartitionIdsFallbackNeeded)
    • paused-table cleanup path (testCommitSegmentMetadataCleansUpMetadataWhenTablePaused)
    • non-CONSUMING committing segment cleanup/error path (testCommitSegmentMetadataCleansUpMetadataWhenCommittingSegmentNotConsuming)
  • Updated fake test manager updateIdealStateOnSegmentCompletion(...) to enforce the CONSUMING precondition and pause-aware new-segment add behavior, so tests now exercise the moved validation semantics more faithfully.
  • Applied small nits from review:
    • simplified redundant condition on segment movement
    • reduced cleanup success log level to DEBUG
    • added explicit INFO context when metadata is cleaned because segment was not added to IdealState

I intentionally left the two broader design-tradeoff threads unresolved in this patch (re-introducing pre-Step2 pause checks / early pre-Step1 CONSUMING check), because both approaches require an additional large IdealState read in the hot path and would partially negate the memory-footprint objective of this PR. Happy to follow up with a dedicated tradeoff change if desired.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

idealState =
updateIdealStateForSegments(tableConfig, committingSegmentName, newConsumingSegmentName, instancePartitions);
} catch (RuntimeException e) {
if (newConsumingSegmentName != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(MAJOR) This logic doesn't handle the table paused scenario, where there will be a dangling consuming segment ZK metadata. Please add a test to guard this

@xiangfu0 xiangfu0 force-pushed the codex/lazy-idealstate-llc-commit branch from 899b092 to 5baefa3 Compare February 18, 2026 21:06
@xiangfu0 xiangfu0 force-pushed the codex/lazy-idealstate-llc-commit branch from 5baefa3 to f4a0b39 Compare February 19, 2026 02:52
[controller] Add LLC commit-path edge-case coverage
@xiangfu0 xiangfu0 force-pushed the codex/lazy-idealstate-llc-commit branch from f4a0b39 to bb667b1 Compare February 19, 2026 03:13
@xiangfu0 xiangfu0 merged commit 33bb182 into apache:master Feb 20, 2026
16 checks passed
@xiangfu0 xiangfu0 deleted the codex/lazy-idealstate-llc-commit branch February 20, 2026 00:26
@xiangfu0 xiangfu0 added the real-time Related to realtime table ingestion and serving label Mar 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Related to performance optimization real-time Related to realtime table ingestion and serving

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants