Adds row-based compaction eligibility filtering and a dry run API for overlord compaction #19179
Open
cecemei wants to merge 17 commits into apache:master from
Description
This PR makes two main enhancements to Druid's compaction system:
- Row count analysis in `MostFragmentedIntervalFirstPolicy`, complementing the existing byte-based filtering
- A dry run API for Overlord compaction

Added row count tracking to CompactionStatistics
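A minimal sketch of how a nullable row count can be accumulated so that an unknown count stays unknown. `RowCountSketch` and its `sum` helper are hypothetical and not taken from the PR; the rule that a `null` operand makes the result `null` is an assumption about what correct null propagation means here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for row-count bookkeeping; not the PR's actual code. */
final class RowCountSketch {
    /** Adds delta to total. Assumption: null (unknown) on either side makes the result unknown. */
    static Long increment(Long total, Long delta) {
        return (total == null || delta == null) ? null : Long.valueOf(total + delta);
    }

    /** Subtracts delta from total, with the same assumed null rule as increment. */
    static Long decrement(Long total, Long delta) {
        return (total == null || delta == null) ? null : Long.valueOf(total - delta);
    }

    /** Sums per-segment row counts; a single unknown segment makes the whole total unknown. */
    static Long sum(List<Long> segmentRowCounts) {
        Long total = 0L;
        for (Long rows : segmentRowCounts) {
            total = increment(total, rows);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(100L, 250L)));       // prints 350
        System.out.println(sum(Arrays.asList(100L, null, 250L))); // prints null
    }
}
```

Keeping the total as a boxed `Long` rather than defaulting to `0` lets callers tell "no rows" apart from "row count was never stored".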
- `totalRows` field (nullable `Long`) on `CompactionStatistics` to track row counts alongside bytes
- `null` for old segments where this information was not stored
- `increment`/`decrement` operations that handle null propagation correctly

Added row-based eligibility check to MostFragmentedIntervalFirstPolicy
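A rough sketch of a percentage-based row threshold check, for illustration only. `RowEligibilitySketch` and both method names are hypothetical, and two behaviors are assumptions rather than facts from the PR: an unknown row count does not block eligibility, and the comparison is inclusive. How the policy combines this with the byte-based check, and what it does on either side of the threshold, is not shown.

```java
/** Hypothetical illustration of a row-percentage threshold; not the PR's actual code. */
final class RowEligibilitySketch {
    /** Percentage of rows still uncompacted, or null when a count is unknown or the total is not positive. */
    static Double uncompactedRowsPercent(Long uncompactedRows, Long totalRows) {
        if (uncompactedRows == null || totalRows == null || totalRows <= 0) {
            return null;
        }
        return 100.0 * uncompactedRows / totalRows;
    }

    /**
     * Returns true when the uncompacted share of rows reaches minPercent.
     * Assumption: when the percentage cannot be computed, the row check is skipped.
     */
    static boolean meetsRowThreshold(Long uncompactedRows, Long totalRows, int minPercent) {
        Double percent = uncompactedRowsPercent(uncompactedRows, totalRows);
        return percent == null || percent >= minPercent;
    }

    public static void main(String[] args) {
        System.out.println(meetsRowThreshold(200L, 1000L, 30)); // 20% of rows, prints false
        System.out.println(meetsRowThreshold(400L, 1000L, 30)); // 40% of rows, prints true
        System.out.println(meetsRowThreshold(null, 1000L, 30)); // unknown, prints true
    }
}
```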
- `minUncompactedRowsPercentForFullCompaction` parameter, similar to the existing `minUncompactedBytesPercentForFullCompaction`

Created CompactionStatusDetailedStats for dry run mode
Added dryRun API endpoint
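A minimal sketch of building a POST request against the dry run path `/druid/indexer/v1/compaction/dryRun` with the JDK HTTP client (Java 11+). `DryRunRequestSketch` is hypothetical, the base URL `http://localhost:8090` is an assumed local Overlord address, and the `{}` body is only a placeholder: the actual `ClusterCompactionConfig` JSON shape is not given in this description.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side helper; only the endpoint path comes from the PR description. */
final class DryRunRequestSketch {
    static final String DRY_RUN_PATH = "/druid/indexer/v1/compaction/dryRun";

    /** Builds (but does not send) a POST carrying a cluster compaction config as JSON. */
    static HttpRequest build(String overlordBaseUrl, String clusterCompactionConfigJson) {
        return HttpRequest.newBuilder(URI.create(overlordBaseUrl + DRY_RUN_PATH))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(clusterCompactionConfigJson))
            .build();
    }

    public static void main(String[] args) {
        // Assumed base URL and placeholder body; replace both for a real cluster.
        HttpRequest request = build("http://localhost:8090", "{}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and inspect the returned body.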
- `dryRunWithConfig()` method on the `CompactionScheduler` interface
- `/dryRun` POST endpoint on `OverlordCompactionResource` that accepts a `ClusterCompactionConfig`
- `CompactionStatusDetailedStats` showing what would be compacted without actually submitting jobs
- `simulateRunWithConfigUpdate()` superseded by the new dry run API

Updated CompactionStatusTracker for dry run support
- `CompactionStatusDetailedStats` field to track detailed statistics during dry run
- `recordPendingTask()` method to record tasks that could be allocated without submission

Release note
Added row-based compaction eligibility filtering to `MostFragmentedIntervalFirstPolicy` via the `minUncompactedRowsPercentForFullCompaction` configuration parameter. This complements the existing byte-based filtering and provides more flexibility in determining which intervals need full compaction.

Added a dry run API (`/druid/indexer/v1/compaction/dryRun`) to the Overlord that shows what compaction jobs would be submitted for a given cluster compaction configuration without actually submitting them. This helps operators preview compaction behavior and validate configuration changes. The API returns detailed statistics including segment counts, bytes, and rows for candidates in different compaction states (pending, skipped, complete, running).

This PR has: