feat: MSQ supervisors using mostFragmentedFirst policy and minor compaction can scale taskCount down for minor compactions#19412
Conversation
| final DataSegment segment = new DataSegment( | ||
| DATA_SOURCE, | ||
| Intervals.of("2024-01-01/2024-01-02"), | ||
| "v1", | ||
| null, | ||
| ImmutableList.of(), | ||
| ImmutableList.of(), | ||
| new NumberedShardSpec(0, 1), | ||
| 0, | ||
| 100L | ||
| ); |
FrankChen021
left a comment
There was a problem hiding this comment.
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 0 |
| P2 | 1 |
| P3 | 0 |
| Total | 1 |
This is an automated review by Codex GPT-5
| return; | ||
| } | ||
|
|
||
| final int percent = QueryContext.of(context).getInt( |
There was a problem hiding this comment.
[P2] Validate minorCompactionTaskPercent when accepting the compaction config
The new context key is only parsed and range-checked while creating a minor MSQ compaction task. Supervisor config validation still accepts values like 0, 200, or a non-numeric string, so the API can persist an invalid supervisor and it will later fail job creation repeatedly once a minor compaction candidate appears. Please add validation alongside the existing MSQ maxNumTasks validation paths, including CascadingReindexingTemplate if applicable, so bad configs are rejected before scheduling.
cecemei
left a comment
There was a problem hiding this comment.
i felt having a percent variable is not really necessary, especially we're just trying to multiple two variables to get a task count for minor compaction, this mental math is not really necessary.
what do you think if we just add another minNumTasks?
how would this new variable be used? I choose a percent so someone can just make an estimate and then not maintain it as they change their max task counts for full compaction, they will just keep the same ratio for minor:full when it comes to task counts |
FrankChen021
left a comment
There was a problem hiding this comment.
I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.
This is an automated review by Codex GPT-5
Description
Minor compactions my not require the same parallelism that a full compaction does for a datasource. This change allows the compaction supervisor to add a taskContext parameter
minorCompactionTaskPercentthat scalesmaxTaskCountdownwards for minor compaction tasks. For examplefull compactions will use 100 tasks and minor compactions will use 20.
by default this behavior is disabled and each supervisor who wants to use it must be using the MSQ compaction engine + opt in by modifying
minorCompactionTaskPercentexplicitly.Release note
If you are using MSQ task engine with compaction supervisors for automatic compaction plus MostFragmentedFirst policy with minor compactions enabled, you can now set a per supervisor configuration to use fewer MSQ workers for minor compactions compared to full compactions by setting "minorCompactionTaskPercent" to an int between 1 and 99. minor compactions will use
FLOOR(2, maxTaskCount * minorCompactionTaskPercent)for minor compaction task parallelism.Key changed/added classes in this PR
CompactSegments.javaThis PR has: