feat: MSQ supervisors using mostFragmentedFirst policy and minor compaction can scale taskCount down for minor compactions by capistrant · Pull Request #19412 · apache/druid

capistrant · 2026-05-05T20:41:27Z

Description

Minor compactions may not require the same parallelism as a full compaction of a datasource. This change lets a compaction supervisor set a taskContext parameter, minorCompactionTaskPercent, that scales maxTaskCount down for minor compaction tasks. For example:

...
  "taskContext": {
    "maxTaskCount": 100,
    "minorCompactionTaskPercent": 20
  },
...

Full compactions will use 100 tasks and minor compactions will use 20.

By default this behavior is disabled. Each supervisor that wants to use it must use the MSQ compaction engine and opt in by setting minorCompactionTaskPercent explicitly.

Release note

If you use the MSQ task engine with compaction supervisors for automatic compaction, plus the MostFragmentedFirst policy with minor compactions enabled, you can now set a per-supervisor configuration to use fewer MSQ workers for minor compactions than for full compactions by setting "minorCompactionTaskPercent" to an integer between 1 and 99. Minor compactions will use maxTaskCount * minorCompactionTaskPercent / 100 tasks for minor compaction task parallelism, but never fewer than 2.
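The scaling rule in the release note can be sketched as below. This is a minimal illustration, not the code in CompactSegments.java: the class and method names are hypothetical, the lower bound of 2 is read from the release note's FLOOR(2, ...) wording, and rounding down on integer division is an assumption. The sketch also assumes maxTaskCount is at least 2.

```java
// Hypothetical helper for illustration only; not taken from the PR.
final class MinorCompactionTaskScaling
{
  // Assumed lower bound, read from the release note's "FLOOR(2, ...)" wording.
  static final int MIN_TASKS = 2;

  /**
   * Returns the task count to use for a minor compaction.
   * A null percent models the disabled-by-default case and leaves maxTaskCount unchanged.
   */
  static int scaledTaskCount(int maxTaskCount, Integer minorCompactionTaskPercent)
  {
    if (minorCompactionTaskPercent == null) {
      return maxTaskCount;
    }
    if (minorCompactionTaskPercent < 1 || minorCompactionTaskPercent > 99) {
      throw new IllegalArgumentException(
          "minorCompactionTaskPercent must be between 1 and 99, got " + minorCompactionTaskPercent
      );
    }
    // Long arithmetic avoids int overflow; integer division rounds down (an assumption).
    final long scaled = (long) maxTaskCount * minorCompactionTaskPercent / 100;
    return (int) Math.max(MIN_TASKS, scaled);
  }

  public static void main(String[] args)
  {
    System.out.println(scaledTaskCount(100, 20));   // 20, matching the example in the description
    System.out.println(scaledTaskCount(5, 20));     // 2, the lower bound applies
    System.out.println(scaledTaskCount(100, null)); // 100, scaling disabled
  }
}
```

With the configuration from the description (maxTaskCount 100, minorCompactionTaskPercent 20), the sketch yields 20 tasks for a minor compaction.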

Key changed/added classes in this PR

CompactSegments.java

This PR has:

been self-reviewed.
added documentation for new or modified features or behaviors.
a release note entry in the PR description.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added or updated version, license, or notice information in licenses.yaml
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
added integration tests.
been tested in a test Druid cluster.

+    final DataSegment segment = new DataSegment(
+        DATA_SOURCE,
+        Intervals.of("2024-01-01/2024-01-02"),
+        "v1",
+        null,
+        ImmutableList.of(),
+        ImmutableList.of(),
+        new NumberedShardSpec(0, 1),
+        0,
+        100L
+    );


FrankChen021

Severity	Findings
P0	0
P1	0
P2	1
P3	0
Total	1

This is an automated review by Codex GPT-5

FrankChen021 · 2026-05-06T13:01:02Z

+      return;
+    }
+
+    final int percent = QueryContext.of(context).getInt(


[P2] Validate minorCompactionTaskPercent when accepting the compaction config

The new context key is only parsed and range-checked while creating a minor MSQ compaction task. Supervisor config validation still accepts values like 0, 200, or a non-numeric string, so the API can persist an invalid supervisor and it will later fail job creation repeatedly once a minor compaction candidate appears. Please add validation alongside the existing MSQ maxNumTasks validation paths, including CascadingReindexingTemplate if applicable, so bad configs are rejected before scheduling.
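A rough sketch of the kind of up-front check this finding asks for is shown below. It is not Druid's validation API: the class, method, and error messages are hypothetical, and accepting numeric strings such as "20" is an assumption. The point is only that 0, 200, and non-numeric values are rejected when the config is submitted rather than when a minor compaction job is created.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical validator for illustration only; not taken from the PR.
final class MinorCompactionTaskPercentValidator
{
  static final String KEY = "minorCompactionTaskPercent";

  /** Returns an error message if the context value is invalid, or empty if it is valid or absent. */
  static Optional<String> validate(Map<String, ?> taskContext)
  {
    final Object raw = taskContext.get(KEY);
    if (raw == null) {
      // Key not set: the feature stays disabled, nothing to validate.
      return Optional.empty();
    }
    final Long percent = toWholeNumber(raw);
    if (percent == null) {
      return Optional.of(KEY + " must be an integer, got [" + raw + "]");
    }
    if (percent < 1 || percent > 99) {
      return Optional.of(KEY + " must be between 1 and 99, got [" + percent + "]");
    }
    return Optional.empty();
  }

  /** Converts a context value to a whole number, or returns null if it is not one. */
  private static Long toWholeNumber(Object raw)
  {
    if (raw instanceof Number) {
      final double value = ((Number) raw).doubleValue();
      return value == Math.rint(value) ? Long.valueOf((long) value) : null;
    }
    try {
      // Assumption: numeric strings are accepted, as context values often arrive as strings.
      return Long.valueOf(raw.toString().trim());
    }
    catch (NumberFormatException e) {
      return null;
    }
  }
}
```

Calling validate with the three bad values named in the finding (0, 200, and a non-numeric string) returns an error message in each case, so a supervisor spec carrying them could be rejected before it is persisted.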

cecemei

I feel that a percent variable is not really necessary. We're just multiplying two variables to get a task count for minor compaction, and that mental math shouldn't be needed.

What do you think about just adding another minNumTasks?

capistrant · 2026-05-06T16:35:20Z

I feel that a percent variable is not really necessary. We're just multiplying two variables to get a task count for minor compaction, and that mental math shouldn't be needed.

What do you think about just adding another minNumTasks?

How would this new variable be used?

I chose a percent so that someone can make an estimate once and not have to maintain it as they change their max task count for full compaction. They keep the same minor:full ratio of task counts.
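The trade-off described here can be illustrated with a small comparison. This is only a sketch: the class and method names are hypothetical, the fixed count of 20 is an arbitrary stand-in for an absolute setting (not the proposed minNumTasks, whose semantics are not defined in this thread), and the lower bound of 2 and round-down behavior are assumptions.

```java
// Hypothetical comparison for illustration only; not taken from the PR.
final class PercentVersusFixedCount
{
  /** Minor compaction task count when configured as a percent of maxTaskCount. */
  static int percentBased(int maxTaskCount, int percent)
  {
    // Assumed lower bound of 2 and integer division that rounds down.
    return Math.max(2, maxTaskCount * percent / 100);
  }

  /** The minor:full ratio (as a percent) that a fixed absolute count works out to. */
  static int effectivePercentOfFixed(int maxTaskCount, int fixedCount)
  {
    return fixedCount * 100 / maxTaskCount;
  }

  public static void main(String[] args)
  {
    for (int maxTaskCount : new int[]{50, 100, 200}) {
      System.out.printf(
          "maxTaskCount=%d percentBased(20%%)=%d fixedCount=20 is %d%% of max%n",
          maxTaskCount,
          percentBased(maxTaskCount, 20),
          effectivePercentOfFixed(maxTaskCount, 20)
      );
    }
  }
}
```

With a 20 percent setting the minor task count follows maxTaskCount (10, 20, 40), while a fixed count of 20 drifts from 40 percent to 10 percent of the full-compaction count as maxTaskCount grows from 50 to 200.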

FrankChen021

I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.

This is an automated review by Codex GPT-5

capistrant added 3 commits May 5, 2026 15:27

minor compaction task parallelism scaling for msq compaction supervisors

2332d3f

Default scaling to off

e40c45d

Fix docs

745b17f

capistrant requested a review from cecemei May 5, 2026 20:42

github-actions Bot added the Area - Documentation label May 5, 2026

github-advanced-security AI found potential problems May 5, 2026

embedded test

e3ba963

FrankChen021 reviewed May 6, 2026

Pull validation forward after code review called it out

9e91fbd

github-actions Bot added the Area - Ingestion label May 6, 2026

cecemei reviewed May 6, 2026

FrankChen021 reviewed May 7, 2026

capistrant wants to merge 5 commits into apache:master from capistrant:minor-compact-task-scaling

4 participants
