
fix: adjust diskNormalized strategy to scale cost exponentially with disk utilization#19422

Open

jtuglu1 wants to merge 1 commit into apache:master from jtuglu1:fix-disk-normalized-strategy

Conversation

@jtuglu1 (Contributor) commented May 6, 2026

Description

The existing linear penalization factor is still ineffective in large-skew scenarios, where CostBalancerStrategy's cost forces a move/load even with the utilization-based penalty applied. This PR switches the penalty to scale exponentially with disk utilization, ensuring that near-full historicals are penalized heavily enough to be avoided.
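As a rough illustration of the difference (the constant `EPSILON`, the method names, and the sample numbers below are illustrative assumptions, not code from this PR), the old strategy multiplied the CostBalancerStrategy cost by the disk usage ratio, while this change divides by the remaining headroom, so the penalty grows without bound as a server fills up:

```java
public class DiskPenaltySketch
{
  // Guard against division by zero on a completely full server;
  // the actual constant name and value in the PR may differ.
  private static final double EPSILON = 1e-10;

  // Old behavior: penalty grows only linearly with usage (at most 1x the cost).
  static double linearNormalized(double cost, double usageRatio)
  {
    return cost * usageRatio;
  }

  // New behavior: cost / headroom, which blows up as usage approaches 100%.
  static double headroomNormalized(double cost, double usageRatio)
  {
    final double headroom = Math.max(EPSILON, 1.0 - usageRatio);
    return cost / headroom;
  }

  public static void main(String[] args)
  {
    for (double usage : new double[]{0.50, 0.90, 0.99}) {
      System.out.printf("usage=%.2f linear=%.1f headroom=%.1f%n",
                        usage,
                        linearNormalized(100.0, usage),
                        headroomNormalized(100.0, usage));
    }
  }
}
```

At 50% usage the two penalties differ by 4x, but at 99% usage the headroom-based penalty is roughly 100x the raw cost while the linear one never exceeds 1x, which is the skew scenario the description is targeting.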

Release note

Adjust diskNormalized strategy to scale cost exponentially with disk utilization


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu1 force-pushed the fix-disk-normalized-strategy branch from b3521f9 to 9064f7e on May 6, 2026 19:22
@jtuglu1 force-pushed the fix-disk-normalized-strategy branch from 9064f7e to fe0d0d3 on May 6, 2026 19:28
@FrankChen021 (Member) left a comment


Severity findings: P0: 0, P1: 1, P2: 1, P3: 0 (total: 2)

This is an automated review by Codex GPT-5

Before:

    double normalizedCost = cost * usageRatio;

After:

    final double usageRatio = (double) server.getSizeUsed() / maxSize;
    final double headroom = Math.max(EPSILON, 1.0 - usageRatio);
    double normalizedCost = cost / headroom;
[P1] Existing threshold test now fails

Changing normalization to cost / headroom makes the existing testThresholdBlocksMarginalMove scenario choose DEST: source is roughly 38K / 0.20 * 0.95 = 180.5K, while dest is 40K / 0.26 = 153.8K. The test still asserts null, so the server test suite should fail unless the threshold scenario or algorithm is adjusted.
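The reviewer's arithmetic can be reproduced with a small sketch (the raw costs, headroom values, and the 0.95 factor come from the comment above; the method name and everything else is an assumption for illustration):

```java
public class ThresholdCheck
{
  // New normalization under review: divide the raw cost by available headroom.
  static double headroomNormalized(double cost, double headroom)
  {
    return cost / headroom;
  }

  public static void main(String[] args)
  {
    // Source server: raw cost ~38K, headroom 0.20, with the 0.95 factor
    // mentioned in the comment applied.
    double source = headroomNormalized(38_000, 0.20) * 0.95;  // ~180,500
    // Destination server: raw cost ~40K, headroom 0.26.
    double dest = headroomNormalized(40_000, 0.26);           // ~153,846
    // dest < source, so under the new formula the strategy picks DEST,
    // and the test's assertion of a null (blocked) move no longer holds.
    System.out.printf("source=%.1f dest=%.1f%n", source, dest);
  }
}
```

Since the destination's normalized cost (~153.8K) is now below the source's (~180.5K), the marginal move is selected rather than blocked, matching the reviewer's conclusion that testThresholdBlocksMarginalMove would fail.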

Before (Javadoc):

    * A {@link BalancerStrategy} which normalizes the cost of placing a segment on a
    * server as calculated by {@link CostBalancerStrategy} by multiplying it by the
    * server's disk usage ratio.

After (changed line):

    * server as calculated by {@link CostBalancerStrategy} by dividing by the

[P2] Public docs still describe the old formula

The implementation and Javadoc now divide by available headroom, but docs/design/coordinator.md and docs/configuration/index.md still say diskNormalized multiplies cost by diskUsed / maxSize. That leaves user-facing behavior documentation incorrect for this config option.
