HDDS-14106. Add -XX:NewRatio=3 to default GC options for CMS by rnblough · Pull Request #9967 · apache/ozone

rnblough · 2026-03-23T18:56:17Z

What changes were proposed in this pull request?

I propose that -XX:NewRatio be set explicitly in the default Java options of all Ozone roles whenever -XX:+UseConcMarkSweepGC is set. This solves the long-standing problem of the ConcurrentMarkSweep (CMS) collector always ending up with a tiny Young Generation. A tiny Young Generation causes ParNew thrashing and premature promotion of objects, which pollutes the Old Generation and eventually drives unnecessary full GCs. That part of the problem is straightforward to diagnose with GC logs and heap dumps, and as cluster sizes grew it became common in Hadoop deployments to address it with -XX:NewSize and -XX:MaxNewSize, or with -Xmn. The insight here is that there is a consistent underlying cause in the JDK ergonomics, and that it can be trivially compensated for.
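A simplified sketch of where the flag lands, assuming the behavior suggested by the ozone-functions.sh change in this PR and its error message: the default GC options are appended only when OZONE_OPTS contains no -XX: flags, and the CMS defaults apply only when the Java major version is below 15. The function name add_default_gc_opts is hypothetical; this is not the actual script code, and it leaves other Java versions untouched.

```bash
# Hypothetical helper (not the real ozone-functions.sh code).
# $1: Java major version, $2: current OZONE_OPTS.
add_default_gc_opts() {
  local java_major_version="$1"
  local opts="$2"
  # Respect any GC tuning the operator already supplied.
  case "$opts" in
    *-XX:*)
      printf '%s\n' "$opts"
      return
      ;;
  esac
  # CMS defaults, now including -XX:NewRatio=3, only for older JVMs.
  if [ "$java_major_version" -lt 15 ]; then
    opts="${opts} -XX:+UseConcMarkSweepGC -XX:NewRatio=3 -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled"
  fi
  printf '%s\n' "$opts"
}

add_default_gc_opts 8 "-Xmx32g"
add_default_gc_opts 8 "-Xmx32g -XX:+UseG1GC"
```

The second call leaves the options unchanged because an operator-supplied -XX: flag is already present.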

This primarily affects larger deployments, particularly those where lists of millions of objects, such as keys or container IDs, become routine even in internal reporting mechanisms.
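To illustrate why a bigger heap alone does not help, here is a rough model of the JDK 8 HotSpot CMS ergonomics as understood here (an assumption, based on the logic in Arguments::set_cms_and_parnew_gc_flags): when neither NewRatio nor MaxNewSize is given on the command line, the young generation is capped at about ParallelGCThreads × CMSYoungGenPerWorker × 13/10 on 64-bit JVMs, and setting either flag explicitly, even NewRatio=2, removes that cap. The 64 MB per-worker default (x86-64) and the helper name cms_max_young_mb are assumptions for illustration; page alignment and -XX:NewSize are ignored.

```bash
# Rough model (assumption) of JDK 8 CMS max young-gen sizing, in MB.
# Hypothetical helper; ignores page alignment and -XX:NewSize.
cms_max_young_mb() {
  local heap_mb=$1       # -Xmx in MB
  local gc_threads=$2    # ParallelGCThreads
  local new_ratio=$3     # NewRatio value in effect
  local explicit=$4      # "yes" if NewRatio/MaxNewSize was set explicitly
  local per_worker_mb=64 # assumed CMSYoungGenPerWorker default on x86-64
  local by_ratio=$(( heap_mb / (new_ratio + 1) ))
  if [ "$explicit" = "yes" ]; then
    echo "$by_ratio"
    return
  fi
  # Implicit case: cap by GC thread count, scaled by 13/10 on 64-bit.
  local by_threads=$(( per_worker_mb * gc_threads * 13 / 10 ))
  if [ "$by_threads" -lt "$by_ratio" ]; then
    echo "$by_threads"
  else
    echo "$by_ratio"
  fi
}

for heap in 8192 32768 102400; do
  echo "Xmx=${heap}MB default=$(cms_max_young_mb "$heap" 8 2 no)MB NewRatio=3=$(cms_max_young_mb "$heap" 8 3 yes)MB"
done
```

With 8 GC threads the model caps the young generation at about 665 MB whether -Xmx is 8 GB or 100 GB, while an explicit -XX:NewRatio=3 gives 2048, 8192, and 25600 MB respectively.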

This behavior was introduced deliberately in the JDK ergonomics. The earliest complaints about it that I found date from JDK 6: https://bugs.openjdk.org/browse/JDK-6872335

However, it appears to predate that, based on this document describing the GC ergonomics changes in J2SE 5.0: https://docs.oracle.com/javase/1.5.0/docs/guide/vm/gc-ergonomics.html

I chose -XX:NewRatio=3 rather than the default value of 2 for two reasons. First, Ozone does not need a Young Generation that is 1/3 of the total heap (among other things, most Ozone deployments have worked fine even with the artificially tiny value). Second, NewRatio scales automatically with heap size adjustments, whereas an option like -Xmn would have to be recalculated on every heap change, or left static and adjusted manually later.
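For reference, NewRatio is the ratio of Old Generation size to Young Generation size, so the Young Generation's share of the maximum heap follows directly (ignoring alignment):

```latex
\begin{aligned}
\text{NewRatio} &= \frac{\text{Old}}{\text{Young}}, \qquad
\text{Heap} = \text{Young} + \text{Old} = \text{Young}\,(\text{NewRatio} + 1) \\
\Longrightarrow\ \text{Young} &= \frac{\text{Heap}}{\text{NewRatio} + 1} \\
\text{NewRatio} = 2:\ \text{Young} &= \tfrac{1}{3}\,\text{Heap} \quad (\approx 33.3\ \text{GB at -Xmx100g}) \\
\text{NewRatio} = 3:\ \text{Young} &= \tfrac{1}{4}\,\text{Heap} \quad (25\ \text{GB at -Xmx100g})
\end{aligned}
```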

Impacts on running clusters: on one occasion, on an SCM running with -Xmx200g, I observed that a larger Young Generation made ParNew collections take substantially longer. This was noticeable in the logs and detectable in some client interactions, but there were no further impacts. In every production cluster I have seen this change applied to, with heaps of ~100g and below, no negative impacts were observed at all.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14106

How was this patch tested?

Manual testing, successful deployment in production clusters, and build-branch on my fork. Two tests failed, but they are unrelated to the change and appear to be due to a configuration issue in the integration (container) setup:
org.apache.hadoop.ozone.container.diskbalancer.TestDefaultContainerChoosingPolicy
org.apache.hadoop.ozone.container.diskbalancer.TestDefaultVolumeChoosingPolicy

yandrey321 · 2026-03-23T21:03:44Z

hadoop-ozone/dist/src/shell/ozone/ozone-functions.sh

      if [[ "$java_major_version" -lt 15 ]]; then
-        OZONE_OPTS="${OZONE_OPTS} -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled"
-        ozone_error "No '-XX:...' jvm parameters are set. Adding safer GC settings '-XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' to the OZONE_OPTS"
+        OZONE_OPTS="${OZONE_OPTS} -XX:+UseConcMarkSweepGC -XX:NewRatio=3 -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled"


What would be the equivalent for G1GC?

Happily, there isn't one. The root cause of the problem is an ergonomics detail unique to CMS, which G1GC does not suffer from. This isn't even a bug; it is intended behavior, just dating from 2004, when there was nothing like today's scale challenges, nor the individual server resources to meet them.

If we instead ask whether G1GC performance can be improved with this property, the answer, as I understand it, is no: G1GC relies on adaptive sizing of the young generation to meet its configured pause goals. Any property that fixes the young generation size, whether NewRatio, -Xmn, or NewSize=MaxNewSize, would disable that mechanism. This wouldn't break G1GC, but above some threshold it would likely mean consistently exceeding its pause target.
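As a hedged illustration of the G1 point, a hypothetical lint helper (not part of Ozone) could flag option sets where G1 is combined with a flag that fixes the young generation size: -Xmn, -XX:NewRatio, or -XX:NewSize equal to -XX:MaxNewSize. The check is purely textual; for example, it would treat 1g and 1024m as different values.

```bash
# Hypothetical lint helper: succeeds (returns 0) when G1 is selected and
# the young generation size is pinned, which disables G1's adaptive
# young-gen sizing used to meet its pause-time goal.
g1_young_gen_pinned() {
  local opts=" $1 " newsize maxnew
  case "$opts" in
    *" -XX:+UseG1GC "*) ;;
    *) return 1 ;;  # not using G1: nothing to check
  esac
  case "$opts" in
    *" -Xmn"*|*" -XX:NewRatio="*) return 0 ;;
  esac
  # NewSize == MaxNewSize also pins the size (string comparison only).
  newsize=$(printf '%s\n' "$opts" | sed -n 's/.* -XX:NewSize=\([^ ]*\) .*/\1/p')
  maxnew=$(printf '%s\n' "$opts" | sed -n 's/.* -XX:MaxNewSize=\([^ ]*\) .*/\1/p')
  [ -n "$newsize" ] && [ "$newsize" = "$maxnew" ]
}

if g1_young_gen_pinned "$OZONE_OPTS"; then
  echo "warning: young gen size is pinned; G1 may miss its pause-time goal" >&2
fi
```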

adoroszlai · 2026-03-24T10:14:47Z

Thanks @rnblough for the patch, @yandrey321 for the review.

rnblough and others added 2 commits March 23, 2026 13:05

Set NewRatio explicitly with ConcurrentMarkSweep GC

9dc337d

Merge branch 'apache:master' into HDDS-14106

e7a2037

yandrey321 reviewed Mar 23, 2026

adoroszlai changed the title ~~HDDS 14106. Set -XX:NewRatio=3 explicitly when -XX:+UseConcMarkSweepGC is set to resolve tiny young gen issue~~ HDDS 14106. Add -XX:NewRatio=3 to default GC options for CMS Mar 24, 2026

adoroszlai changed the title ~~HDDS 14106. Add -XX:NewRatio=3 to default GC options for CMS~~ HDDS-14106. Add -XX:NewRatio=3 to default GC options for CMS Mar 24, 2026

adoroszlai approved these changes Mar 24, 2026

adoroszlai merged commit 987e3bc into apache:master Mar 24, 2026
31 of 33 checks passed

adoroszlai merged 2 commits into apache:master from rnblough:HDDS-14106

rnblough commented Mar 23, 2026

Uh oh!

yandrey321 Mar 23, 2026

Uh oh!

rnblough Mar 23, 2026 •

edited

Loading

Uh oh!

Uh oh!

adoroszlai commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rnblough commented Mar 23, 2026

What changes were proposed in this pull request?

What is the link to the Apache JIRA

How was this patch tested?

Uh oh!

yandrey321 Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

rnblough Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

adoroszlai commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rnblough Mar 23, 2026 •

edited

Loading