Add support for handling scenarios where end time is invalid during RetentionManager run #18148

Open
9aman wants to merge 1 commit into apache:master from 9aman:retention_manager_improvement_in_case_of_missing_start_end_time

Conversation

Contributor

@9aman 9aman commented Apr 9, 2026

Summary

  • When segment end time is invalid, the RetentionManager currently skips the segment entirely — it is never deleted regardless of the retention policy. This adds an optional fallback to use segmentZKMetadata.getCreationTime() instead, so segments with missing/invalid end times can still be cleaned up.
  • Gated behind cluster config controller.retentionManager.enableCreationTimeFallback (default false) — no behavior change unless explicitly opted in.
  • Supports dynamic config updates via the existing cluster config change listener — no controller restart needed.
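The fallback described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class `CreationTimeFallbackSketch`, its constructor, and the `isValidTimeMs` check (which assumes "valid" simply means a positive timestamp) are all hypothetical, and the real validity rules in Pinot may differ.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; class and method names are hypothetical, not Pinot's. */
final class CreationTimeFallbackSketch {
  private final long retentionMs;
  private final boolean creationTimeFallbackEnabled;

  CreationTimeFallbackSketch(long retentionMs, boolean creationTimeFallbackEnabled) {
    this.retentionMs = retentionMs;
    this.creationTimeFallbackEnabled = creationTimeFallbackEnabled;
  }

  // Assumption: a timestamp is "valid" when strictly positive. The real check may differ.
  static boolean isValidTimeMs(long timeMs) {
    return timeMs > 0;
  }

  /** Returns true when the segment is older than the retention period. */
  boolean isPurgeable(long endTimeMs, long creationTimeMs, long nowMs) {
    long referenceTimeMs;
    if (isValidTimeMs(endTimeMs)) {
      // A valid end time always takes priority over the fallback.
      referenceTimeMs = endTimeMs;
    } else if (creationTimeFallbackEnabled && isValidTimeMs(creationTimeMs)) {
      // Opt-in fallback: use the creation time when the end time is unusable.
      referenceTimeMs = creationTimeMs;
    } else {
      // Behavior without the fallback: the segment is skipped and never purged.
      return false;
    }
    return nowMs - referenceTimeMs > retentionMs;
  }

  public static void main(String[] args) {
    long day = TimeUnit.DAYS.toMillis(1);
    long now = System.currentTimeMillis();
    CreationTimeFallbackSketch strategy = new CreationTimeFallbackSketch(7 * day, true);
    // Invalid end time, old creation time vs. invalid end time, recent creation time.
    System.out.println(strategy.isPurgeable(-1L, now - 30 * day, now));
    System.out.println(strategy.isPurgeable(-1L, now - day, now));
  }
}
```

With the flag off, the third branch is the only one reachable for an invalid end time, which matches the "no behavior change unless explicitly opted in" claim.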

Test plan

  • TimeRetentionStrategyTest#testCreationTimeFallback — unit tests covering: fallback disabled (existing behavior preserved), fallback enabled with valid/recent/invalid/zero creation time, valid end time takes priority over fallback
  • RetentionManagerTest#testCreationTimeFallbackOnChange — verifies dynamic config toggle via onChange()
  • RetentionManagerTest#testRetentionWithInvalidEndTimeAndCreationTimeFallback — end-to-end: segment with invalid end time is deleted when fallback is enabled and creation time exceeds retention

return false; // Incomplete segments don't have final end time and should not be purged
}

return isPurgeable(tableNameWithType, segmentZKMetadata.getSegmentName(), segmentZKMetadata.getEndTimeMs());
Contributor

Let's add the new checks inside this method so that this method can also handle fallback.

Contributor Author

I intentionally kept that super simple.
It just takes in timestamps while this function handles all the complex logic.
The caller can choose to keep things simple using the other function or use this function if a logical interpretation of the ZK metadata is needed.

Contributor

I don't think it's a good idea to leave it to the caller to understand the difference. A caller can call either method, and both methods should behave consistently.
Right now, passing an invalid timestamp to the second method behaves differently than it does in the other method.
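One possible reading of the consistency concern in this thread is sketched below: the metadata-based overload only decides which timestamp to use, and the timestamp-based overload owns the validity check, so an invalid value is rejected the same way on either path. Everything here is an assumption for illustration, including the `SegmentMeta` holder, the `INVALID_TIME_MS` sentinel, and the positive-timestamp validity rule. It is not the signature used in the PR.

```java
/** Illustrative sketch only; names and validity rule are hypothetical. */
final class ConsistentPurgeCheckSketch {
  static final long INVALID_TIME_MS = -1L; // hypothetical sentinel

  /** Hypothetical stand-in for the segment's ZK metadata. */
  static final class SegmentMeta {
    final long endTimeMs;
    final long creationTimeMs;

    SegmentMeta(long endTimeMs, long creationTimeMs) {
      this.endTimeMs = endTimeMs;
      this.creationTimeMs = creationTimeMs;
    }
  }

  private final long retentionMs;
  private final boolean creationTimeFallbackEnabled;

  ConsistentPurgeCheckSketch(long retentionMs, boolean creationTimeFallbackEnabled) {
    this.retentionMs = retentionMs;
    this.creationTimeFallbackEnabled = creationTimeFallbackEnabled;
  }

  // Assumption: a timestamp is "valid" when strictly positive.
  private static boolean isValidTimeMs(long timeMs) {
    return timeMs > 0;
  }

  /** Metadata overload: picks the reference timestamp, then delegates. */
  boolean isPurgeable(SegmentMeta meta, long nowMs) {
    long referenceTimeMs = meta.endTimeMs;
    if (!isValidTimeMs(referenceTimeMs)) {
      referenceTimeMs = creationTimeFallbackEnabled ? meta.creationTimeMs : INVALID_TIME_MS;
    }
    return isPurgeable(referenceTimeMs, nowMs);
  }

  /** Timestamp overload: the single place where validity and retention are checked. */
  boolean isPurgeable(long timeMs, long nowMs) {
    if (!isValidTimeMs(timeMs)) {
      return false; // Same outcome whichever overload the caller started from.
    }
    return nowMs - timeMs > retentionMs;
  }
}
```

The trade-off is that the timestamp overload cannot apply the fallback by itself, since it never sees the creation time. It can only guarantee that invalid input is never purged.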

codecov-commenter commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 94.59459% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.02%. Comparing base (2e80bff) to head (d456597).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...troller/helix/core/retention/RetentionManager.java | 88.23% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18148      +/-   ##
============================================
- Coverage     63.04%   63.02%   -0.03%     
  Complexity     1617     1617              
============================================
  Files          3202     3202              
  Lines        194718   194752      +34     
  Branches      30047    30055       +8     
============================================
- Hits         122760   122736      -24     
- Misses        62233    62269      +36     
- Partials       9725     9747      +22     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 62.99% <94.59%> (-0.02%) ⬇️ |
| java-21 | 63.00% <94.59%> (-0.02%) ⬇️ |
| temurin | 63.02% <94.59%> (-0.03%) ⬇️ |
| unittests | 63.01% <94.59%> (-0.03%) ⬇️ |
| unittests1 | 55.54% <ø> (-0.03%) ⬇️ |
| unittests2 | 33.43% <94.59%> (+<0.01%) ⬆️ |


noob-se7en added the configuration (Config changes: addition/deletion/change in behavior), enhancement (Improvement to existing functionality), and documentation (Improvements or additions to documentation) labels Apr 10, 2026
Contributor

@xiangfu0 xiangfu0 left a comment

Found a few high-signal issues; see inline comments.

boolean oldValue = _useCreationTimeFallbackForRetention;

// Validate that the value is a proper boolean string
if (!"true".equalsIgnoreCase(newValue) && !"false".equalsIgnoreCase(newValue)) {
Contributor

When this cluster config key is deleted, changedConfigs will still contain it but clusterConfigs.get(...) will be null (DefaultClusterConfigChangeHandler explicitly reports deleted keys that way). This branch treats null as invalid and keeps the old value, so removing the override never reverts to the default false until restart. Because this flag gates destructive retention deletion, the current leader can keep purging segments after an operator thinks they disabled the feature. Please handle null explicitly and reset _useCreationTimeFallbackForRetention to the default.
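A minimal sketch of the null handling requested above, under stated assumptions: the class `FallbackFlagListenerSketch` is hypothetical, and the `onChange(Set<String>, Map<String, String>)` shape is assumed from the review comment's mention of `changedConfigs` and `clusterConfigs`. Only the config key and its default of `false` come from the PR description.

```java
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; not the PR's RetentionManager code. */
final class FallbackFlagListenerSketch {
  static final String KEY = "controller.retentionManager.enableCreationTimeFallback";
  static final boolean DEFAULT_ENABLED = false;

  private volatile boolean useCreationTimeFallback = DEFAULT_ENABLED;

  // Assumed callback shape: changed keys plus the current config map.
  void onChange(Set<String> changedConfigs, Map<String, String> clusterConfigs) {
    if (!changedConfigs.contains(KEY)) {
      return; // Unrelated change; nothing to do.
    }
    String newValue = clusterConfigs.get(KEY);
    if (newValue == null) {
      // Key was deleted: revert to the default rather than keeping the old value.
      useCreationTimeFallback = DEFAULT_ENABLED;
    } else if ("true".equalsIgnoreCase(newValue)) {
      useCreationTimeFallback = true;
    } else if ("false".equalsIgnoreCase(newValue)) {
      useCreationTimeFallback = false;
    }
    // Any other string is treated as invalid and the previous value is kept.
  }

  boolean isFallbackEnabled() {
    return useCreationTimeFallback;
  }
}
```

Separating the deleted-key case from the malformed-value case means that removing the override turns the destructive behavior off immediately, while a typo in the value still leaves the current setting untouched.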
