Skip to content

[ams] Force ReachableFileCleanup when snapshots exist outside main ancestry#4231

Open
j1wonpark wants to merge 1 commit into
apache:masterfrom
j1wonpark:expire-snapshots-keep-referenced-files
Open

[ams] Force ReachableFileCleanup when snapshots exist outside main ancestry#4231
j1wonpark wants to merge 1 commit into
apache:masterfrom
j1wonpark:expire-snapshots-keep-referenced-files

Conversation

@j1wonpark
Copy link
Copy Markdown
Contributor

@j1wonpark j1wonpark commented May 21, 2026

Why are the changes needed?

When AMS expires snapshots on a single-ref Iceberg table, Iceberg auto-selects IncrementalFileCleanup. That strategy walks the current snapshot's ancestor chain via SnapshotUtil.ancestorIds, and if a parent snapshot is missing from metadata the walk stops silently. Snapshots below the break are then treated as non-ancestors, so the ADDED entries of the manifests they wrote get reverted during cleanup. If such a manifest was superseded by a merge but its data file is still carried over (as EXISTING) in a manifest the current snapshot references, cleanup deletes a file the live table still points at, and reads fail with NotFoundException.

This is the same root cause as apache/iceberg#13568, fixed by apache/iceberg#13614 in Iceberg 1.10.0, where the strategy selector now falls back to ReachableFileCleanup when snapshots exist outside the main ancestry. Iceberg 1.6.x ~ 1.9.x lack that guard and unconditionally pick the incremental strategy for a single ref. Since bumping Iceberg core isn't always possible for existing deployments, this PR backports an equivalent guard at the Amoro layer.

Brief change log

  • IcebergTableMaintainer: before committing expireSnapshots, check hasSnapshotsOutsideMainAncestry() — the same SnapshotUtil.ancestorIds walk the cleanup uses — and force ReachableFileCleanup only in that at-risk state. Healthy tables keep the automatic incremental selection, so there is no cost impact.
  • ReachableFileCleanupBridge (new, in package org.apache.iceberg): RemoveSnapshots#withIncrementalCleanup(boolean) is package-private, so the bridge calls it without reflection.

How was this patch tested?

  • Added a regression test (TestExpireSnapshotsKeepReferencedFiles) that reproduces the carry-over + broken-chain state and asserts referenced data files survive expiration.
  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (no)

… ancestry

Iceberg's auto-selected IncrementalFileCleanup can silently truncate its
ancestor walk when a parent snapshot is missing, deleting data files the
current snapshot still references. Force the safe ReachableFileCleanup only
when snapshots exist outside the main ancestry; healthy tables are unchanged.

Signed-off-by: Jiwon Park <jpark92@outlook.kr>
@github-actions github-actions Bot added the module:ams-server Ams server module label May 21, 2026
@j1wonpark j1wonpark changed the title [optimizer] Force ReachableFileCleanup when snapshots exist outside main ancestry [ams] Force ReachableFileCleanup when snapshots exist outside main ancestry May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:ams-server Ams server module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant