HDDS-15150. Container scanner should not mark container as UNHEALTHY when FD exhausted #10214

Open
sarvekshayr wants to merge 1 commit into apache:master from sarvekshayr:HDDS-15150

@sarvekshayr
Contributor

What changes were proposed in this pull request?

Fixed a bug where the background container scanners marked containers as UNHEALTHY because of resource exhaustion rather than actual data corruption. Specifically, when a scan hit a FileNotFoundException or FileSystemException caused by file-descriptor exhaustion ("Too many open files"), the scanner treated the failure as a corruption event.

The scanners now catch these resource-related exceptions explicitly, so a container keeps its current state when the scanner cannot complete its check because of system limits.
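
For illustration only, the kind of check described above could look roughly like the sketch below. This is not the code in this PR: the class name `FdExhaustionCheck` and the method `isFdExhaustion` are hypothetical, and matching on the "Too many open files" message text is an assumption taken from the log excerpts in this description (the exact wording depends on the OS and locale).

```java
import java.io.FileNotFoundException;
import java.nio.file.FileSystemException;

/**
 * Hypothetical helper (not part of the PR): decides whether a scan failure
 * looks like file-descriptor exhaustion rather than data corruption.
 */
final class FdExhaustionCheck {
  // Assumption: the OS error text seen in the logs quoted in this PR.
  private static final String TOO_MANY_OPEN_FILES = "Too many open files";

  private FdExhaustionCheck() { }

  /**
   * Walks the cause chain and returns true if any FileNotFoundException or
   * FileSystemException in it carries the "Too many open files" message.
   */
  static boolean isFdExhaustion(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      boolean fileError = cur instanceof FileNotFoundException
          || cur instanceof FileSystemException;
      String msg = cur.getMessage();
      if (fileError && msg != null && msg.contains(TOO_MANY_OPEN_FILES)) {
        return true;
      }
    }
    return false;
  }
}
```

A scanner could call such a helper before deciding to mark a container UNHEALTHY, and log and skip the container when it returns true. A FileNotFoundException for a file that is genuinely missing would not match and would still be handled as before.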

ContainerMetadataScanner

2026-04-20 22:01:43,978 ERROR [ContainerMetadataScanner]-org.apache.hadoop.ozone.container.ozoneimpl.BackgroundContainerMetadataScanner: Corruption detected in container [3980819]. Marking it UNHEALTHY.
java.io.FileNotFoundException: /data6/hadoop-ozone/datanode/data/hdds/CID-637fe7c5-f40b-4e49-98b3-52154bd669e2/current/containerDir95/3980819/metadata/3980819.container (Too many open files)

ContainerDataScanner

2026-04-20 22:01:43,982 ERROR [ContainerDataScanner(/data12/hadoop-ozone/datanode/data/hdds)]-org.apache.hadoop.ozone.container.ozoneimpl.BackgroundContainerDataScanner: Corruption detected in container [16326340]. Marking it UNHEALTHY.
java.nio.file.FileSystemException: /data12/hadoop-ozone/datanode/data/hdds/CID-637fe7c5-f40b-4e49-98b3-52154bd669e2/current/containerDir143/16326340/chunks/115816904944438982.block: Too many open files

What is the link to the Apache JIRA?

HDDS-15150

How was this patch tested?

Added unit tests in TestBackgroundContainerDataScanner and TestBackgroundContainerMetadataScanner.
Verified that with the fix, containers are not incorrectly marked as UNHEALTHY.

@sarvekshayr changed the title from "HDDS-15150. Datanode scanner should not mark container as UNHEALTHY when FD exhausted" to "HDDS-15150. Container scanner should not mark container as UNHEALTHY when FD exhausted" on May 8, 2026
@sarvekshayr sarvekshayr requested a review from ChenSammi May 8, 2026 09:41