
Fix HdfsTimeoutException not retried in setupBlockReader and EC createBlockReader #77

Open
exmy wants to merge 1 commit into ClickHouse:master from exmy:fix-hdfs-timeout



@exmy commented May 9, 2026

HdfsTimeoutException is a sibling class of HdfsIOException (both extend
HdfsException), not a subclass. In three places where RemoteBlockReader
construction can throw HdfsTimeoutException (via checkResponse() →
readVarint32()), the exception was not caught by the existing
catch(HdfsIOException) blocks, causing the read to fail immediately
without retrying another DataNode.
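
The catch behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the library's code: the three exception types are hypothetical stand-ins that only mirror the inheritance relationship stated here, and `classify` is an illustrative helper.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins mirroring the hierarchy described in this PR:
// both concrete types derive from HdfsException, neither from the other.
struct HdfsException : std::runtime_error {
    explicit HdfsException(const std::string &msg) : std::runtime_error(msg) {}
};
struct HdfsIOException : HdfsException {
    explicit HdfsIOException(const std::string &msg) : HdfsException(msg) {}
};
struct HdfsTimeoutException : HdfsException {
    explicit HdfsTimeoutException(const std::string &msg) : HdfsException(msg) {}
};

// Simulates a reader setup that fails, and reports which handler saw the error.
std::string classify(bool timeout) {
    std::string outcome = "ok";
    try {
        try {
            if (timeout) {
                throw HdfsTimeoutException("read timed out");
            }
            throw HdfsIOException("connection reset");
        } catch (const HdfsIOException &) {
            outcome = "retry";       // the pre-existing handler: try another DataNode
        }
    } catch (const HdfsException &) {
        outcome = "propagated";      // the timeout bypassed the inner handler
    }
    return outcome;
}
```

Here `classify(false)` yields `"retry"` while `classify(true)` yields `"propagated"`, because a `catch (const HdfsIOException &)` clause never matches a sibling type.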

This affects both replicated files (InputStreamImpl::setupBlockReader)
and Erasure Coded files (StripedInputStreamImpl::createBlockReader).
For EC files, the uncaught timeout causes chunks to be incorrectly
counted as "missing", leading to "Missing blocks, missingChunksNum >
parityBlkNum" errors even when the data is intact but DataNodes are
temporarily unresponsive.
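
To illustrate how a transient timeout can trip the missing-chunk check, here is a simplified model rather than the actual `StripedInputStreamImpl` logic. The `Chunk` enum, `countMissing`, and `readable` are hypothetical names, and the model assumes, purely for illustration, that a retried timeout eventually succeeds.

```cpp
#include <vector>

// Hypothetical per-chunk state for one stripe.
enum class Chunk { Healthy, SlowDataNode, Lost };

// Counts chunks treated as missing. When timeouts are not retried, a chunk on
// a temporarily slow DataNode is counted exactly like a genuinely lost chunk.
int countMissing(const std::vector<Chunk> &chunks, bool timeoutsAreRetried) {
    int missing = 0;
    for (Chunk c : chunks) {
        if (c == Chunk::Lost) {
            ++missing;
        } else if (c == Chunk::SlowDataNode && !timeoutsAreRetried) {
            ++missing;
        }
    }
    return missing;
}

// A stripe can be decoded only while the missing chunks do not exceed the
// number of parity blocks (the condition behind the error quoted above).
bool readable(int missingChunksNum, int parityBlkNum) {
    return missingChunksNum <= parityBlkNum;
}
```

With a 6+3 layout and four slow DataNodes, the model reports four missing chunks without retries (unreadable, since 4 > 3) and zero with retries, even though no data was lost in either case.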

Changes:

  • InputStreamImpl::setupBlockReader (both overloads): catch
    HdfsTimeoutException, add the timed-out node to failedNodes, and
    retry on another DataNode.
  • StripedInputStreamImpl::createBlockReader: same fix for the EC path.
  • RemoteBlockReader::readPacketHeader: catch HdfsTimeoutException and
    rethrow it as an HdfsIOException with Block/Datanode context, so that
    timeouts during data reading are also handled by the upstream retry
    logic.
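
The shape of these changes can be sketched as follows. This is not the diff itself: the exception types are hypothetical stand-ins, and `setupReaderWithRetry`, `readPacketHeaderSketch`, and the `connect`/`readHeader` callbacks are illustrative names that do not appear in the library. Only `failedNodes` and the exception names come from the description above.

```cpp
#include <functional>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the library's exception hierarchy.
struct HdfsException : std::runtime_error {
    explicit HdfsException(const std::string &msg) : std::runtime_error(msg) {}
};
struct HdfsIOException : HdfsException {
    explicit HdfsIOException(const std::string &msg) : HdfsException(msg) {}
};
struct HdfsTimeoutException : HdfsException {
    explicit HdfsTimeoutException(const std::string &msg) : HdfsException(msg) {}
};

// Tries each candidate DataNode in turn and returns the first one for which
// reader setup succeeds. Nodes that fail are recorded in failedNodes.
std::string setupReaderWithRetry(
        const std::vector<std::string> &nodes,
        const std::function<void(const std::string &)> &connect,
        std::set<std::string> &failedNodes) {
    for (const std::string &node : nodes) {
        if (failedNodes.count(node) != 0) {
            continue;                    // skip nodes that already failed
        }
        try {
            connect(node);               // may throw either exception type
            return node;
        } catch (const HdfsIOException &) {
            failedNodes.insert(node);    // pre-existing handler
        } catch (const HdfsTimeoutException &) {
            failedNodes.insert(node);    // added handler: treat a timeout the same way
        }
    }
    throw HdfsIOException("no DataNode left to try");
}

// Converts a timeout raised while reading a packet header into an
// HdfsIOException that carries block and DataNode context.
void readPacketHeaderSketch(const std::function<void()> &readHeader,
                            const std::string &block,
                            const std::string &datanode) {
    try {
        readHeader();
    } catch (const HdfsTimeoutException &e) {
        throw HdfsIOException("timeout reading packet header for block " +
                              block + " from " + datanode + ": " + e.what());
    }
}
```

Converting the timeout at the lowest level means callers that already handle `HdfsIOException` need no further changes, while the explicit handler in the setup path keeps the retry decision visible where the DataNode is chosen.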

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


xumingyong does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@zhanglistar
Contributor

LGTM @alexey-milovidov pls check.
