HDDS-14834. Fix race condition between DeadNodeHandler and HealthyReadOnlyNodeHandler on NetworkTopology#9926
HDDS-14834. Fix race condition between DeadNodeHandler and HealthyReadOnlyNodeHandler on NetworkTopology#9926ivandika3 wants to merge 6 commits intoapache:masterfrom
Conversation
…dOnlyNodeHandler on NetworkTopology DeadNodeHandler and HealthyReadOnlyNodeHandler run on separate SingleThreadExecutors, which can lead to a race condition where a resurrected datanode is removed from the NetworkTopology after being re-added. This leaves the node reachable but invisible to the placement policy. Fix: DeadNodeHandler now checks the current node state before removing it from the topology, skipping removal if the node is no longer DEAD. HealthyReadOnlyNodeHandler uses unconditional add (idempotent) instead of a contains-then-add check, closing the TOCTOU gap. Made-with: Cursor
| // Only remove from topology if the node is still DEAD. Between the time | ||
| // the DEAD_NODE event was fired and now, the node may have been | ||
| // resurrected (DEAD -> HEALTHY_READONLY) via a heartbeat. Removing a | ||
| // resurrected node from the topology would leave it reachable but | ||
| // invisible to the placement policy. | ||
| NodeStatus currentStatus = | ||
| nodeManager.getNodeStatus(datanodeDetails); | ||
| if (currentStatus.getHealth() == HddsProtos.NodeState.DEAD) { |
There was a problem hiding this comment.
@ivandika3 I was thinking would it make sense to add an early check at the start of onMessage and return if the node is no longer DEAD? In the race where the node is resurrected before this handler runs, we’d still run removeContainerReplicas, REPLICATION_MANAGER_NOTIFY, deletedBlockLog.onDatanodeDead, etc, which may not be appropriate for a resurrected node.
There was a problem hiding this comment.
Good point, I added another check at the start. Technically, we need to do each check before doing any of these actions, but seems to be overkill.
priyeshkaratha
left a comment
There was a problem hiding this comment.
Changes overall LGTM. Please check few minor comments.
| NetworkTopology nt = nodeManager.getClusterNetworkTopologyMap(); | ||
| if (nt.contains(datanodeDetails)) { | ||
| nt.remove(datanodeDetails); | ||
| Preconditions.checkState( |
There was a problem hiding this comment.
The call to nodeManager.getNode(datanodeDetails.getID()) could return null if the node is concurrently removed from the NodeManager while this handler is executing. This would lead to a NullPointerException when .getParent() is called, which could terminate the event handler thread.
better to handle like below
DatanodeDetails node = nodeManager.getNode(datanodeDetails.getID());
if (node != null) {
Preconditions.checkState(node.getParent() == null);
}
There was a problem hiding this comment.
Thanks, should not be introduced in this patch, but updated to prevent null.
Anyway, since SingleThreadExecutor#onEvent catches all exception, it should not cause event handler thread termination.
...dds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/HealthyReadOnlyNodeHandler.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Thanks @ivandika3 for updating patch, it looks almost good. Just one minor change to do.
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DeadNodeHandler.java
Outdated
Show resolved
Hide resolved
priyeshkaratha
left a comment
There was a problem hiding this comment.
Thanks @ivandika3 for updating patch. Changes LGTM
What changes were proposed in this pull request?
DeadNodeHandler and HealthyReadOnlyNodeHandler run on separate SingleThreadExecutors, which can lead to a race condition where a resurrected datanode is removed from the NetworkTopology after being re-added. This leaves the node reachable but invisible to the placement policy.
Fix: DeadNodeHandler now checks the current node state before removing it from the topology, skipping removal if the node is no longer DEAD. HealthyReadOnlyNodeHandler uses unconditional add (idempotent) instead of a contains-then-add check, closing the TOCTOU gap.
Made-with: Cursor
There is still a very small chance that race condition might still happen since there is no synchronization method (i.e. lock), but the chance is reduced compared to the previous implementation.
Alternative considered approaches
SingleThreadExecutorfor bothDeadNodeHandler: This requires a large change in the SCM eventframework and might delay event processing
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14834
How was this patch tested?
UT (Clean CI: https://github.com/ivandika3/ozone/actions/runs/23047544006)