<fix>[kvm]: defer skip list cleanup on nodeLeft to prevent VM split-brain#3330
<fix>[kvm]: defer skip list cleanup on nodeLeft to prevent VM split-brain#3330ZStack-Robot wants to merge 12 commits into5.5.6from
Conversation
Walkthrough新增与扩展若干同步与追踪逻辑:为 KVM 同步任务添加“孤立虚机跳过”映射与 TTL 管理;在 VM 生命周期关键路径(启动、重启、恢复、从结构创建)后同步 VM 设备地址信息到主机;调整根盘分配尺寸计算以取虚拟/实际尺寸最大值;在 VM 状态机中允许从 Destroying 转为 Stopped;为 ZBS 存储控制器的 HTTP 调用添加可重试的 setTryNext 开关。 Changes
Sequence Diagram(s)(本次修改为若干独立位置的功能增强与状态/逻辑调整,未引入跨 3+ 明确组件的全新顺序控制流,故省略序列图。) 代码审查工作量评估🎯 3 (Moderate) | ⏱️ ~20 分钟 诗篇
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.40.5)compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.javaComment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java`:
- Around line 524-526: The protected method isVmOrphaned in KvmVmSyncPingTask is
unused and duplicates logic already present via
orphanedVmsToSkip.containsKey(vmUuid) in isVmDoNotNeedToTrace and
shouldSkipMissingVmHandling; remove the isVmOrphaned method to eliminate
redundancy, or if you intend to keep it for future/semantic reasons, add a clear
comment above isVmOrphaned explaining why it is retained (including planned
usage or TODO ticket reference) and mark it `@SuppressWarnings`("unused") if
desired so linters know it is intentional.
🧹 Nitpick comments (2)
plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java (2)
319-352: 周期任务资源管理问题
startOrphanedVmCleanupTask()提交了一个周期任务,但在stop()方法(第 415-418 行)中没有取消该任务。这可能导致组件停止后任务仍在运行。建议保存
PeriodicTask的引用并在stop()中取消:♻️ 建议的修改
在类中添加字段保存任务引用:
private Future<?> orphanedVmCleanupTaskFuture;修改
startOrphanedVmCleanupTask保存返回值:private void startOrphanedVmCleanupTask() { - thdf.submitPeriodicTask(new PeriodicTask() { + orphanedVmCleanupTaskFuture = thdf.submitPeriodicTask(new PeriodicTask() { // ... existing code }); }修改
stop()方法取消任务:`@Override` public boolean stop() { + if (orphanedVmCleanupTaskFuture != null) { + orphanedVmCleanupTaskFuture.cancel(false); + } return true; }
328-328: 建议提取魔法数字为常量周期任务的间隔
60秒直接硬编码在代码中。建议提取为类常量或考虑使其可配置,以便于维护和调整。♻️ 建议的修改
+private static final long ORPHANED_VM_CLEANUP_INTERVAL_SECONDS = 60; + private void startOrphanedVmCleanupTask() { thdf.submitPeriodicTask(new PeriodicTask() { // ... `@Override` public long getInterval() { - return 60; // Check every 60 seconds + return ORPHANED_VM_CLEANUP_INTERVAL_SECONDS; } // ... }); }
| protected boolean isVmOrphaned(String vmUuid) { | ||
| return orphanedVmsToSkip.containsKey(vmUuid); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
rg -n "isVmOrphaned" --type javaRepository: MatheMatrix/zstack
Length of output: 180
🏁 Script executed:
rg "extends KvmVmSyncPingTask" --type javaRepository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
sed -n '510,535p' plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.javaRepository: MatheMatrix/zstack
Length of output: 1028
移除未使用的 isVmOrphaned 方法
搜索结果确认该 protected 方法在整个代码库中无任何调用,且不存在 KvmVmSyncPingTask 的子类使用该方法。此外,相同逻辑已在 isVmDoNotNeedToTrace 和 shouldSkipMissingVmHandling 方法中直接调用 orphanedVmsToSkip.containsKey(vmUuid),存在冗余。请移除该未使用的方法,或添加注释说明其保留的具体原因(如未来功能规划)。
🤖 Prompt for AI Agents
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java` around lines
524 - 526, The protected method isVmOrphaned in KvmVmSyncPingTask is unused and
duplicates logic already present via orphanedVmsToSkip.containsKey(vmUuid) in
isVmDoNotNeedToTrace and shouldSkipMissingVmHandling; remove the isVmOrphaned
method to eliminate redundancy, or if you intend to keep it for future/semantic
reasons, add a clear comment above isVmOrphaned explaining why it is retained
(including planned usage or TODO ticket reference) and mark it
`@SuppressWarnings`("unused") if desired so linters know it is intentional.
…isk offering Resolves: ZSTAC-74683 Change-Id: Id0339ed0221e92e506f60745cde972cc3ee6d9ae
b3f652d to
9e241fe
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java`:
- Around line 493-503: This has a TOCTOU race: after reading orphanedAt from
orphanedSkipVms you unconditionally call orphanedSkipVms.remove(vmUuid), which
can delete a newly inserted timestamp from another thread (e.g., nodeLeft). Fix
by performing an atomic conditional remove using orphanedSkipVms.remove(vmUuid,
orphanedAt) (or re-check the current value and only remove if it equals the
previously read orphanedAt) when ORPHAN_TTL_MS has expired; keep the existing
logging and ensure you reference the same orphanedAt and vmUuid values for the
CAS removal.
- Around line 72-76: The hardcoded ORPHAN_TTL_MS in KvmVmSyncPingTask reduces
operational flexibility; replace it with a KVMGlobalConfig entry (e.g.,
kvm.orphanedVmSkipTimeout) and read that value at startup (and/or subscribe to
updates) instead of using the fixed constant. Update KvmVmSyncPingTask to
retrieve the timeout from the new config key and use that value when managing
orphanedSkipVms/expiry checks (keep a sensible default of 10*60*1000 ms if the
config is missing), and ensure the code references ORPHAN_TTL_MS is replaced
with the config-backed field or accessor so the TTL becomes configurable at
runtime.
🧹 Nitpick comments (1)
plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java (1)
517-523: 建议:使用 CAS 移除以保持一致性与
isVmDoNotNeedToTrace类似,迭代器移除不保证原子性。虽然影响较小,但为保持一致性,建议使用remove(key, expectedValue):♻️ 可选改进
while (it.hasNext()) { Map.Entry<String, Long> entry = it.next(); if (now - entry.getValue() >= ORPHAN_TTL_MS) { - it.remove(); + orphanedSkipVms.remove(entry.getKey(), entry.getValue()); logger.info(String.format("cleaned up expired orphaned skip entry for VM[uuid:%s]", entry.getKey())); } }
| // Orphaned skip entries from departed MN nodes. Key=vmUuid, Value=timestamp when orphaned. | ||
| // These VMs remain in skip-trace state for ORPHAN_TTL_MS to avoid false HA triggers | ||
| // when a MN restarts and its in-flight VM operations haven't completed yet. See ZSTAC-80821. | ||
| private final ConcurrentHashMap<String, Long> orphanedSkipVms = new ConcurrentHashMap<>(); | ||
| private static final long ORPHAN_TTL_MS = 10 * 60 * 1000; // 10 minutes |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -name "KVMGlobalConfig.java" -type fRepository: MatheMatrix/zstack
Length of output: 124
🏁 Script executed:
rg -n "orphanedVmSkipTimeout|orphanedSkipTimeout|ORPHAN.*TIMEOUT" --type java -A 2 -B 2Repository: MatheMatrix/zstack
Length of output: 552
🏁 Script executed:
rg -n "class KVMGlobalConfig" --type java -A 50Repository: MatheMatrix/zstack
Length of output: 6922
🏁 Script executed:
cat ./plugin/kvm/src/main/java/org/zstack/kvm/KVMGlobalConfig.javaRepository: MatheMatrix/zstack
Length of output: 9331
🏁 Script executed:
rg -n "ORPHAN_TTL_MS|orphanedVmSkipTimeout|orphanedSkipTimeout" --type java -B 3 -A 3Repository: MatheMatrix/zstack
Length of output: 4660
建议将硬编码的超时常量改为 KVMGlobalConfig 配置项
当前使用硬编码常量 ORPHAN_TTL_MS = 10 * 60 * 1000 限制了运维灵活性。建议在 KVMGlobalConfig 中添加可配置的超时参数(如 kvm.orphanedVmSkipTimeout),使不同环境可以根据实际需求调整孤立VM的保留时间。
🤖 Prompt for AI Agents
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java` around lines
72 - 76, The hardcoded ORPHAN_TTL_MS in KvmVmSyncPingTask reduces operational
flexibility; replace it with a KVMGlobalConfig entry (e.g.,
kvm.orphanedVmSkipTimeout) and read that value at startup (and/or subscribe to
updates) instead of using the fixed constant. Update KvmVmSyncPingTask to
retrieve the timeout from the new config key and use that value when managing
orphanedSkipVms/expiry checks (keep a sensible default of 10*60*1000 ms if the
config is missing), and ensure the code references ORPHAN_TTL_MS is replaced
with the config-backed field or accessor so the TTL becomes configurable at
runtime.
When anti-split-brain check selects a disconnected MDS node, the HTTP call now times out after 30s instead of 5+ minutes, and automatically retries the next available MDS via tryNext mechanism. Resolves: ZSTAC-80595 Change-Id: I1be80f1b70cad1606eb38d1f0078c8f2781e6941
When MN restarts during a destroy operation, the hypervisor may report the VM as Stopped. Without this transition, the state machine throws an exception and the VM stays stuck in Destroying state forever. Resolves: ZSTAC-80620 Change-Id: I037edba70d145a44a88ce0d3573089182fedb162
Resolves: ZSTAC-82153 Change-Id: Ib51c2e21553277416d1a9444be55aca2aa4b2fc4
add syncVmDeviceInfo when vm is running Resolves: ZSTAC-67275 Change-Id: I746e63786a677676676b6d6265657063666b7677
<fix>[compute]: add syncVmDeviceInfo when vm is running See merge request zstackio/zstack!9197
<fix>[vm]: add Destroying->Stopped state transition See merge request zstackio/zstack!9156
<fix>[i18n]: improve snapshot error message for unattached volume See merge request zstackio/zstack!9192
<fix>[zbs]: reduce mds connect timeout and enable tryNext for volume clients See merge request zstackio/zstack!9153
<fix>[vm]: use max of virtual and actual size for root disk allocation See merge request zstackio/zstack!9155
…plit-brain When a management node departs, its VM skip-trace entries were immediately removed. If VMs were still being started by kvmagent, the next VM sync would falsely detect them as Stopped and trigger HA, causing split-brain. Fix: transfer departed MN skip-trace entries to an orphaned set with 10-minute TTL instead of immediate deletion. VMs in the orphaned set remain skip-traced until the TTL expires or they are explicitly continued, preventing false HA triggers during MN restart scenarios. Resolves: ZSTAC-80821 Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34
…anup Resolves: ZSTAC-80821 Change-Id: I59284c4e69f5d2ee357b1836b7c243200e30949a
614603d to
de83221
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java (1)
8476-8496: 建议捕获 hostUuid 本地变量以避免并发下日志/目标不一致
当前回调里使用self.getHostUuid(),若宿主在异步期间变化,日志与目标可能不一致。♻️ 可选改进(保持语义不变)
private void syncVmDevicesAddressInfo(String vmUuid) { - if (self.getHostUuid() == null) { + String hostUuid = self.getHostUuid(); + if (hostUuid == null) { return; } SyncVmDeviceInfoMsg msg = new SyncVmDeviceInfoMsg(); msg.setVmInstanceUuid(vmUuid); - msg.setHostUuid(self.getHostUuid()); - bus.makeTargetServiceIdByResourceUuid(msg, HostConstant.SERVICE_ID, msg.getHostUuid()); + msg.setHostUuid(hostUuid); + bus.makeTargetServiceIdByResourceUuid(msg, HostConstant.SERVICE_ID, hostUuid); bus.send(msg, new CloudBusCallBack(msg) { `@Override` public void run(MessageReply reply) { if (!reply.isSuccess()) { logger.warn(String.format("Failed to sync vm device info for vm[uuid:%s], %s", vmUuid, reply.getError())); } else { logger.debug(String.format("Sent SyncVmDeviceInfoMsg for vm[uuid:%s] on host[uuid:%s]", - vmUuid, self.getHostUuid())); + vmUuid, hostUuid)); } } }); }
Summary
Files Changed
KvmVmSyncPingTask.java— orphan 集合 + 定时清理VmTracer.java— shouldSkipMissingVmHandling hookKVMGlobalConfig.java— orphanedVmSkipTimeout 配置项Resolves: ZSTAC-80821
sync from gitlab !9152