[ZCF-2747] strip SDN controller inventory endpoint port#4029
[ZCF-2747] strip SDN controller inventory endpoint port#4029ZStack-Robot wants to merge 117 commits into
Conversation
…emetal2 instance support attaching novlan and vxlan network to baremetal2 instance Resolves/Related: ZSTAC-82781 Change-Id: I736d637a7168656a6c726c6769777a726e616974 (cherry picked from commit 2499826)
Add virtiofs-based model mounting capability for user VMs: - Add VmModelMountVO database table and schema with hostUuid tracking - Add AttachVirtiofsCmd/DetachVirtiofsCmd for KVM agent - Add MountModelCenterCmd with storageUrl field - Add SDK actions for mount/unmount/query operations - Add error codes 10138-10149 Resolves: ZSTAC-83157 Change-Id: I746679736f7a7176646e646d797969766f697a76
Resolves: ZSTAC-83157 Change-Id: I62696a6d667468766a6575656763707374757277
- add dedicated container image pull error code mapping Resolves: ZSTAC-84175 Change-Id: Iabcdef1234567890abcdef1234567890abcdef12
Resolves: ZSTAC-83157 Change-Id: I756c7073707a6468676c70696a7a746870767777
<fix>[conf]: update global-error See merge request zstackio/zstack!9557
Revert the flat userdata hostname fallback so VM creation and reboot do not inject derived hostname metadata into userdata payloads when no hostname is configured. Add CreateRebootVmKeepUserdataContentCase to capture flat userdata apply requests during VM create and reboot, and assert the userdata list remains unchanged while hostname metadata stays absent. Resolves: ZSTAC-84529 Change-Id: I6f6e6b6b796e6e7a73727673626b7569696a6172
Expose requestCpu/requestMemory on the generated PodInventory so SDK callers can deserialize the fields returned by ZQL. Upgrade script adds matching columns on PodVO and backfills existing pods with request=limit for legacy-equivalent behaviour. Resolves: ZSTAC-80103 Change-Id: I5b2a9d4e8c6f3b7a1d0e9c5b4a8f2d7e6c1b3a9f
<feature>[dpu-bm2]: support attaching novlan and vxlan network to baremetal2 instance See merge request zstackio/zstack!9530
fix(network): keep userdata content immutable See merge request zstackio/zstack!9686
… redeploy Resolves: ZSTAC-84446 Change-Id: I8af5e3887a5bad286b43dda00c874c9de999e1cb
<fix>[kvm]: unify TLS cert IP collection to avoid reconnect-triggered redeploy See merge request zstackio/zstack!9683
Resolves: ZSTAC-84751 Change-Id: I84751b1c2d3e4f5a678901234567890abcdef1235
<fix>[aios]: add i18n mappings for fatal image pull error See merge request zstackio/zstack!9670
<fix>[sdk,db]: add PodInventory request fields See merge request zstackio/zstack!9682
…0165
Resolves: ZSTAC-84677
Related: ZSTAC-80103
Two overlapping problems are addressed in the ai module's global error
codes; both root from the same author reusing codes that were already
allocated for other semantics.
(1) ZSTAC-84677: AIModelManagerImpl model-service startup-timeout path
passed the string literal "ORG_ZSTACK_AI_10144" to operr(). That
code is owned by VmModelMountManager (mount path conflict, 5 %s
template); the timeout only supplies 3 formatArgs, so the UI
rendered a garbled "挂载失败:路径..." message for a health-check
timeout.
(2) ZSTAC-80103: error codes 10138-10143 had two conflicting owners:
- AIModelManagerApiInterceptor: CPU / memory request validation
(this is the canonical semantics per the AI error-code spec)
- VmModelMountManager: VM/model not-found, VM-state checks,
cross-account mount, ModelCenter not-found
The mount feature wrote zh_CN i18n templates with mount semantics,
so when the CPU/memory validation actually fired, users saw the
unrelated mount text with %s slots filled by CPU/memory args
(e.g. "虚拟机「1」(UUID: 4) 必须处于运行状态才能挂载模型,当前状态: 4"
for a CPU-below-min error).
This commit (zstack side):
- Add 7 new error-code constants in CloudOperationsErrorCode:
10159 (model service startup timeout)
10160-10165 (relocated mount-feature errors that previously squatted
on 10138-10143)
- Rewrite globalErrorCodeMapping entries for 10138-10143 to the
canonical CPU/memory validation semantics
- Add localized templates for 10159-10165 across all 10 locales
(zh_CN, en_US, zh_TW, ja-JP, ko-KR, de-DE, fr-FR, ru-RU, id-ID,
th-TH) so non-Chinese users see localized messages
The companion premium MR switches:
- AIModelManagerImpl startup-timeout call site to 10159
- VmModelMountManager 6 call sites from 10138-10143 to 10160-10165
Change-Id: I076f8da0b935a47cd4956e039e1e0b05206d085e
<fix>[sdk]: sync OSPF neighbor adjacency fields See merge request zstackio/zstack!9737
Resolves: ZSTAC-84446 Change-Id: I9bed31c0cefddd6ed11f59cd13e36eb1c2abc029
<fix>[conf]: add ORG_ZSTACK_AI_10159 for model service startup timeout See merge request zstackio/zstack!9734
<fix>[kvm]: decouple TLS cert detection from libvirtd restart toggle See merge request zstackio/zstack!9741
PortForwardingConfigProxy.detachNetworkServiceFromNoHaVirtualRouter would generate 'AND vo.uuid IN ()' when called with an empty list, causing a MariaDB syntax error. Add an early return guard. Resolves: ZSTAC-84748 Change-Id: I15a86ffb92829af6952697a4f3cb1351cd5ed28c
<fix>[portforwarding]: skip detach when serviceUuids is empty See merge request zstackio/zstack!9744
… marshal time
Per ZSTAC-84817, malformed JSON in APIAddExternalPrimaryStorage.config slipped
past the API boundary and surfaced as SYS.1000 InternalError from a deep Gson
parse, after the message had already reached the cloud-bus handler.
Fix:
* PrimaryStorageApiInterceptor now validates msg.getConfig() with org.json
(strict — rejects unquoted keys / missing colons that Gson lenient mode
silently tolerates) and throws ApiMessageInterceptionException with
ORG_ZSTACK_STORAGE_PRIMARY_10053 if it isn't a valid JSON object.
* conf/serviceConfig/externalPrimaryStorage.xml wires
PrimaryStorageApiInterceptor at service level so the interceptor actually
runs for APIAddExternalPrimaryStorageMsg (the existing wiring in
primaryStorage.xml does not cover messages declared in this file).
* CloudOperationsErrorCode.ORG_ZSTACK_STORAGE_PRIMARY_10053 constant added.
Test: ZbsPrimaryStorageCase gains
testAddExternalPrimaryStorageWithMalformedJsonRejectedByInterceptor — sends
"{this is not valid json" via API, asserts the call fails AND no
PrimaryStorageVO/ExternalPrimaryStorageVO/PrimaryStorageOutputProtocolRefVO
row leaks. Verified locally: Tests run: 1, Failures: 0, Errors: 0,
Time elapsed: 92.788 sec, BUILD SUCCESS.
Resolves: ZSTAC-84817
Change-Id: Icf6c648133d7866edf35940d56a28f74f4c64817
Resolves: ZSTAC-84839 Change-Id: I7a756e72627a6c67736a72656669677667747067
<fix>[primaryStorage]: validate ExternalPrimaryStorage config JSON at marshal time See merge request zstackio/zstack!9743
<fix>[allocator]: thread HostAllocationPurpose through allocator path See merge request zstackio/zstack!9754
br_conn_all_ns (169.254.64.1, mevoco.py CONNECT_ALL_NETNS_BR_OUTER_IP) is created on the host only when the first flat L3 network lands, AFTER the add-host ansible playbook has generated the TLS cert. The cert SAN therefore lacks this IP, and on every subsequent reconnect the check-tls-certs flow detects the gap and triggers an ansible redeploy plus libvirtd/kvmagent restart, breaking PID stability. This IP is host-internal (used only for VM <-> root-netns userdata / pushgateway / kvmagent HTTP) and has no business in a libvirtd TLS cert. Add EXCLUDED_INTERNAL_IPS as a single source of truth used by both the check flow (buildIpList) and the deploy flow (unionTlsCertIps) so the two flows can never disagree, regardless of when the bridge appears. host_plugin.fact() on the agent side is unchanged. Resolves: ZSTAC-84446 Change-Id: I10821d8c68190f2cbc8a0679d19ed053916d9184
<fix>[kvm]: exclude br_conn_all_ns IP from TLS cert SAN See merge request zstackio/zstack!9758
DBImpact Change-Id: I6d757a76676d6763697161776a61796175737775
…erviceRefVO.isDefault/createDate/lastOpDate Resolves: ZSTAC-84025 Change-Id: Ifbb68c9d475aa19762bb053e6545381a305dfe5b
Resolves: ZCF-3703 Change-Id: I8b002e1c0b7e0535df0803a8107e9b4ebbdadf21
<fix>[vm]: ZSTAC-84158 handle StopVmGC host mismatch See merge request zstackio/zstack!9888
KVM host reconnect can still fail with `Defaults requiretty` because the Java control plane runs some sudo commands through direct SSH, outside the Ansible path. Use a per-call PTY only for those commands so existing `SshShell.runCommand` output behavior stays unchanged. Resolves: ZSTAC-77375 Change-Id: I69756c636d6b65636c69616a7076756869646a63
<fix>[flatNetworkProvider]: refresh metadata after attaching non-default nic See merge request zstackio/zstack!9905
<fix>[query]: ZSTAC-80742 limit API query concurrency See merge request zstackio/zstack!9866
When kvmagent crashes/restarts, MN-side in-flight HTTP wrappers wait until the 5min default timeout before failing, blocking ChainTask queues (e.g. volume backup stuck). Add a drain trigger on the connect reply: when agent reports firstConnect=true, all pending HTTP calls scoped to that host with sentAtMillis < agentStartTime - 60s are failed immediately. - RESTFacade.failPendingCallsForResourceBefore(resourceUuid, cutoffMillis, err): single-pass scan over wrappers, filters by X-Resource-UUID header + send time; calls wrapper.fail(err) (same path as natural timeout, idempotent via called.compareAndSet). - AsyncHttpWrapper exposes getResourceUuid() + getSentAtMillis(); closure stores both at construction. - KVMAgentCommands.ConnectResponse adds firstConnect + agentStartTimeMillis. PingResponse unchanged. - KVMHost.connectToAgent drains stale calls inline when rsp.isFirstConnect() after a successful connect. - New i18n error code ORG_ZSTACK_CORE_REST_10013 (en/zh). - Groovy IT HostAgentRestartDrainCase drives the path via reconnectHost SDK action with KVM_CONNECT_PATH simulator override. Resolves: ZSTAC-58310 Change-Id: I1c2d5e3f7a8b9c0d2e4f5a6b7c8d9e0f1a2b3c4d
Split ZNS DHCP error codes used by premium DHCP paths.\nAdd i18n mappings for the new concrete error templates. Resolves: ZCF-3703 Change-Id: I11bf0fc6df8cf5a5e1ae24af958c5ca3712b4347
<fix>[kvm]: drain stale http calls on kvmagent restart See merge request zstackio/zstack!9904
Move premium-only HA pre-fence details out of community while keeping the generic async before-start extension point support. Jira: ZSTAC-83890 Test: mvn -pl header,compute,plugin/kvm,simulator/simulatorImpl,testlib -DskipTests compile Change-Id: I168adf82338f9df9e76287619b7f76a8e5be695f
Resolves: ZSTAC-84691 Change-Id: I6668736975707575756d7665647867686e6b6b76
Pass VM API system tags into AllocateHostMsg so host allocator filters can honor znsNicMode during VM placement. Resolves: ZCF-3728 Change-Id: I4fbb522440931aad005ceb82a37a29236bd35b27
<fix>[kvm]: allocate tty for selected KVM sudo commands See merge request zstackio/zstack!9909
JIRA: ZCF-3703 Remove the duplicate systemTags field from SdnControllerEnableDHCPMsg. NeedReplyMessage already owns systemTags, so redeclaring it makes Gson fail while CloudBus logs the message before sending it to the ZNS DHCP handler. Change-Id: I4546348c9fd45004e0f9b9c823f9cccec9ad4774
APIImpact Resolves: ZSTAC-84014 Change-Id: I6a7679766a6d666f6966787a757772706f77716a
Persist the resolved NIC driver type during AttachVmNicToVm and narrow prepareDbInitialValue to repair only null VNIC driverType records while respecting the VM virtio tag and platform, preventing existing virtio NICs from being overwritten as e1000 after management node restart. Resolves: ZSTAC-85160 Change-Id: I6868776c140529e5286d40c187fde99608127f7a
<fix>[kvm]: extend reconnect echo timeout after libvirtd restart See merge request zstackio/zstack!9883
<fix>[volume]: ReInitDisk support data Volume See merge request zstackio/zstack!9908
Avoid bad host allocator lambda bytecode. Set KVM capacity sync timeout explicitly. Test: VmwareTest passed on root@172.20.13.240. Resolves: ZCF-3728 Change-Id: I5db26740b9fef0c0ba1b31096a8fdc3659db412b
Avoid NPE when MtuGetter is constructed outside Spring. Resolves: ZCF-3703 Change-Id: I21a88fca05ab157a767d569435c2d37a48d4179b
<fix>[zns]: ZCF-3728 propagate vm system tags See merge request zstackio/zstack!9911
<fix>[ha]: stabilize suspect host pre-fence flow See merge request zstackio/zstack!9885
<fix>[network]: ZCF-3703 add ZNS attach skip hook See merge request zstackio/zstack!9900
Walkthrough此 PR 是一个跨子系统的大型提交:包含 SDN 控制器 inventory 与工厂重构、VM 网卡分配/生命周期扩展、KVM 主机/agent 行为改动、主存数据卷 reinit 流与消息、REST/webhook 客户端与协议、负载均衡健康检查扩展、网络扩展点与数据库迁移、文档与大量测试覆盖。 ChangesMonolithic cross-cutting changes
Estimated code review effort
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@header/src/main/java/org/zstack/header/network/sdncontroller/SdnControllerInventory.java`:
- Around line 62-75: The toHostAddress method currently returns the original
endpoint for many cases (losing the trimmed normalization) and doesn't handle
bracketed IPv6 addresses like [fe80::1]:8080; update to always operate on the
trimmed value, add a branch that detects a bracketed IPv6 host (startsWith('[')
and contains ']' followed by ':' ) and strip the trailing :port after the
closing ']', and keep the existing single-colon host:port logic
(firstColon==lastColon) to remove the port for IPv4/hostname cases; ensure the
method returns the normalized host string in all branches using the trimmed
variable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: 28096781-e83b-492a-b998-27b35709c2f3
📒 Files selected for processing (1)
header/src/main/java/org/zstack/header/network/sdncontroller/SdnControllerInventory.java
8d3fb7c to
a626ed9
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
docs/modules/network/pages/networkResource/ZnsIntegration.adoc (1)
1-976:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift文档新增内容为中文,违反仓库统一英文规范。
该文件当前几乎全部为中文描述,建议改为英文(标题、正文、注释示例说明等)以满足仓库规范。
As per coding guidelines:
**/*.*:代码里不应当有有中文,包括报错、注释等都应当使用正确的、无拼写错误的英文来写🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/modules/network/pages/networkResource/ZnsIntegration.adoc` around lines 1 - 976, The document is written in Chinese and must be converted to English to comply with repository language rules; please translate all user-facing text (titles like "= 对接ZNS SDN控制器", section headings, paragraphs, NOTE blocks, tables, examples, and inline code comments) into clear, idiomatic English and preserve all technical identifiers and symbols (e.g. Cms struct, Segment, ZnsControllerVO, SdnControllerVO, ZnsTransportZoneVO, L2NetworkVO, APICreateL2NetworkMsg, ZnsApiInterceptor, X-Api-Versions, X-Api-Version, endpoints such as POST /zns/notify/add-compute-manager, OPTIONS /zns/api/v1/segments, and other API paths) as-is; ensure code blocks and JSON examples remain syntactically unchanged except for translated surrounding explanatory text, keep NOTE/NOTE-like admonitions translated, and maintain any RFC/format examples (UUID formats, headers, tables) exactly while replacing Chinese descriptions with English equivalents.plugin/flatNetworkProvider/src/main/java/org/zstack/network/service/flat/FlatDhcpBackend.java (1)
997-1005:⚠️ Potential issue | 🟠 Major | ⚡ Quick win先读已有 tag,再做 capability gating。
现在一进来就按
isAllocateDhcpServerIp(l3Uuid, ipVersion)早退,意味着历史上已经存在的 DHCP server IP tag 也拿不到了。升级后如果 provider 对 IPv6 改为isAllocateDhcpv6ServerIp() == false,这里会把旧的 DHCPv6 地址当成不存在,直接改掉既有行为。更稳妥的顺序是:先返回已有 tag,只有在“当前没有 tag、需要新分配”时才按 capability 返回null。建议修改
`@Deferred` private String allocateDhcpIp(String l3Uuid, int ipVersion, boolean allocate_ip, String requiredIp, String excludedIp) { - if (!isAllocateDhcpServerIp(l3Uuid, ipVersion)) { - return null; - } - Map<String, String> dhcpServerMap = getExistingDhcpServerIp(l3Uuid, ipVersion); if (!dhcpServerMap.isEmpty()) { return dhcpServerMap.keySet().iterator().next(); } + + if (!isAllocateDhcpServerIp(l3Uuid, ipVersion)) { + return null; + }As per coding guidelines "向后兼容原则:之前的代码产生的行为不要直接去改动,即使是 bug 也可能用户有依赖,不应直接改动。"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/flatNetworkProvider/src/main/java/org/zstack/network/service/flat/FlatDhcpBackend.java` around lines 997 - 1005, The current allocateDhcpIp method calls isAllocateDhcpServerIp(l3Uuid, ipVersion) first which prevents returning historically tagged DHCP server IPs; change the logic to first call getExistingDhcpServerIp(l3Uuid, ipVersion) and if that map is non-empty return its key immediately, and only when there is no existing tag consult isAllocateDhcpServerIp(l3Uuid, ipVersion) and return null if capability denies allocation; keep the rest of allocateDhcpIp parameters/flow (allocate_ip, requiredIp, excludedIp) unchanged and ensure this preserves backward compatibility.search/src/main/java/org/zstack/query/QueryFacadeImpl.java (1)
352-361:⚠️ Potential issue | 🟠 Major | ⚡ Quick win并发化后这里会踩到懒加载缓存的竞态。
syncSubmit("api-query")让doQuery()可以并发执行,但autoQueryMap和replySetter仍是查询路径里懒写入的普通HashMap。这两个get/put在高并发下不是线程安全的,轻则重复反射,重则把缓存状态打坏并导致随机查询失败。这里建议改成ConcurrentHashMap + computeIfAbsent,或者在start()里一次性预热这些缓存。Also applies to: 385-415
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@search/src/main/java/org/zstack/query/QueryFacadeImpl.java` around lines 352 - 361, The lazy-populated plain HashMaps used in doQuery (autoQueryMap and replySetter) are not thread-safe under concurrent syncSubmit("api-query") calls; replace their declarations with ConcurrentHashMap and change lazy get/put sites in doQuery (and the similar block at lines ~385-415) to use computeIfAbsent to atomically cache reflections, or alternatively prewarm these maps in start() so they are fully populated before concurrent queries run; update references to autoQueryMap.get/put and replySetter.get/put accordingly to use computeIfAbsent (or remove lazy population if prewarming).storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java (1)
1117-1133:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win失败分支不要提前 reply,且回滚前要判空 UUID
这里的失败分支先
bus.reply(msg, ret),再trigger.fail(...),外层 error handler 会再回一次;而 capability 检查阶段失败时vo.getUuid()还没赋值,后面的rollbackSnapshot(vo.getUuid())会把原始错误覆盖掉。建议只在内层设置错误并
trigger.fail(...),把 reply 统一留给 error handler;同时在 error handler 里先判断vo.getUuid() != null再回滚。💡 建议修改
MessageReply reply = bus.call(askMsg); if (!reply.isSuccess()) { ret.setError(operr(ORG_ZSTACK_STORAGE_SNAPSHOT_10016, reply.getError(), "cannot ask primary storage[uuid:%s] for volume snapshot capability", vol.getUuid())); - bus.reply(msg, ret); trigger.fail(ret.getError()); return; } @@ if (!capability.isSupport()) { ret.setError(operr(ORG_ZSTACK_STORAGE_SNAPSHOT_10017, "primary storage[uuid:%s] doesn't support volume snapshot;" + " cannot create snapshot for volume[uuid:%s]", primaryStorageUuid, vol.getUuid())); - bus.reply(msg, ret); trigger.fail(ret.getError()); return; } @@ error(new FlowErrorHandler(msg) { `@Override` public void handle(ErrorCode errCode, Map data) { - rollbackSnapshot(vo.getUuid()); + if (vo.getUuid() != null) { + rollbackSnapshot(vo.getUuid()); + } ret.setError(errCode); bus.reply(msg, ret); } });Also applies to: 1227-1233
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java` around lines 1117 - 1133, The failure branches inside the AskVolumeSnapshotCapabilityReply handling currently call bus.reply(msg, ret) before trigger.fail(...), and later rollbackSnapshot(vo.getUuid()) can overwrite the original error because vol/vo.getUuid() may be null; remove the inner bus.reply(msg, ret) calls in the capability failure paths (leave only ret.setError(...) and trigger.fail(...)) so replies are handled by the outer error handler, and update the outer error handler to check vo.getUuid() != null before calling rollbackSnapshot(vo.getUuid()); apply the same change to the similar block around the other capability check (lines 1227-1233).rest/src/main/resources/scripts/GoInventory.groovy (1)
1356-1368:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
other_actions.go文件在无方法内容时因未使用的context导入而编译失败。
generateOtherActionsFile()没有条件判断,即使methodsContent为空也会生成文件;新增的context导入缺少占位符引用(如param和view那样的var _ = ...)。当没有孤立的 action 需要生成时,Go 编译会因未使用的导入报错。建议添加占位符引用来消除该导入:
修复建议
content.append("import (\n") content.append("\t\"context\"\n") if (needsFmt) { content.append("\t\"fmt\"\n") } content.append("\t\"github.com/zstackio/zstack-sdk-go-v2/pkg/param\"\n") content.append("\t\"github.com/zstackio/zstack-sdk-go-v2/pkg/view\"\n") content.append(")\n\n") + content.append("var _ = context.Background // avoid unused import\n") content.append("var _ = param.BaseParam{} // avoid unused import\n") content.append("var _ = view.MapView{} // avoid unused import\n\n")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rest/src/main/resources/scripts/GoInventory.groovy` around lines 1356 - 1368, The generated Go file always imports "context" even when methodsContent is empty causing unused-import compile errors; in generateOtherActionsFile() (in GoInventory.groovy) either skip adding the "context" import when no methods are emitted or add a placeholder reference like `var _ = context.Background()` (similar to the existing `var _ = param.BaseParam{}` and `var _ = view.MapView{}`) into the generated content so the context import is used when methodsContent is empty.rest/src/main/resources/scripts/GoApiTemplate.groovy (1)
1165-1189:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
generateGenericMethod的 PUT 调用漏传ctx,会生成不可编译的 Go 代码
cli.Put/cli.PutWithRespKey的调用参数在这些分支里仍是旧签名拼接,未带ctx,与本次统一改造后的其他分支不一致。建议修复(示例)
- String putArgs = unwrap ? - """("${cleanPath}", ${paramName}, map[string]interface{}{ + String putArgs = unwrap ? + """(ctx, "${cleanPath}", ${paramName}, map[string]interface{}{ \t\t"${actionKey}": map[string]interface{}{}, \t}, &resp)""" : - """("${cleanPath}", ${paramName}, "", map[string]interface{}{ + """(ctx, "${cleanPath}", ${paramName}, "", map[string]interface{}{ \t\t"${actionKey}": map[string]interface{}{}, \t}, &resp)"""- String putArgs = unwrap ? - """("${cleanPath}", uuid, params, &resp)""" : - """("${cleanPath}", uuid, "", params, &resp)""" + String putArgs = unwrap ? + """(ctx, "${cleanPath}", uuid, params, &resp)""" : + """(ctx, "${cleanPath}", uuid, "", params, &resp)"""Also applies to: 1232-1256, 1263-1287
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rest/src/main/resources/scripts/GoApiTemplate.groovy` around lines 1165 - 1189, The PUT calls generated by generateGenericMethod are missing the context parameter: update the putArgs constructions for both branches that use cli.Put and cli.PutWithRespKey so that ctx is passed as the first argument (e.g. change arguments from ("${cleanPath}", ${paramName}, ...) or ("${cleanPath}", ${paramName}, "", ...) to (ctx, "${cleanPath}", ${paramName}, ...) or (ctx, "${cleanPath}", ${paramName}, "", ...)), doing this for both the unwrap and non-unwrap variants in the blocks that build putArgs (including the other occurrences you noted around lines 1232-1256 and 1263-1287) so the generated functions call cli.Put/cli.PutWithRespKey(ctx, ...) and compile correctly.plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java (1)
192-194:⚠️ Potential issue | 🟠 Major | ⚡ Quick win用户名变更判断比较了错误字段
这里把
msg.getUserName()与self.getName()比较,应该与self.getUsername()比较,否则会导致用户名变更判断不正确。建议修复
- if (msg.getUserName() != null && !msg.getUserName().equals(self.getName())) { + if (msg.getUserName() != null && !msg.getUserName().equals(self.getUsername())) { chain.getData().put(SDN_CONTROLLER_USERNAME, self.getName()); self.setUsername(msg.getUserName()); changed = true; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java` around lines 192 - 194, The username-change check in SdnControllerBase incorrectly compares msg.getUserName() to self.getName(); change the comparison to msg.getUserName() != self.getUsername() so the logic tests against the current username, and when storing the previous username into chain.getData() under SDN_CONTROLLER_USERNAME use self.getUsername() (the prior value) before calling self.setUsername(msg.getUserName()); update the block around SDN_CONTROLLER_USERNAME, self.getUsername(), self.setUsername(...), and msg.getUserName() accordingly.
🟠 Major comments (25)
header/src/main/java/org/zstack/header/vm/VmNicInventory.java-43-44 (1)
43-44:⚠️ Potential issue | 🟠 Major | ⚡ Quick win新增
interfaceId后缺少 VO->Inventory 映射,导致字段值可能丢失。
VmNicInventory(VmNicVO vo)当前没有给interfaceId赋值,valueOf()路径构建出的 inventory 可能始终拿不到该字段。建议修复
public VmNicInventory(VmNicVO vo) { this.setUuid(vo.getUuid()); this.setVmInstanceUuid(vo.getVmInstanceUuid()); @@ this.setDriverType(vo.getDriverType()); + this.setInterfaceId(vo.getInterfaceId()); this.setType(vo.getType()); this.setState(vo.getState().toString()); }Also applies to: 228-234
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/vm/VmNicInventory.java` around lines 43 - 44, The new private field interfaceId in VmNicInventory is not populated from the VO; update the VmNicInventory(VmNicVO vo) constructor (and any other factory/path like the static valueOf(VmNicVO) method) to read vo.getInterfaceId() and assign it to this.interfaceId so the inventory retains the value; ensure both the constructor and valueOf mapping consistently set interfaceId from VmNicVO.network/src/main/java/org/zstack/network/l2/L2NetworkExtensionPointEmitter.java-56-68 (1)
56-68:⚠️ Potential issue | 🟠 Major | ⚡ Quick winbeforeUpdate 中吞掉异常会绕过扩展点校验。
Line 63-66 当前仅记录日志并继续执行;在“before”阶段这会让扩展失败后仍可能继续改 VLAN,存在一致性风险。建议把异常继续抛给上层处理。
As per coding guidelines: “在必要时,应继续抛出异常,使上层业务处理者可以转换为用户友好的错误提示。”建议修改
public void beforeUpdate(final L2NetworkInventory inv) { for (L2NetworkUpdateExtensionPoint ext : updateExtensions) { try { ext.beforeChangeL2NetworkVlanId(inv); } catch (RuntimeException e) { - // propagate validation failures and other runtime exceptions immediately throw e; } catch (Exception e) { - logger.warn(String.format("unhandled exception in L2NetworkUpdateExtensionPoint.beforeChangeL2NetworkVlanId of %s", - ext.getClass().getCanonicalName()), e); + throw new RuntimeException(String.format( + "exception in L2NetworkUpdateExtensionPoint.beforeChangeL2NetworkVlanId of %s", + ext.getClass().getCanonicalName()), e); } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@network/src/main/java/org/zstack/network/l2/L2NetworkExtensionPointEmitter.java` around lines 56 - 68, The beforeUpdate method in L2NetworkExtensionPointEmitter swallows non-RuntimeExceptions from each L2NetworkUpdateExtensionPoint.beforeChangeL2NetworkVlanId call, which bypasses extension validation; change the error handling in the for-loop over updateExtensions so that any Exception (not just RuntimeException) is propagated to the caller (e.g., rethrow the caught Exception or wrap and throw a RuntimeException) instead of only logging it, ensuring upstream callers of beforeUpdate receive and can handle the failure.plugin/xinfini/src/main/java/org/zstack/xinfini/XInfiniStorageController.java-279-281 (1)
279-281:⚠️ Potential issue | 🟠 Major | ⚡ Quick win请通过
comp.fail(...)返回失败,而不是直接抛异常。Line [279] 直接抛
OperationFailureException会绕过Completion回调语义;调用方若依赖回调收敛,可能出现流程提前中断或未按预期处理失败。💡 建议修改
`@Override` public void blacklist(String installPath, String protocol, HostInventory h, Completion comp) { - throw new OperationFailureException(operr("xinfini does not support volume path isolation yet, " + - "abort starting VM on host[uuid:%s, ip:%s] to prevent split-brain on volume[path:%s, protocol:%s]", - h.getUuid(), h.getManagementIp(), installPath, protocol)); + comp.fail(operr("xinfini does not support volume path isolation yet, " + + "abort starting VM on host[uuid:%s, ip:%s] to prevent split-brain on volume[path:%s, protocol:%s]", + h.getUuid(), h.getManagementIp(), installPath, protocol)); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/xinfini/src/main/java/org/zstack/xinfini/XInfiniStorageController.java` around lines 279 - 281, 在 XInfiniStorageController 中不要直接抛出 OperationFailureException(目前在那段代码用 throw new OperationFailureException(...)),改为通过已有的 Completion 对象返回失败:调用 comp.fail(operr("xinfini does not support volume path isolation yet, abort starting VM on host[uuid:%s, ip:%s] to prevent split-brain on volume[path:%s, protocol:%s]", h.getUuid(), h.getManagementIp(), installPath, protocol)); 确保使用与当前上下文中可用的 Completion 变量名(如 comp 或 completion),并移除原先的 throw 语句以保持回调语义一致。compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java-39-39 (1)
39-39:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
stopped仅保证可见性,无法避免 stop/track 的竞态当前是典型 check-then-act 竞态:
trackHost()可能在读取到stopped=false后,和stop()并发交错,最终在stop()之后仍启动新的Tracker。🔧 建议修复(统一生命周期锁)
public class HostTrackImpl implements HostTracker, ManagementNodeChangeListener, Component, ManagementNodeReadyExtensionPoint { @@ private Map<String, Tracker> trackers = new ConcurrentHashMap<>(); private volatile boolean stopped; + private final Object lifecycleLock = new Object(); @@ public void trackHost(String hostUuid) { - if (stopped) { - logger.debug(String.format("skip tracking host[uuid:%s], host tracker is stopped", hostUuid)); - return; - } + synchronized (lifecycleLock) { + if (stopped) { + logger.debug(String.format("skip tracking host[uuid:%s], host tracker is stopped", hostUuid)); + return; + } - if (!destMaker.isManagedByUs(hostUuid)) { - logger.debug(String.format("skip tracking host[uuid:%s], not managed by us", hostUuid)); - return; - } + if (!destMaker.isManagedByUs(hostUuid)) { + logger.debug(String.format("skip tracking host[uuid:%s], not managed by us", hostUuid)); + return; + } - Tracker t = trackers.get(hostUuid); - if (t != null) { - t.cancel(); - } + Tracker t = trackers.get(hostUuid); + if (t != null) { + t.cancel(); + } - t = new Tracker(hostUuid); - trackers.put(hostUuid, t); + t = new Tracker(hostUuid); + trackers.put(hostUuid, t); - if (CoreGlobalProperty.UNIT_TEST_ON && !alwaysStartRightNow) { - t.start(); - } else { - t.startRightNow(); - } + if (CoreGlobalProperty.UNIT_TEST_ON && !alwaysStartRightNow) { + t.start(); + } else { + t.startRightNow(); + } - logger.debug(String.format("starting tracking hosts[uuid:%s]", hostUuid)); + logger.debug(String.format("starting tracking hosts[uuid:%s]", hostUuid)); + } } @@ public boolean start() { - stopped = false; + synchronized (lifecycleLock) { + stopped = false; + } populateExtensions(); @@ public boolean stop() { - stopped = true; - new HashSet<>(trackers.values()).forEach(Tracker::cancel); - trackers.clear(); + synchronized (lifecycleLock) { + stopped = true; + new HashSet<>(trackers.values()).forEach(Tracker::cancel); + trackers.clear(); + } return true; }Also applies to: 255-258, 294-297, 367-367, 436-439
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java` at line 39, HostTrackImpl's volatile stopped causes a check-then-act race between trackHost() and stop() so new Trackers may start after stop(); fix by adding a dedicated lifecycle lock (e.g., a private final ReentrantLock lifecycleLock) and use it to guard all accesses to stopped and to the Tracker start/stop sequences: acquire the lock in trackHost(), check stopped and only create/start the Tracker while holding the lock; acquire the lock in stop(), set stopped and stop/cleanup any existing Tracker while holding the lock; remove reliance on volatile-only semantics for stopped and ensure all methods that inspect or mutate tracker lifecycle (trackHost(), stop(), any start/stop helper methods) use the same lifecycleLock to make the check-and-act atomic.network/src/main/java/org/zstack/network/l2/L2NetworkApiInterceptor.java-118-123 (1)
118-123:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
hasSdnControllerTag方法应使用标准的 tag 匹配 API 而非简单的字符串前缀匹配。当前实现通过
tag.startsWith("SdnControllerUuid::")判断系统标签,这种方式过于简陋且容易出错。实际的 SDN controller tag 格式由PatternedSystemTag定义为SdnControllerUuid::{SdnControllerUuid},应使用标准的isMatch()方法:private boolean hasSdnControllerTag(List<String> systemTags) { if (systemTags == null || systemTags.isEmpty()) { return false; } return systemTags.stream().anyMatch(tag -> L2NetworkSystemTags.L2_NETWORK_SDN_CONTROLLER_UUID.isMatch(tag)); }这样可以确保在 138-140 行的验证逻辑中准确判断是否存在 SDN controller 标签,避免错误地要求
physicalInterface。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@network/src/main/java/org/zstack/network/l2/L2NetworkApiInterceptor.java` around lines 118 - 123, Replace the naive prefix check in hasSdnControllerTag with the PatternedSystemTag matcher: locate the hasSdnControllerTag method and change its stream anyMatch predicate to call L2NetworkSystemTags.L2_NETWORK_SDN_CONTROLLER_UUID.isMatch(tag) instead of tag.startsWith(...), keeping the null/empty guard; this ensures the SDN controller tag is detected using the standard tag-matching API used elsewhere.conf/db/upgrade/V5.5.22__schema.sql-134-148 (1)
134-148:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
ZnsTenantRouterVO.sdnControllerUuid少了控制器级别的外键。
tenantUuid这里是可空的,但表里只有tenantUuid -> ZnsTenantVO的 FK。未绑定 tenant 的 router 在删除 controller 后不会被级联清理,会留下孤儿记录;而且sdnControllerUuid本身也失去了数据库层的完整性约束。建议补上sdnControllerUuid -> SdnControllerVO(uuid)的 FK。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@conf/db/upgrade/V5.5.22__schema.sql` around lines 134 - 148, The ZnsTenantRouterVO table is missing a foreign key on sdnControllerUuid causing orphaned rows and loss of DB integrity; add a FK constraint on ZnsTenantRouterVO.sdnControllerUuid referencing SdnControllerVO.uuid (e.g. CONSTRAINT fk_zns_tr_controller FOREIGN KEY (`sdnControllerUuid`) REFERENCES `SdnControllerVO`(`uuid`) ON DELETE CASCADE) so that router rows are cascaded/validated when controllers are removed and the sdnControllerUuid column is enforced at the DB level.compute/src/main/java/org/zstack/compute/vm/VmDetachNicFlow.java-98-116 (1)
98-116:⚠️ Potential issue | 🟠 Major | ⚡ Quick win不要在
ReturnIpMsg失败后继续删 NIC。
errorCodeList在这里被完全忽略了;只要有一个 IP 归还失败,后面仍会执行dbf.removeByPrimaryKey(...)。这样会把VmNicVO删掉,但失败的UsedIpVO还留在库里,后续既难重试也会直接泄漏地址。至少应在有归还失败时中断删除并把错误往上抛。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/vm/VmDetachNicFlow.java` around lines 98 - 116, The current done(ErrorCodeList errorCodeList) in the WhileDoneCompletion ignores errorCodeList and always calls callReleaseSdnNics then deletes the VmNicVO, which can remove the NIC while IP release failed; change the logic in done(ErrorCodeList) to inspect errorCodeList (from WhileDoneCompletion) and if it contains any errors, do not call dbf.removeByPrimaryKey(nic.getUuid(), VmNicVO.class) or proceed with trigger.next — instead propagate an aggregated ErrorCode (or fail the flow) so the caller can handle retry/compensation; only when errorCodeList is empty/ok should you call callReleaseSdnNics and then remove the VmNicVO and trigger.next. Ensure you update references in done, callReleaseSdnNics callback handling, and any places expecting trigger.next on success so failures are surfaced.plugin/kvm/src/main/java/org/zstack/kvm/VmNicLifecycleKvmBridge.java-64-107 (1)
64-107:⚠️ Potential issue | 🟠 Major | ⚡ Quick win不要把不同的
VmNicLifecycleExtensionPoint并发执行。这里把并发度设成
extensions.size(),会让多个扩展同时对同一台 host 的 NIC 集合做reconcileOnHost。扩展点契约里看不到并发安全保证,这种并发很容易引入竞态、重复回收或互相超时。更稳妥的是串行执行(each或并发度 1)。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/kvm/src/main/java/org/zstack/kvm/VmNicLifecycleKvmBridge.java` around lines 64 - 107, The loop currently runs with concurrency extensions.size(), allowing multiple VmNicLifecycleExtensionPoint implementations in extensions to call reconcileOnHost concurrently which can cause race conditions; change the While<> invocation to run serially (concurrency = 1) or use the serial helper (each) so that extensions are processed one-by-one; keep the existing timeout handling (thdf.submitTimeoutTask and ThreadFacadeImpl.TimeoutTaskReceipt), completed AtomicBoolean and the try/catch around ext.reconcileOnHost, but replace the extensions.size() concurrency argument with 1 (or switch to the serial iterator variant) to ensure no two extensions run reconcileOnHost for the same host simultaneously.conf/db/upgrade/V5.5.22__schema.sql-87-148 (1)
87-148:⚠️ Potential issue | 🟠 Major | ⚡ Quick win新增 ZNS 表和外键需要显式带上
zstackschema。这一段新建表和 FK 引用都改成了未限定 schema 的名字,例如
CREATE TABLE IF NOT EXISTS \ZnsControllerVO`和REFERENCES `SdnControllerVO`。升级脚本在这个目录里默认约定是显式使用zstack....``;否则默认库不是zstack时,表可能建到错误 schema,外键解析也会变得不稳定。 Based on learnings: In ZStack, the database schema name is fixed as 'zstack'. For SQL upgrade scripts under conf/db/upgrade, continue using TABLE_SCHEMA = 'zstack' and reference tables as zstack.`UsedIpVO`.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@conf/db/upgrade/V5.5.22__schema.sql` around lines 87 - 148, The new ZNS tables and their foreign keys are created without the explicit zstack schema; update all CREATE TABLE and REFERENCES targets to qualify the schema as zstack (e.g., zstack.`ZnsControllerVO`, zstack.`ZnsTransportZoneVO`, zstack.`ZnsTenantVO`, zstack.`ZnsTenantRouterVO`) and change FK references to zstack.`SdnControllerVO`, zstack.`L2NetworkEO`, and zstack.`ZnsTenantVO` accordingly (or ensure TABLE_SCHEMA = 'zstack' is used consistently in this migration), so the tables and CONSTRAINTs are created in the zstack schema and foreign-key resolution remains stable.network/src/main/java/org/zstack/network/l2/L2NoVlanNetwork.java-972-975 (1)
972-975:⚠️ Potential issue | 🟠 Major | ⚡ Quick win这里的 ZNS 特判是单向的。
现在只在待挂载的
tl2是 ZNS 时直接return。如果集群里已经有 ZNS 的L2NoVlanNetwork,它仍会留在l2s里参与后面的physicalInterface冲突判断,从而错误拦截普通L2NoVlanNetwork的挂载。这里应像下面的 VLAN 分支一样在查询阶段排除vSwitchType = ZNS,或在循环里跳过这些记录。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@network/src/main/java/org/zstack/network/l2/L2NoVlanNetwork.java` around lines 972 - 975, The current ZNS special-case is one-way: you return early when the to-be-attached tl2 has vSwitchType==L2NetworkConstant.VSWITCH_TYPE_ZNS, but existing L2NoVlanNetwork records in l2s with vSwitchType ZNS are still considered in the physicalInterface conflict checks and can block non-ZNS attachments. Fix by excluding ZNS networks symmetrically—either filter out entries with vSwitchType==L2NetworkConstant.VSWITCH_TYPE_ZNS when building/querying l2s (same approach as the VLAN branch) or, inside the loop that iterates l2s during the physicalInterface conflict check, explicitly skip records whose getvSwitchType() equals L2NetworkConstant.VSWITCH_TYPE_ZNS so ZNS networks don’t participate in conflict detection for non-ZNS attachments (refer to tl2, l2s, getvSwitchType(), physicalInterface check).conf/db/upgrade/V5.5.16__schema.sql-357-358 (1)
357-358:⚠️ Potential issue | 🟠 Major | ⚡ Quick win这里不要再引入
0000-00-00 00:00:00作为TIMESTAMP默认值。
VmModelMountVO.createDate现在用的是零时间。这个值在严格 SQL mode /NO_ZERO_DATE下会直接导致升级失败,而且本仓库最近的迁移已经统一改成非零哨兵值来兼容双TIMESTAMP场景。建议改成'2000-01-01 00:00:00'。 Based on learnings: In ZStack MySQL upgrade scripts under conf/db/upgrade/*.sql, if a table already contains a TIMESTAMP column defined with ON UPDATE CURRENT_TIMESTAMP, then any second TIMESTAMP column must use a non-CURRENT_TIMESTAMP sentinel default such as '2000-01-01 00:00:00'.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@conf/db/upgrade/V5.5.16__schema.sql` around lines 357 - 358, The TIMESTAMP default for VmModelMountVO.createDate currently uses '0000-00-00 00:00:00', which breaks strict SQL modes and multi-TIMESTAMP tables; update the createDate column definition (the TIMESTAMP that is not the ON UPDATE CURRENT_TIMESTAMP one) to use a non-zero sentinel default such as '2000-01-01 00:00:00' instead of '0000-00-00 00:00:00' so it is compatible with the existing lastOpDate (CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP) column and strict MySQL modes.plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java-15-25 (1)
15-25:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
resolveStatus对event == null没有保护,会触发 NPE。
switch (event)在event为空时会直接异常,建议先做空值兜底并返回vo.getStatus()。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java` around lines 15 - 25, The resolveStatus method in SdnControllerFactory should guard against a null event to avoid an NPE: at the start of resolveStatus(SdnControllerStatusEvent event, SdnControllerVO vo) check if event is null and immediately return vo.getStatus(); then proceed with the existing switch on event. This ensures resolveStatus always returns the current status when event is missing without changing the existing case handling.plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerApiInterceptor.java-1494-1497 (1)
1494-1497:⚠️ Potential issue | 🟠 Major | ⚡ Quick win这里存在空指针风险:
healthTarget可能为null。
healthTarget.split(":")在缺失历史 tag 或脏数据场景下会直接抛 NPE,建议先判空并给出一致的参数错误。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerApiInterceptor.java` around lines 1494 - 1497, The code in LoadBalancerApiInterceptor uses healthTarget.split(":") without guarding against healthTarget being null, which can cause an NPE; before calling split (inside the block where msg.getHealthCheckProtocol() equals LoadBalancerConstants.HEALTH_CHECK_TARGET_PROTOCL_HTTP), validate that healthTarget is not null or empty and if it is throw the same OperationFailureException using argerr(ORG_ZSTACK_NETWORK_SERVICE_LB_10119, ...) to return a consistent parameter error; update the check around healthTarget (the variable used to build ts) to perform a null/empty check first, then split and validate ts.length == 2 as currently implemented.compute/src/main/java/org/zstack/compute/vm/VmAllocateNicFlow.java-124-157 (1)
124-157:⚠️ Potential issue | 🟠 Major | ⚡ Quick win失败分支不能绕过
VmInstanceNicFactory直接删库。
nicVO是通过vnicFactory.createVmNic()创建的,但扩展失败时这里只删除了VmNicVO主表。若具体工厂还创建了类型相关记录或标签,这里会留下脏数据;而 Line 326-328 的 rollback 已经表明正确的释放路径是vnicFactory.releaseVmNic(...)。建议这里也走工厂释放,而不是手工删主表。建议修改
`@Override` public void fail(ErrorCode errorCode) { try { - dbf.removeByPrimaryKey(nicVO.getUuid(), VmNicVO.class); + vnicFactory.releaseVmNic(nic); } catch (Throwable t) { logger.warn(String.format("failed to remove VmNicVO[uuid:%s] after before allocate extension failure", nicVO.getUuid()), t); } errs.add(errorCode);As per coding guidelines "策略与机制分离:接口定义一个消息处理机制,而具体业务由不同实现类完成,从而减少耦合性,提高代码可读性。"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/vm/VmAllocateNicFlow.java` around lines 124 - 157, The failure branch currently removes only the VmNicVO row created by vnicFactory.createVmNic(nic, spec) using dbf.removeByPrimaryKey(nicVO.getUuid(), VmNicVO.class), which can leave factory-specific records/tags orphaned; replace that manual delete with the factory’s release path by calling vnicFactory.releaseVmNic(nicVO.getUuid(), spec) (or the appropriate releaseVmNic method signature used at lines 326-328) inside the fail() block, and remove the dbf.removeByPrimaryKey call and its try/catch so the factory cleans up all related resources consistently (keep errs.add(errorCode) and wcomp.allDone()).plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java-5455-5460 (1)
5455-5460:⚠️ Potential issue | 🟠 Major | ⚡ Quick win创建新数据卷后未采用 agent 返回路径,可能回传错误 installPath
CreateEmptyVolumeRsp的installPath在这里被忽略了;如果后端返回了规范化路径,当前逻辑会继续使用请求路径,导致后续返回/清理路径不一致。🔧 建议修复
@@ httpCall(CREATE_VOLUME_PATH, cmd, CreateEmptyVolumeRsp.class, new ReturnValueCompletion<CreateEmptyVolumeRsp>(trigger) { `@Override` public void success(CreateEmptyVolumeRsp ret) { newVolumeCreated = true; + newVolumePath = buildEmptyVolumeInstallPath(targetPoolName, cmd.installPath, ret.getInstallPath()); trigger.next(); }Also applies to: 5505-5506
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java` around lines 5455 - 5460, The success callback for the httpCall to CREATE_VOLUME_PATH (ReturnValueCompletion<CreateEmptyVolumeRsp> in CephPrimaryStorageBase) ignores the agent-returned CreateEmptyVolumeRsp.installPath and continues using the original request path, which can cause mismatched install/cleanup paths; update the success handler to check ret.getInstallPath() (or ret.installPath) and, if non-empty, replace the local installPath/volumePath variable with this normalized value before setting newVolumeCreated and calling trigger.next(); apply the same change to the analogous handler around the other occurrence referenced (lines ~5505-5506) so all code paths use the agent-provided install path for subsequent responses and cleanup.storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java-1037-1043 (1)
1037-1043:⚠️ Potential issue | 🟠 Major | ⚡ Quick win用同步后的
sp状态发 canonical event
syncVolumeSnapshotDBAfterTakeSnapshot()已经把库里的快照状态统一成最终值并返回了sp,这里仍然读取snapshot.getStatus()会让事件状态和最终 inventory 脱节;如果后端 reply 没带状态或还是旧值,valueOf(...)这里也会直接炸。💡 建议修改
new FireSnapShotCanonicalEvent(). fireSnapShotStatusChangedEvent( - VolumeSnapshotStatus.valueOf(snapshot.getStatus()), + VolumeSnapshotStatus.valueOf(sp.getStatus()), sp);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java` around lines 1037 - 1043, The canonical event is using snapshot.getStatus() which may be stale or null; use the synchronized inventory returned by syncVolumeSnapshotDBAfterTakeSnapshot (sp) when firing the event instead. Replace VolumeSnapshotStatus.valueOf(snapshot.getStatus()) with a safe conversion from sp.getStatus() (e.g. VolumeSnapshotStatus.valueOf(sp.getStatus()) or a null/invalid-safe mapper) when calling new FireSnapShotCanonicalEvent().fireSnapShotStatusChangedEvent(..., sp) so the event status always matches the persisted inventory and avoid NPE/IllegalArgumentException for missing/invalid backend status.core/src/main/java/org/zstack/core/rest/webhook/WebhookCallbackClient.java-101-109 (1)
101-109:⚠️ Potential issue | 🟠 Major | ⚡ Quick win先登记
pendingCalls,再启动超时任务。Lines 104-109 现在是先注册 timeout、后
put。当timeout为 0 或非常小时,超时回调可能先执行到fail(taskId, ...),因为 map 里还没有 entry,这次失败会被直接吞掉;随后插入的 entry 就会永久残留,并且对应 completion 永远不会结束。这是一个典型的 TOCTOU 并发 bug。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/src/main/java/org/zstack/core/rest/webhook/WebhookCallbackClient.java` around lines 101 - 109, The submit method has a TOCTOU race: it submits the timeout task (thdf.submitTimeoutTask) before inserting the PendingEntry into pendingCalls so a very fast timeout can call fail(taskId, ...) while the map has no entry, leaving a leaked pendingCalls entry; fix by creating the PendingEntry (new PendingEntry<>(completion, /*receipt placeholder*/)) and putting it into pendingCalls first, then call thdf.submitTimeoutTask to obtain the TimeoutTaskReceipt and set that receipt into the existing PendingEntry (or replace the map entry), ensuring the timeout handler always sees the entry; update references in submit, PendingEntry, pendingCalls, thdf.submitTimeoutTask and fail accordingly.storage/src/main/java/org/zstack/storage/volume/VolumeBase.java-315-317 (1)
315-317:⚠️ Potential issue | 🟠 Major | ⚡ Quick win在切换
installPath入库之后,不要再把后续同步失败升级成整个 reinit 失败。Lines 315-317 和 Lines 515-517 已经把新安装路径持久化到数据库了,但 Lines 340-343 和 Lines 540-543 失败时仍然走整体 error 分支。这样调用方看到的是“重置失败”,实际卷却已经切到了新路径且不会回滚,后续重试会在已生效的新状态上再次执行。这里更安全的做法是把后续大小同步降级为 best-effort,或者把所有必须成功的步骤都放在
installPath切换之前。Also applies to: 340-343, 515-517, 540-543
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@storage/src/main/java/org/zstack/storage/volume/VolumeBase.java` around lines 315 - 317, The code in VolumeBase that sets the new install path (vo.setInstallPath(...) followed by vo = dbf.updateAndRefresh(vo) and dbInstallPathSwitched = true) persists the new path but still treats subsequent size-sync failures as a full reinit failure; change the error handling after the installPath has been persisted so subsequent syncs are best-effort: detect dbInstallPathSwitched == true and on failures from the size synchronization steps (the branches that currently go to the overall error path) log a warning and continue/return success rather than escalating to reinit failure, or alternatively move all mandatory steps before calling vo.setInstallPath/dbf.updateAndRefresh so persistence only happens after all must-succeed actions; update the branches that currently transition to the error path (the ones after vo.setInstallPath/dbf.updateAndRefresh and the corresponding later occurrences) to implement the best-effort downgrade.storage/src/main/java/org/zstack/storage/volume/VolumeBase.java-554-558 (1)
554-558:⚠️ Potential issue | 🟠 Major | ⚡ Quick win不要对数据卷 reinit 复用
AfterReimageVmInstanceExtensionPoint。Lines 554-558 在数据卷重置成功后触发的是“VM reimage”扩展点。这个钩子的实现很容易默认自己收到的是根卷/VM 级事件;一旦扩展抛错,这个 API 会在数据卷已经重置成功后反向报错。这里应移除该回调,或者引入专门的数据卷 reinit 扩展点。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@storage/src/main/java/org/zstack/storage/volume/VolumeBase.java` around lines 554 - 558, The code currently invokes AfterReimageVmInstanceExtensionPoint for data-volume reinit (the loop calling ext.afterReimageVmInstance(dataVolumeInventory)), which is incorrect; remove that invocation and either drop the callback entirely for data-volume reinit or introduce and call a dedicated extension point (e.g., DataVolumeReinitExtensionPoint with a method like afterDataVolumeReinit(DataVolumeInventory)) instead; locate the loop over pluginRgty.getExtensionList(AfterReimageVmInstanceExtensionPoint.class) and delete the ext.afterReimageVmInstance(dataVolumeInventory) calls (or replace the extension lookup/call with the new DataVolumeReinitExtensionPoint and its method).core/src/main/java/org/zstack/core/rest/webhook/WebhookCallbackClient.java-81-87 (1)
81-87:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
start()不要覆盖预先设置的callbackUrl。Lines 139-140 已经提供了显式覆写入口,但 Lines 81-87 每次启动都会把
callbackUrl重置成restf.getSendCommandUrl()。如果调用方在start()前按专用回调端点场景设置了自定义 URL,发给外部系统的回调地址会被静默改错。💡 建议修改
public void start() { - this.callbackUrl = restf.getSendCommandUrl(); + if (this.callbackUrl == null) { + this.callbackUrl = restf.getSendCommandUrl(); + } restf.registerSyncHttpCallHandler( protocol.getCallbackPath(), protocol.getCallbackClass(), this::onCallback); }Also applies to: 139-140
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/src/main/java/org/zstack/core/rest/webhook/WebhookCallbackClient.java` around lines 81 - 87, start() currently unconditionally overwrites this.callbackUrl with restf.getSendCommandUrl(); instead, preserve any pre-set custom callbackUrl and only default it when not set—e.g., change the assignment in start() to set this.callbackUrl = restf.getSendCommandUrl() only if this.callbackUrl is null/empty (leave custom values intact); keep the restf.registerSyncHttpCallHandler(protocol.getCallbackPath(), protocol.getCallbackClass(), this::onCallback) call unchanged so explicit overrides (the existing setter/override flow) still function.plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java-273-279 (1)
273-279:⚠️ Potential issue | 🟠 Major | ⚡ Quick win不要把 tag 持久化失败降级成成功响应。
这里把
createTagsFromAPICreateMessage()和 inventory 回填包在同一个try/catch里后,tag 写入失败也只会打warn,最终仍然发布 success event。这样 API 调用方会收到“创建成功”,但请求里的systemTags可能根本没落库,行为和请求已经不一致了。建议只把 inventory 组装当成 best-effort;tag 创建失败至少要把 event 标成失败,必要时再补清理逻辑。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java` around lines 273 - 279, Separate tag persistence from inventory assembly: do not swallow failures from tagMgr.createTagsFromAPICreateMessage into a success event. Call tagMgr.createTagsFromAPICreateMessage(vo.getUuid(), ...) in its own try/catch; if it throws, mark the API event (the variable event) as failed (populate the event error / setSuccess(false) and publish/return the failure) instead of continuing to publish a success. Treat event.setInventory(...) / reading the DB as best-effort in a separate try/catch so inventory backfill doesn't override tag creation errors; only proceed to publish a success event when tag creation succeeded.compute/src/main/java/org/zstack/compute/vm/VmNicLifecycleManager.java-342-353 (1)
342-353:⚠️ Potential issue | 🟠 Major在 release 路径中,
doCleanup()应该使用当前宿主而不是目标宿主。
doCleanup()目前通过spec.getDestHost()获取清理宿主,但在 stop/destroy 等释放资源的操作中,目标宿主(destHost)为 null。这导致所有 NIC 清理逻辑被静默跳过。相比之下,attach/detach 路径使用
spec.getVmInventory().getHostUuid()直接获取当前宿主。对于释放资源操作,应该清理 VM 当前所在的宿主,而非一个不存在的目标宿主。建议改为:
String hostUuid = spec.getVmInventory().getHostUuid(); if (hostUuid == null) { completion.done(); return; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/vm/VmNicLifecycleManager.java` around lines 342 - 353, The doCleanup method currently uses spec.getDestHost() to determine hostUuid which is null for release paths; change it to use the VM's current host via spec.getVmInventory().getHostUuid() and null-check that value before proceeding (i.e., replace the spec.getDestHost() check and hostUuid assignment with retrieval of hostUuid from spec.getVmInventory(), call completion.done() if it's null, then call runCleanup with the existing parameters); update references in VmNicLifecycleManager.doCleanup accordingly.plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java-3062-3063 (1)
3062-3063:⚠️ Potential issue | 🟠 Major | ⚡ Quick win把快照 DB 同步扩展的异常转成受控失败。
Line 3062 在 agent 已经成功创建快照后,直接执行 Lines 3115-3117 的 DB 同步扩展。这里如果任一扩展抛出运行时异常,异常会从异步 REST 回调直接冒出去,当前 flow 拿不到
trigger.fail(...),很容易留下 hypervisor 已成功、DB 未同步的悬挂快照。这里至少需要 catch 扩展异常并显式 fail 当前 flow,避免继续走成功回包。Also applies to: 3115-3117
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 3062 - 3063, After the hypervisor reports snapshot creation success, the extension callbacks (calls to syncSnapshotMetadataAfterHypervisorSuccess(...) and extEmitter.afterTakeSnapshot((KVMHostInventory) getSelfInventory(), msg, cmd, ret)) can throw runtime exceptions that currently escape the async callback and leave a hypervisor-created snapshot with DB unsynced; wrap each extension invocation in a try/catch that catches Throwable, convert the error into a controlled flow failure by calling trigger.fail(...) with a clear error message and the caught exception, and return/stop further success processing so the flow does not send a success reply; apply the same try/catch+trigger.fail pattern to both the syncSnapshotMetadataAfterHypervisorSuccess call and the extEmitter.afterTakeSnapshot call (including the place at the later invocation) so all extension exceptions are handled deterministically.plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java-5249-5258 (1)
5249-5258:⚠️ Potential issue | 🟠 Major | ⚡ Quick win不要直接用 agent 的绝对时间去裁剪 MN 侧 pending HTTP 调用。
Line 5250 的
agentStartTimeMillis来自宿主机,而这里清理的是管理节点本地记录的 pending call。两边时钟只要有偏差,就会误杀重连后刚发出的新请求,或者完全漏掉真正的陈旧请求。更稳妥的做法是用 MN 本地记录的 connect/reconnect 起始时间做边界,或者改成基于 boot-id / 相对启动时长判断。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 5249 - 5258, 不要用远端 agent 的绝对时间 rsp.getAgentStartTimeMillis() 去裁剪管理节点的 pending 调用;改成用管理节点本地记录的该连接的 connect/reconnect 起始时间或基于 boot-id/相对启动时长来算边界,然后将该本地时间传给 restf.failPendingCallsForResourceBefore(替换掉当前对 rsp.getAgentStartTimeMillis() 的依赖)并在日志里打印该本地时间或 boot-id/相对时长以便排查;具体修改点:在处理 rsp.isFirstConnect() 的分支中,不再用 rsp.getAgentStartTimeMillis() 计算 cutoffMillis,而改为读取管理节点保存的连接开始时间(或 boot-id/连接序号)并用该值调用 restf.failPendingCallsForResourceBefore(self.getUuid(), cutoffMillis, ...),同时调整 logger.info 的参数以反映所使用的本地边界值或 boot-id。compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java-922-926 (1)
922-926:⚠️ Potential issue | 🟠 Major添加
purpose参数到APIGetVmStartingCandidateClustersHostsMsg并在 handler 中转发。
APIGetVmStartingCandidateClustersHostsMsg目前缺少purpose字段。在handle()方法(第 840 行)只转发了uuid,导致 API 调用方无法指定列表候选主机的 purpose 为LIST_CANDIDATES。这样经由 API 发起的候选主机查询在下游到达 line 922-926 时,GetVmStartingCandidateClustersHostsMsg的 purpose 始终是默认的ALLOCATE,无法告诉下游过滤器这是一个候选列表操作而非真实分配。需要:
- 在
APIGetVmStartingCandidateClustersHostsMsg添加purpose字段(@APIParam)- 在
handle()方法中将 API 消息的 purpose 转发到内部GetVmStartingCandidateClustersHostsMsg🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 922 - 926, Add a new purpose field to the API message and forward it into the internal message: update APIGetVmStartingCandidateClustersHostsMsg to include a purpose property annotated with `@APIParam` (matching the enum type used by GetVmStartingCandidateClustersHostsMsg), and in the handler method that builds and sends the GetVmStartingCandidateClustersHostsMsg (the handle() that currently only forwards uuid) set amsg.setPurpose(apiMsg.getPurpose()) so the internal GetVmStartingCandidateClustersHostsMsg receives the purpose (allowing LIST_CANDIDATES to be propagated).
🟡 Minor comments (3)
plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerApiInterceptor.java-790-804 (1)
790-804:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win健康检查协议/目标解析前建议统一
trim()入参。当前直接解析和比较字符串,用户从 UI 粘贴带空格/换行时会导致误判(冲突、协议不支持、target 非法)。建议在读取
msg与 systemTag token 后先做trim标准化。As per coding guidelines:
**/*Interceptor.java:注意检查来自 Message 的参数是否做过 trim,用户可能在浏览器上复制粘贴的数据带有空格、换行符等Also applies to: 1467-1478
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerApiInterceptor.java` around lines 790 - 804, The health-check target/protocol parsing uses raw tokens from getSystemTagTokens and may mis-handle user input with leading/trailing whitespace; trim the token strings immediately after retrieving them (e.g., normalize entries in healthCheckTargets and the derived healthCheckTarget) so subsequent calls like isNoneHealthCheckTargetWithSpecificPort and getHealthCheckProtocolFromTarget operate on trimmed values; apply the same trimming normalization wherever similar parsing occurs (see the other affected block referenced around lines 1467-1478) and ensure comparisons against LoadBalancerSystemTags use trimmed strings.test/src/test/java/org/zstack/test/kvm/KVMHostUtilsTest.java-141-143 (1)
141-143:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win断言方式会产生子串误判。
这里用
deployIps.contains(ip)校验可能把10.0.0.7误判为命中10.0.0.70。建议按逗号拆分后做精确匹配。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/src/test/java/org/zstack/test/kvm/KVMHostUtilsTest.java` around lines 141 - 143, The test uses substring matching which can false-positive (e.g., "10.0.0.7" matching "10.0.0.70"); in the loop over detected.split(",") replace the loose check using deployIps.contains(ip) with an exact-match check by parsing deployIps into discrete entries (e.g., split deployIps by "," into a Set or list) and assert that the resulting collection contains the exact ip string; update the assertion in KVMHostUtilsTest where variables detected and deployIps are used to perform precise membership checks rather than substring contains.docs/design/zns-pivot/ZCF-2133-zns-pivot-detailed-design.md-1329-1329 (1)
1329-1329:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win补全这些代码块的语言标记,并检查 Line 1329 是否是误留的围栏。
这些 fence 现在会持续触发 markdownlint;其中 Line 1329 还是一个孤立的
,很容易把后面的章节错误地包进代码块。时序图/ASCII 图统一改成 `text `,如果 Line 1329 不是有意开启代码块,建议直接删掉。Also applies to: 1952-1952, 1975-1975, 1996-1996
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/design/zns-pivot/ZCF-2133-zns-pivot-detailed-design.md` at line 1329, There are unmarked/incorrect Markdown fences causing markdownlint failures—locate the isolated triple-backtick at the vicinity of the orphan fence (the lone "```" around Line 1329) and either remove it if it was left accidentally or close it properly; for all ASCII/timing/sequence diagrams and other non-code blocks (including the fences near the other reported locations around Lines 1952, 1975, 1996) change their opening fences from ``` to ```text (or add an appropriate language tag) so the blocks are explicitly marked and no subsequent sections are accidentally captured as code.
| vo.setType(msg.getType() != null ? msg.getType() : L3NetworkConstant.L3_BASIC_NETWORK_TYPE); | ||
|
|
There was a problem hiding this comment.
默认类型回退后,工厂选择仍使用 msg.getType() 会触发空指针
Line 556 已允许 msg.getType() 为空并回退默认值,但后续创建流程仍在 Line 590 使用 L3NetworkType.valueOf(msg.getType())。当请求未传 type 时,这里会直接失败,导致创建 L3 网络失败。
建议修复
- L3NetworkFactory factory = getL3NetworkFactory(L3NetworkType.valueOf(msg.getType()));
+ L3NetworkFactory factory = getL3NetworkFactory(L3NetworkType.valueOf(vo.getType()));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@network/src/main/java/org/zstack/network/l3/L3NetworkManagerImpl.java` around
lines 556 - 557, 在 L3NetworkManagerImpl 创建流程中虽然在 vo.setType(...) 已将
msg.getType() 为空时回退为 L3NetworkConstant.L3_BASIC_NETWORK_TYPE,但后续使用
L3NetworkType.valueOf(msg.getType()) 会在 msg.getType() 为 null 时抛
NPE;请修改创建类型的来源为已经解析后的值(例如使用 vo.getType() 或在本地变量中保存 resolvedType = msg.getType()
!= null ? msg.getType() : L3NetworkConstant.L3_BASIC_NETWORK_TYPE),并将所有对
L3NetworkType.valueOf(...) 的调用改为传入该 resolvedType,以确保在 msg.getType() 为 null
时也能正确创建网络(参考符号:vo.setType(...)、L3NetworkType.valueOf(msg.getType())、L3NetworkManagerImpl)。
| VolumeSnapshotInventory snapshot = new VolumeSnapshotInventory(); | ||
| snapshot.setUuid(msg.getSnapshotName()); | ||
| snapshot.setVolumeUuid(volume.getUuid()); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== TakeSnapshotOnHypervisorMsg =="
fd 'TakeSnapshotOnHypervisorMsg\.java$' -x sh -c 'echo "----- $1 -----"; sed -n "1,220p" "$1"' sh {}
echo
echo "== VolumeSnapshotDBSyncExtensionPoint =="
fd 'VolumeSnapshotDBSyncExtensionPoint\.java$' -x sh -c 'echo "----- $1 -----"; sed -n "1,220p" "$1"' sh {}
echo
echo "== Related usages =="
rg -n -C2 'getSnapshotName\(|getSnapshotUuid\(|syncVolumeSnapshotDBAfterTakeSnapshot|setUuid\(' --type=javaRepository: MatheMatrix/zstack
Length of output: 50376
🏁 Script executed:
# 查看 KVMHost.java 3105-3107 行的完整上下文
sed -n '3095,3120p' plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
# 查找快照 UUID 的生成或赋值
rg -n 'VolumeSnapshotInventory.*uuid|snapshot.*setUuid|snapshotUuid.*=' plugin/kvm -A2 -B2 | head -50
# 查找对 syncVolumeSnapshotDBAfterTakeSnapshot 的实现
fd '.*\.java$' -x grep -l 'syncVolumeSnapshotDBAfterTakeSnapshot' | head -10Repository: MatheMatrix/zstack
Length of output: 2034
🏁 Script executed:
# 检查 VolumeSnapshotManagerImpl 中的相关实现
sed -n '1,100p' ./storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java | head -50
# 搜索快照 UUID 生成的模式
rg -n 'UUID\.randomUUID|new UUID|setUuid.*UUID' plugin/kvm -A2 -B2 | grep -i snapshot | head -30
# 查找 VolumeSnapshotManagerImpl 中 syncVolumeSnapshotDBAfterTakeSnapshot 的实现
rg -n 'syncVolumeSnapshotDBAfterTakeSnapshot' ./storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java -A20
# 检查其他快照代码中 UUID 的设置方式
rg -n 'new VolumeSnapshotInventory\(\)|snapshot\.setUuid' plugin/kvm -B3 -A3 | head -60Repository: MatheMatrix/zstack
Length of output: 6917
🏁 Script executed:
# 检查是否在调用 takeSnapshot 前创建了 VolumeSnapshotVO
rg -n 'syncSnapshotMetadataAfterHypervisorSuccess' plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java -B50 | grep -E 'new VolumeSnapshotVO|create.*snapshot|VolumeSnapshot.*save|persist' | head -20
# 查找 takeSnapshot 的完整调用链
rg -n 'syncSnapshotMetadataAfterHypervisorSuccess|TakeSnapshotOnHypervisorMsg' plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java -B5 | head -80
# 查看 VolumeSnapshotConstant 确认快照类型定义
rg -n 'HYPERVISOR_SNAPSHOT_TYPE|class VolumeSnapshotConstant' header/src/main/java/org/zstack/header/storage/snapshot/ -A5 | head -40Repository: MatheMatrix/zstack
Length of output: 2846
🏁 Script executed:
# 查找创建 TakeSnapshotOnHypervisorMsg 的地方
rg -n 'new TakeSnapshotOnHypervisorMsg\|TakeSnapshotOnHypervisorMsg.*=' --type=java -B5 -A8 | head -100
# 查找 VolumeSnapshotVO 的创建位置(应该在 takeSnapshot 前)
rg -n 'new VolumeSnapshotVO\|dbf.persistAndRefresh.*VolumeSnapshotVO' storage/src/main/java/org/zstack/storage/snapshot/ -B3 -A3 | head -80
# 查找 takeSnapshot 方法的调用方
rg -n 'bus.send.*TakeSnapshotOnHypervisorMsg' storage/src/main/java/org/zstack/storage/snapshot/ -B10 -A5 | head -100Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# 查找在 VolumeSnapshotManagerImpl 中如何调用 TakeSnapshotOnHypervisorMsg
rg -n 'TakeSnapshotOnHypervisorMsg' storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java -B10 -A10
# 查找快照创建流程中 VolumeSnapshotVO 的插入
rg -n 'createSnapshot|takeSnapshot' storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java -A20 | head -120
# 查看 snapshot.getUuid() 在 syncVolumeSnapshotDBAfterTakeSnapshot 之前是如何被赋值的
sed -n '700,750p' storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.javaRepository: MatheMatrix/zstack
Length of output: 2342
🏁 Script executed:
# 查找调用 TakeSnapshotOnHypervisorMsg 的完整流程
rg -n 'new TakeSnapshotOnHypervisorMsg\|bus.send.*TakeSnapshot' storage/src/main/java/org/zstack/storage/snapshot/ -B15 -A5 | head -150
# 查看 CreateVolumeSnapshotMsg 的处理流程
rg -n 'handle.*CreateVolumeSnapshotMsg' storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java -A50 | head -100
# 查找快照记录在何处创建
rg -n 'VolumeSnapshotVO vo = ' storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java -B5 -A10 | head -80Repository: MatheMatrix/zstack
Length of output: 8686
🏁 Script executed:
# 查找创建快照时的完整链条,特别是如何关联 snapshotName 和 UUID
sed -n '810,920p' storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java
# 查找 TakeSnapshotOnHypervisorMsg 的构造方式
rg -n 'setSnapshotName|TakeSnapshotOnHypervisorMsg' storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java -B3 -A3 | head -100Repository: MatheMatrix/zstack
Length of output: 5596
🏁 Script executed:
# 继续查看创建快照流程中 TakeSnapshotOnHypervisorMsg 的创建部分
sed -n '900,1000p' storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java
# 查找 snapshotName 在 TakeSnapshotOnHypervisorMsg 中如何被赋值
rg -n 'msg.setSnapshotName|getSnapshotName' storage/src/main/java/org/zstack/storage/snapshot/ -B5 -A2Repository: MatheMatrix/zstack
Length of output: 5324
🏁 Script executed:
# 查看整个创建快照流程中 VolumeSnapshotVO 是如何与 snapshotName 关联的
sed -n '770,810p' storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java
# 查看 TakeSnapshotMsg 处理中是否保留了原始快照 UUID
rg -n 'class TakeSnapshotMsg|setSnapshot' header/src/main/java/org/zstack/header/storage/primary/ -B2 -A10 | head -80Repository: MatheMatrix/zstack
Length of output: 11036
🏁 Script executed:
# 查看 VolumeSnapshotStruct 中是否包含快照 UUID
rg -n 'class VolumeSnapshotStruct|getSnapshotUuid|getCurrentSnapshot' header/src/main/java/org/zstack/header/storage/snapshot/ -A5 | head -80
# 查找在 KVMHost 中如何获取真实的快照 UUID
rg -n 'TakeSnapshotOnHypervisorMsg' plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java -B20 | grep -E 'snapshot|Snapshot' | head -30
# 检查是否存在其他地方设置了真实的 snapshotUuid
rg -n 'snapshotUuid' plugin/kvm/src/main/java/org/zstack/kvm/ -B2 -A2 | head -40Repository: MatheMatrix/zstack
Length of output: 9224
🏁 Script executed:
# 查看 TakeSnapshotOnHypervisorMsg 中是否有 VolumeSnapshotStruct 信息
cat header/src/main/java/org/zstack/header/host/TakeSnapshotOnHypervisorMsg.java
# 查看在 VolumeSnapshotManagerImpl 中如何将 snapshot 信息传递给 TakeSnapshotOnHypervisorMsg
rg -n 'struct.getCurrent|setSnapshotName' storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java -B2 -A2Repository: MatheMatrix/zstack
Length of output: 1955
🏁 Script executed:
# 查找在什么地方调用 setSnapshotName 以及使用什么值
rg -n 'setSnapshotName' plugin/kvm storage/ header/ --type=java -B3 -A3 | head -80
# 查看 TakeSnapshotOnHypervisorMsg 是在哪里创建和配置的
rg -n 'new TakeSnapshotOnHypervisorMsg\|TakeSnapshotOnHypervisorMsg msg = ' plugin/kvm -B5 -A15 | head -100Repository: MatheMatrix/zstack
Length of output: 762
应将 snapshot.uuid 设置为真实的快照 UUID,而不是 msg.getSnapshotName()。
第 3106 行将 msg.getSnapshotName() 写入了 snapshot.uuid。由于 TakeSnapshotOnHypervisorMsg 仅包含超程序返回的快照名称(非 UUID),而 syncVolumeSnapshotDBAfterTakeSnapshot() 会使用该 uuid 字段作为主键进行数据库查询,快照名称无法匹配数据库中预先创建的快照记录。这导致数据库同步失败,返回 null,造成快照元数据同步完全失效,数据库记录与超程序状态不一致。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 3105 -
3107, The code is incorrectly assigning msg.getSnapshotName() to
VolumeSnapshotInventory.uuid; replace that with the actual snapshot UUID from
the pre-created DB record (e.g. the variable or object holding the snapshot's
real UUID or preCreatedSnapshot.getUuid()) so
VolumeSnapshotInventory.setUuid(...) contains the true UUID (not the
hypervisor-returned name), and ensure the same UUID is passed to
syncVolumeSnapshotDBAfterTakeSnapshot() and any subsequent DB lookups; locate
the assignment in the block creating VolumeSnapshotInventory and swap the source
to the real snapshot UUID instead of
TakeSnapshotOnHypervisorMsg.getSnapshotName().
Normalize SDN controller inventory endpoint host extraction, including bracketed IPv6 with ports and trimmed host values. Validation: mvn -pl header -DskipTests -Djacoco.skip=true compile passed in prior reviewCommentFix run. Resolves: ZCF-2747 Change-Id: I00939db3552081f4822762e138fb4652d3513f7c
aea6000 to
8495919
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/src/test/groovy/org/zstack/test/unittest/network/sdncontroller/SdnControllerInventoryCase.groovy (1)
9-15: ⚡ Quick win补充
null与仅空白 endpoint 的回归断言。当前用例已覆盖主要路径,但生产逻辑显式处理了
null和trim,建议补两条断言,避免后续回归。建议补充的测试片段
void testInventoryIpStripsEndpointPort() { assert inventoryIp(" 10.0.0.10:8080 ") == "10.0.0.10" assert inventoryIp(" controller.example.com:6640 ") == "controller.example.com" assert inventoryIp(" [fe80::1]:6640 ") == "fe80::1" assert inventoryIp(" 2001:db8::1 ") == "2001:db8::1" assert inventoryIp(" controller.example.com ") == "controller.example.com" + assert inventoryIp(null) == null + assert inventoryIp(" ") == "" }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/src/test/groovy/org/zstack/test/unittest/network/sdncontroller/SdnControllerInventoryCase.groovy` around lines 9 - 15, Update the testInventoryIpStripsEndpointPort test to add regression assertions for null and all-whitespace endpoints: call inventoryIp(null) and inventoryIp(" ") and assert they return null (matching the production logic that handles null and trims input); place these two assertions alongside the existing asserts in the testInventoryIpStripsEndpointPort method so future changes won't regress handling of null/blank endpoints.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@test/src/test/groovy/org/zstack/test/unittest/network/sdncontroller/SdnControllerInventoryCase.groovy`:
- Around line 9-15: Update the testInventoryIpStripsEndpointPort test to add
regression assertions for null and all-whitespace endpoints: call
inventoryIp(null) and inventoryIp(" ") and assert they return null (matching
the production logic that handles null and trims input); place these two
assertions alongside the existing asserts in the
testInventoryIpStripsEndpointPort method so future changes won't regress
handling of null/blank endpoints.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: 0370d145-1ab8-4d4f-88a6-8f208593de68
📒 Files selected for processing (2)
header/src/main/java/org/zstack/header/network/sdncontroller/SdnControllerInventory.javatest/src/test/groovy/org/zstack/test/unittest/network/sdncontroller/SdnControllerInventoryCase.groovy
ZCF-2747\n\n- Return SDN controller inventory ip as host address when DB stores host:port endpoint.\n- Keeps persisted controller endpoint unchanged for backend use.\n\nVerification:\n- git diff --check\n- mvn -Ppremium -pl :plugin-premium,:test-premium,:header -am -DfailIfNoTests=false -Dtest=TestZnsVmNicLifeCycleCase,TestZnsControllerLifeCycleCase test: reactor compile passed; TestZnsControllerLifeCycleCase passed; TestZnsVmNicLifeCycleCase blocked by local test environment startup/JMX after prior fork.
sync from gitlab !9925