Fix multus kubeconfig brackets before binary upgrade #3000
sdodson wants to merge 1 commit into openshift:release-4.14
The Go bump for CVE-2025-47912 causes net/url to reject non-IPv6 hostnames wrapped in brackets. During 4.13->4.14 upgrades, the old multus kubeconfig on nodes contains server URLs like https://[hostname]:6443 (written by the 4.13 entrypoint, which unconditionally wraps KUBERNETES_SERVICE_HOST in brackets).

When the 4.14 multus DaemonSet rolls out, cnibincopy.sh copies the new binary to the node before multus-daemon rewrites the kubeconfig. CRI-O immediately uses the new binary for pod sandbox teardowns, reads the old bracketed kubeconfig, and every DEL call fails with:

    Multus: error getting k8s client: host must be a URL or a host:port pair: "https://[hostname]:6443"

This blocks all pod termination, stalls the dns-default DaemonSet rollout, and causes DNS operator degradation, failing the upgrade.

Add an init container that reads the existing kubeconfig and strips brackets from non-IPv6 hostnames before the main container copies the new binary. IPv6 addresses in brackets are preserved. The init container is a no-op on fresh installs (no kubeconfig exists) and on already-fixed nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
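The init container's behavior described in the commit message can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not the script from this PR: the function name `fix_kubeconfig` is hypothetical, GNU grep/sed are assumed, and it treats any bracketed host that contains no colon as a hostname or IPv4 address (so anything with a colon is assumed to be an IPv6 literal and left alone).

```shell
#!/usr/bin/env bash
# Minimal sketch of the init container's job (hypothetical, not the PR's script).
# Assumptions: GNU grep/sed; a bracketed host with no colon is a hostname or
# IPv4 address, while a bracketed host containing a colon is IPv6.
set -euo pipefail

# fix_kubeconfig (hypothetical name) rewrites "server: https://[host]:port"
# to "server: https://host:port" when host contains no colon.
fix_kubeconfig() {
  local kc="$1"
  # Fresh install: no kubeconfig yet, so there is nothing to fix.
  [ -f "$kc" ] || return 0
  # Only rewrite when a colon-free bracketed host is present; already-fixed
  # files and IPv6-only brackets are left untouched (not even rewritten).
  if grep -qE 'server: https?://\[[^]:]+]' "$kc"; then
    sed -i -E 's|(server: https?://)\[([^]:]+)\]|\1\2|g' "$kc"
  fi
}

# Default path is the one used in this PR's diff; pass another path to try it.
fix_kubeconfig "${1:-/host/etc/cni/net.d/multus.d/multus.kubeconfig}"
```

Given a path that does not exist, the function returns 0 without creating anything, mirroring the no-op on fresh installs; running it a second time on a fixed file changes nothing.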
CodeRabbit: Review skipped. Auto reviews are disabled on base/target branches other than the default branch.
/test ci/prow/4.14-upgrade-from-stable-4.13-images

/retest-required

    KC="/host/etc/cni/net.d/multus.d/multus.kubeconfig"
    [ -f "$KC" ] || exit 0
    if grep -q 'server:.*://\[' "$KC"; then
      sed -i -E 's|(server: https?://)(\[)([^]]+[^0-9:.])(\])|\1\3|g' "$KC"
Replace with this so that we don't break on dual-stack clusters:

    sed -i -E 's|(server: https?://)\[([^]:]+)\]|\1\2|g' "$KC"
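To see what the two sed expressions in this thread actually do, the sketch below runs both over four made-up server lines (the hostnames and addresses are illustrative only; GNU sed is assumed for `-E`). The helper name `strip_with` is hypothetical.

```shell
#!/usr/bin/env bash
# Compare the original and the suggested sed expressions on sample lines.
# Sample hosts are invented for illustration; GNU sed is assumed.
set -euo pipefail

# Original: strips brackets only if the last char before "]" is not 0-9, ':' or '.'.
original='s|(server: https?://)(\[)([^]]+[^0-9:.])(\])|\1\3|g'
# Suggested: strips brackets only if the bracketed host contains no ':' at all.
suggested='s|(server: https?://)\[([^]:]+)\]|\1\2|g'

# strip_with EXPR LINE: print LINE after applying sed expression EXPR.
strip_with() {
  printf '%s\n' "$2" | sed -E "$1"
}

for line in \
  'server: https://[api-int.example.test]:6443' \
  'server: https://[10.0.0.1]:6443' \
  'server: https://[fd00::a]:6443' \
  'server: https://[fd00::1]:6443'; do
  printf 'input:     %s\n' "$line"
  printf 'original:  %s\n' "$(strip_with "$original" "$line")"
  printf 'suggested: %s\n' "$(strip_with "$suggested" "$line")"
done
```

Both expressions unwrap the hostname and leave `[fd00::1]` alone. They differ on the other two: the original keeps the brackets around `10.0.0.1` (its last character is a digit), even though an IPv4 address is also a non-IPv6 host, and it strips the brackets from `fd00::a` (its last character is a hex letter), yielding `https://fd00::a:6443`. The suggested expression keys on the colon instead, so every IPv6 literal keeps its brackets regardless of its final character.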
Summary
`net/url` now rejects URLs of the form `https://[hostname]:6443`

Problem
During 4.13→4.14 upgrades, the old multus kubeconfig on nodes (written by the 4.13 shell entrypoint, which unconditionally wraps `KUBERNETES_SERVICE_HOST` in brackets) contains server URLs like `https://[hostname]:6443`. When the 4.14 multus DaemonSet rolls out, `cnibincopy.sh` copies the new binary before `multus-daemon` rewrites the kubeconfig. CRI-O immediately uses the new binary for pod sandbox teardowns (DEL), reads the old bracketed kubeconfig, and fails:

    Multus: error getting k8s client: host must be a URL or a host:port pair: "https://[hostname]:6443"

This blocks all pod termination (500+ FailedKillPod events), stalls the dns-default DaemonSet rollout, causes DNS operator degradation, and fails the upgrade. The `gcp-ovn-rt-upgrade-4.14-minor` blocking job has been failing for 5+ consecutive payloads due to this issue.

Fix
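Add a `fix-cni-kubeconfig` init container that runs before `kube-multus` and uses `sed` to strip brackets from non-IPv6 hostnames in the existing kubeconfig; IPv6 addresses in brackets are preserved.

Test plan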
`gcp-ovn-rt-upgrade-4.14-minor` (4.13→4.14 upgrade) passes with this change

🤖 Generated with Claude Code