ISSUE-186: Set CNI_MTU=1400 on kindnet daemonset to fix TCP DNS issues #193

kasturinarra · 2026-01-09T11:42:37Z

Summary by CodeRabbit

Chores
- Configure CI network MTU to 1400 for conformance runs to improve network stability.
- Restart cluster networking services and DNS during CI, wait for rollouts and a stabilization pause.
- Verify TCP-based DNS resolution from cluster nodes to reduce flaky network-related test failures.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-09T11:42:46Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Inserts a CI step in the CNCF conformance workflow that configures node networking to MTU=1400, updates interface MTUs, restarts kube-proxy and CoreDNS with rollouts and waits, pauses for stabilization, and verifies TCP-based DNS resolution from both nodes.

Changes

Cohort / File(s)
CI Networking Configure `/.github/workflows/cncf-conformance.yaml`	Adds a shell step (uses `set -euo pipefail`) inserted before the "Configure hostname resolution for cluster nodes" step that: updates `10-kindnet.conflist` CNI MTU to `1400` on both nodes, adjusts non-loopback interface MTUs >1400 to `1400`, restarts and waits for rollout of `kube-proxy` and `coredns`, pauses for stabilization, and validates TCP DNS resolution via `dig +tcp` from both nodes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

ggiguash

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: adding CNI MTU configuration to fix TCP DNS issues in the CI workflow.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @.github/workflows/cncf-conformance.yaml:
- Around line 98-116: Increase the rollout timeout to 300s and replace the final
verification command so it asserts the CNI_MTU env var is set to 1400 (don’t
just print the env array and avoid the broken jsonpath escaping). Update the
kubectl commands that target daemonset/kube-kindnet-ds and CNI_MTU via the
existing make env CMD wrapper: change the rollout status timeout from 120s to
300s and change the verification step to run a command that exits non‑zero on
failure (for example use kubectl get ... -o json piped to grep -q
"CNI_MTU.*1400" or to jq -e to select the
.spec.template.spec.containers[0].env[] | select(.name=="CNI_MTU") | .value ==
"1400"), ensuring correct shell quoting so the make env CMD receives valid
syntax.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1045975 and c8010c9.

📒 Files selected for processing (1)

.github/workflows/cncf-conformance.yaml

🧰 Additional context used

📓 Path-based instructions (1)

.github/workflows/*.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

CI/CD workflows should validate builds with builders.yaml, test quickstart scripts with installers.yaml, support manual release workflow via release.yaml, and run daily OKD ARM builds via release-okd.yaml

Files:

.github/workflows/cncf-conformance.yaml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: quick-rpm-and-clean (ubuntu-24.04)
GitHub Check: isolated-network (kindnet, ubuntu-24.04, 0)
GitHub Check: fedora-bootc (ubuntu-24.04-arm)
GitHub Check: isolated-network (ovnk, ubuntu-24.04, 1)
GitHub Check: fedora-bootc (ubuntu-24.04)
GitHub Check: ubuntu-rpm2deb
GitHub Check: centos10-bootc
GitHub Check: centos9-bootc
GitHub Check: quick-start-and-clean (ubuntu-24.04)
GitHub Check: quick-start-and-clean (ubuntu-24.04-arm)
GitHub Check: quick-rpm-and-clean (ubuntu-24.04-arm)

.github/workflows/cncf-conformance.yaml

ggiguash · 2026-01-10T08:48:54Z

.github/workflows/cncf-conformance.yaml

+              if [ -f "$CNI_CONFIG" ]; then
+                # Replace any existing mtu value with 1400, or add it if not present
+                if grep -q "\"mtu\"" "$CNI_CONFIG"; then
+                  sed -i "s/\"mtu\": *[0-9]*/\"mtu\": 1400/g" "$CNI_CONFIG"
+                else
+                  # Add mtu to the ptp plugin config
+                  sed -i "s/\"type\": *\"ptp\"/\"type\": \"ptp\", \"mtu\": 1400/g" "$CNI_CONFIG"
+                fi
+                echo "    Updated CNI config:"
+                grep -o "\"mtu\": *[0-9]*" "$CNI_CONFIG" || echo "    (mtu not found in config)"
+              else
+                echo "    Warning: CNI config not found at $CNI_CONFIG"
+              fi


This file has JSON format. We must use jq for querying and updating it.

ggiguash · 2026-01-10T08:49:06Z

.github/workflows/cncf-conformance.yaml

+              for iface in $(ip -o link show | grep -E "cni|kindnet|bridge" | awk -F": " "{print \$2}" | cut -d@ -f1); do
+                ip link set dev "$iface" mtu 1400 2>/dev/null && echo "    Set MTU=1400 on $iface" || true
+              done
+            ' || true


Why do we allow this to fail?

ggiguash · 2026-01-10T08:49:59Z

.github/workflows/cncf-conformance.yaml

+          # causing TCP packets (larger than UDP) to be fragmented/dropped.
+          # Kindnet auto-detects MTU but may get it wrong in nested container environments.
+
+          echo "Patching CNI config with MTU=1400 on all nodes..."


I think this code is complex enough that it "deserves" to be moved into a script

ggiguash · 2026-01-10T08:50:31Z

@coderabbitai review

coderabbitai · 2026-01-10T08:50:38Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @.github/workflows/cncf-conformance.yaml:
- Around line 147-161: The Step 6 TCP DNS check currently only logs per-attempt
success/failure; change it to track failures and exit non-zero if any node fails
all attempts. Inside the podman exec block for nodes microshift-okd-1 and
microshift-okd-2, introduce a dns_fail flag (e.g., dns_fail=1 then set
dns_fail=0 on any successful result) that is checked after the 3 attempts: if
dns_fail is still set, echo a clear error and exit 1 so the workflow fails;
ensure the outer script propagates that non-zero exit (or use set -o
pipefail/set -e) so the CI run fails when TCP DNS cannot be resolved.

🧹 Nitpick comments (2)

.github/workflows/cncf-conformance.yaml (2)

107-121: Consider using jq for JSON manipulation.

The sed-based approach with double-quote patterns works but is fragile. Using jq would be more robust and idiomatic for modifying JSON configuration.

♻️ Proposed refactor using jq

           echo "=== Step 1: Configure MTU on CNI config ==="
           for node in microshift-okd-1 microshift-okd-2; do
             echo "  - Configuring MTU on ${node}"
             sudo podman exec "${node}" bash -c '
               CNI_CONFIG="/etc/cni/net.d/10-kindnet.conflist"
               if [ -f "$CNI_CONFIG" ]; then
-                if grep -q "\"mtu\"" "$CNI_CONFIG"; then
-                  sed -i "s/\"mtu\": *[0-9]*/\"mtu\": 1400/g" "$CNI_CONFIG"
-                else
-                  sed -i "s/\"type\": *\"ptp\"/\"type\": \"ptp\", \"mtu\": 1400/g" "$CNI_CONFIG"
-                fi
-                grep -o "\"mtu\": *[0-9]*" "$CNI_CONFIG" || echo "    (mtu not in config)"
+                jq ".plugins[0].mtu = 1400" "$CNI_CONFIG" > "${CNI_CONFIG}.tmp" && mv "${CNI_CONFIG}.tmp" "$CNI_CONFIG"
+                jq -r ".plugins[0].mtu // \"(mtu not set)\"" "$CNI_CONFIG"
               fi
             '
           done

153-153: Redundant empty-string check.

The condition checks non-empty twice: [ -n "$result" ] and [ "$result" != "" ] are equivalent.

♻️ Simplified condition

-                if [ -n "$result" ] && [ "$result" != "" ]; then
+                if [ -n "$result" ]; then

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c656aad and f7dbad0.

📒 Files selected for processing (1)

.github/workflows/cncf-conformance.yaml

🧰 Additional context used

📓 Path-based instructions (1)

.github/workflows/*.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

CI/CD workflows should validate builds with builders.yaml, test quickstart scripts with installers.yaml, support manual release workflow via release.yaml, and run daily OKD ARM builds via release-okd.yaml

Files:

.github/workflows/cncf-conformance.yaml

coderabbitai · 2026-01-10T08:55:02Z

.github/workflows/cncf-conformance.yaml

+          echo "=== Step 6: Verify TCP DNS works ==="
+          for node in microshift-okd-1 microshift-okd-2; do
+            echo "  Testing TCP DNS from ${node}..."
+            sudo podman exec "${node}" bash -c '
+              for i in 1 2 3; do
+                result=$(dig +tcp +short kubernetes.default.svc.cluster.local @10.43.0.10 2>&1)
+                if [ -n "$result" ] && [ "$result" != "" ]; then
+                  echo "    Attempt $i: OK ($result)"
+                else
+                  echo "    Attempt $i: FAILED"
+                fi
+                sleep 1
+              done
+            '
+          done


🛠️ Refactor suggestion | 🟠 Major

DNS verification doesn't fail on errors.

Step 6 validates TCP DNS but only logs results without failing the workflow if DNS doesn't work. Since this PR specifically addresses TCP DNS issues, the verification should fail the workflow if DNS resolution fails after all the applied fixes.

🔒 Proposed fix to fail on DNS errors

echo "=== Step 6: Verify TCP DNS works ===" for node in microshift-okd-1 microshift-okd-2; do echo " Testing TCP DNS from ${node}..." sudo podman exec "${node}" bash -c ' + failed=0 for i in 1 2 3; do result=$(dig +tcp +short kubernetes.default.svc.cluster.local @10.43.0.10 2>&1) if [ -n "$result" ]; then echo " Attempt $i: OK ($result)" + exit 0 else echo " Attempt $i: FAILED" + failed=1 fi sleep 1 done + if [ $failed -eq 1 ]; then + echo " ERROR: TCP DNS verification failed after 3 attempts" + exit 1 + fi ' done

🤖 Prompt for AI Agents

In @.github/workflows/cncf-conformance.yaml around lines 147 - 161, The Step 6 TCP DNS check currently only logs per-attempt success/failure; change it to track failures and exit non-zero if any node fails all attempts. Inside the podman exec block for nodes microshift-okd-1 and microshift-okd-2, introduce a dns_fail flag (e.g., dns_fail=1 then set dns_fail=0 on any successful result) that is checked after the 3 attempts: if dns_fail is still set, echo a clear error and exit 1 so the workflow fails; ensure the outer script propagates that non-zero exit (or use set -o pipefail/set -e) so the CI run fails when TCP DNS cannot be resolved.

kasturinarra requested a review from a team as a code owner January 9, 2026 11:42

coderabbitai bot reviewed Jan 9, 2026

View reviewed changes

.github/workflows/cncf-conformance.yaml Outdated Show resolved Hide resolved

kasturinarra force-pushed the fix_issue_186 branch from c8010c9 to c656aad Compare January 9, 2026 11:49

kasturinarra marked this pull request as draft January 9, 2026 14:14

kasturinarra force-pushed the fix_issue_186 branch 2 times, most recently from bd32a1c to f7dbad0 Compare January 9, 2026 18:54

ggiguash requested changes Jan 10, 2026

View reviewed changes

coderabbitai bot reviewed Jan 10, 2026

View reviewed changes

kasturinarra force-pushed the fix_issue_186 branch from f7dbad0 to b4d2084 Compare January 10, 2026 15:04

Set CNI_MTU=1400 on kindnet daemonset to fix TCP DNS issues

b82ee36

kasturinarra force-pushed the fix_issue_186 branch from b4d2084 to b82ee36 Compare January 10, 2026 18:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ISSUE-186: Set CNI_MTU=1400 on kindnet daemonset to fix TCP DNS issues #193

ISSUE-186: Set CNI_MTU=1400 on kindnet daemonset to fix TCP DNS issues #193

kasturinarra commented Jan 9, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 9, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

ggiguash Jan 10, 2026

Uh oh!

ggiguash Jan 10, 2026

Uh oh!

ggiguash Jan 10, 2026

Uh oh!

ggiguash commented Jan 10, 2026

Uh oh!

coderabbitai bot commented Jan 10, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jan 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ISSUE-186: Set CNI_MTU=1400 on kindnet daemonset to fix TCP DNS issues #193

Are you sure you want to change the base?

ISSUE-186: Set CNI_MTU=1400 on kindnet daemonset to fix TCP DNS issues #193

Conversation

kasturinarra commented Jan 9, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ggiguash Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

ggiguash Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

ggiguash Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

ggiguash commented Jan 10, 2026

Uh oh!

coderabbitai bot commented Jan 10, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kasturinarra commented Jan 9, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 9, 2026 •

edited

Loading