Conversation
Restrict CRD generation to ./api/... so manifests no longer produces config/crd/bases/kilo.squat.ai_peers.yaml for the external Kilo Peer type. Regenerate deepcopy files so the boilerplate header from hack/boilerplate.go.txt is committed, matching what controller-gen produces in CI. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Test files contain CIDRs, namespace names and similar fixtures repeated across cases; promoting them to constants only obscures intent. Add goconst to the linter exclusion list for _test\.go. Signed-off-by: Arsolitt <arsolitt@gmail.com>
The operator binary does not define a --namespace flag; passing it caused the manager to exit at startup. Remove the argument from the Deployment and the corresponding unit test. The chart wired --metrics-bind-address=:8080 while the binary defaults metrics-secure to true, which started HTTPS on an HTTP port and made metrics unscrapeable without a TLS setup. Expose metricsSecure in values.yaml (default false) and pass --metrics-secure explicitly. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Helm 4.1 requires plugin source verification by default, which the helm-unittest source does not support. Without the flag the helm CI job aborts with "plugin source does not support verification" before helm lint and unit tests can run. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Add an image job that builds the multi-arch Containerfile and pushes to ghcr.io/cozystack/kilo-clustermesh-operator with :main and :sha-<commit> tags. The job runs only on push to main and waits for all checks (lint, test, integration, build, helm, generate) so a broken commit cannot publish an image. Tagged releases are still handled by .github/workflows/release.yml. Signed-off-by: Arsolitt <arsolitt@gmail.com>
golangci-lint v2.12 flagged "kilo.squat.ai" repeated across register.go and types_test.go via goconst. Define a GroupName constant in the Kilo v1alpha1 package and use it from both call sites; this also matches the convention used by upstream Kubernetes API packages. Signed-off-by: Arsolitt <arsolitt@gmail.com>
golangci-lint v2.12 picked up another goconst occurrence: the literal "v1alpha1" repeated across register.go and types_test.go. Define a GroupVersion constant alongside GroupName and use it from both call sites. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Add dev to the push/pull_request trigger lists and to the image job's allow-list. Switch the image tag source from a fixed raw "main" to type=ref,event=branch so pushes to dev publish :dev and pushes to main publish :main. VERSION build-arg follows the ref name. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Add --platform=\$BUILDPLATFORM to the builder stage so the Go toolchain runs natively on the GitHub runner (amd64) while cross-compiling for TARGETARCH through GOOS/GOARCH. QEMU emulation is now only used for the final distroless stage, which is just a file copy. ARM64 image builds drop from QEMU-emulated compilation to native compile + cross-link. Signed-off-by: Arsolitt <arsolitt@gmail.com>
The previous cmd/main.go constructed the manager with an unset Registry, Log and Recorder on ClusterMeshReconciler, which would have caused a nil pointer panic on the first reconcile. CRD self-install, despite being available in internal/crd, was never invoked, so the manager also failed to start with "no matches for kind ClusterMesh". Startup now: - runs crd.InstallOrUpdate before manager creation - reads its namespace from POD_NAMESPACE (downward API) - lists every ClusterMesh in that namespace, merges cluster entries by name and builds a multicluster.ClusterRegistry up-front - registers each cluster.Cluster with the manager so caches start alongside the leader - wires the reconciler with the real Registry, slog logger and the manager's event recorder - installs a change-watcher that cancels the manager context when cluster fingerprints drift, triggering a kubelet-driven restart The reconciler's Recorder field and the integration test suite are moved off the deprecated record.EventRecorder API onto the events package, fixing the staticcheck SA1019 warning. go mod tidy removes seven now-unused indirect dependencies. Signed-off-by: Arsolitt <arsolitt@gmail.com>
…ccomp RuntimeDefault Three blocking issues for running the operator on a cluster with PodSecurity restricted enforced: - ServiceAccount.automountServiceAccountToken: false stopped the controller from authenticating to the in-cluster API. Removed; the operator needs the projected token. - The container template did not request POD_NAMESPACE, which the new startup bootstrap reads to scope its ClusterMesh and Secret lookups. Injected via downward API. - securityContext.seccompProfile was unset, triggering a PodSecurity warning under restricted:latest. Set to RuntimeDefault. helm-unittest suites are updated accordingly. All 41 tests pass. Signed-off-by: Arsolitt <arsolitt@gmail.com>
The controller-runtime manager cache defaulted to cluster-scope list/watch on ClusterMesh and Secret, which required cluster-wide RBAC the chart deliberately does not grant. The operator only watches namespaced resources in its own namespace; cluster-scoped resources (Peers, Nodes, CRDs, Leases) are accessed via the multicluster registry or direct API calls and are unaffected. Restrict Cache.DefaultNamespaces to the operator's own namespace so the existing namespace-scoped Role is sufficient. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Upstream Kilo writes the kilo.squat.ai/wireguard-ip annotation as a host
route ("10.4.0.1/32"); the cozystack-patched Kilo with cross granularity
writes the host IP with the wireguard subnet mask ("100.66.0.3/16"). The
operator previously required /32 (or /128) and reused the raw annotation
value in the Peer AllowedIPs — both of which fail under cozystack-Kilo:
validation skips every node, and even if it did not, every peer would
claim the entire wireguard subnet.
Add a netutil helper that parses an annotation preserving host bits, and
another that emits the host route for a given IP. Validation now checks
only that the host IP falls inside the cluster's wireguardCIDR. The peer
builder normalises AllowedIPs to a /32 (resp. /128) host route so each
peer terminates traffic for exactly one WireGuard IP.
Covered by unit tests for both upstream-style and cozystack-style
annotations.
Signed-off-by: Arsolitt <arsolitt@gmail.com>
The migration to k8s.io/client-go/tools/events.EventRecorder caused events to be written to the events.k8s.io/v1 API group instead of the legacy core "events" resource. The Role only granted access to the latter, so every recorded event was rejected by the apiserver with events.events.k8s.io is forbidden. Grant create/patch on events.k8s.io/events alongside the existing rule for the core group; keep both so the operator stays compatible if any component falls back to the legacy API. Signed-off-by: Arsolitt <arsolitt@gmail.com>
…d roadmap The operator only sets a Peer endpoint from kilo.squat.ai/force-endpoint; clusters where nodes are not publicly reachable on that annotation will end up with peers that have no endpoint and silently drop traffic. The manager cache also does not watch Node objects, so annotation changes require a no-op write on the ClusterMesh resource to trigger a reconcile. Capture both of these in the README so users hit fewer surprises. Also refresh the surrounding documentation: - tighten Prerequisites to distinguish apiserver reachability from per-node UDP reachability; - replace the broken Remote Cluster Setup snippet with the working Secret-backed long-lived token procedure that survives 1.24+; - correct the Architecture section, which previously claimed the controller watches Node objects; - add a Possible Improvements list covering a dedicated cross-cluster endpoint annotation, a Node watcher, and an explicit per-node skip annotation. Stop tracking local cluster-specific deployment manifests under deploy/ by adding it to .gitignore; the upstream README now contains the generic procedure, so the per-cluster files belong in a private branch. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Signed-off-by: Arsolitt <arsolitt@gmail.com>
Signed-off-by: Arsolitt <arsolitt@gmail.com>
Signed-off-by: Arsolitt <arsolitt@gmail.com>
Signed-off-by: Arsolitt <arsolitt@gmail.com>
Signed-off-by: Arsolitt <arsolitt@gmail.com>
Signed-off-by: Arsolitt <arsolitt@gmail.com>
The previous default pointed at the upstream squat image, causing helm install without overrides to pull the wrong image. Switch the default to the cozystack fork, add a helm-unittest assertion that verifies the default, and remove the now-redundant repository override from the README tag-pinning example. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Two nodes with "10.4.0.1/16" and "10.4.0.1/32" resolve to the same WireGuard peer (AllowedIPs = 10.4.0.1/32) and therefore conflict. The old comparison was a raw-string equality check, so it missed this case. Extract the host IP via netutil.ParseHostInCIDR and use that as the dedup key. Invalid annotations fall back to raw-string keying so that identical-invalid values still deduplicate without colliding with any valid IP. Add three new test cases that cover: same host IP / different prefix lengths, different host IPs (sanity), and invalid vs valid annotation. Signed-off-by: Arsolitt <arsolitt@gmail.com>
TestMergeClusterSpecs in cmd/main_test.go was never executed in CI because the test step listed explicit package paths that omitted ./cmd/.... Add ./cmd/... to the go test invocation so all unit tests run on every push. Add internal/citest/workflow_test.go to structurally assert the presence of ./cmd/... in the ci.yml test step, preventing future accidental regressions. Signed-off-by: Arsolitt <arsolitt@gmail.com>
buildDNSOrIP strips brackets from the host before calling net.ParseIP, but the DNS fallback path was returning the unstripped host variable. A bracketed DNS name like [dns.example.com]:51820 would produce DNS: "[dns.example.com]", which is an invalid hostname. Change DNS: host to DNS: cleanHost so the brackets-stripped form is always used for the DNS field. Signed-off-by: Arsolitt <arsolitt@gmail.com>
…rmats The previous comment stated that wireguard-ip annotations must be a /32 (or /128) host route. This was accurate before the IsHostRoute validation check was dropped, but has been incorrect since that change. The operator now accepts any prefix length and validates only the host portion of the address against WireguardCIDR. Update the Go doc comment and the generated CRD YAML description to match actual behavior. Also annotate the existing regression test case to document that it guards against inadvertent reintroduction of the /32-only requirement. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Add table-driven unit tests for both public functions in internal/validation/mesh.go, which previously had 0% coverage in CI (integration tests were excluded from the unit test job). TestValidateClusterNetworks covers: - empty cluster list (nil error) - single cluster, multiple disjoint CIDRs (nil error) - single cluster with serviceCIDR overlapping additionalCIDR (error) - two clusters with disjoint CIDRs (nil error) - two clusters with overlapping serviceCIDR (error naming both clusters) - two clusters with overlapping podCIDR (error naming both clusters) - invalid CIDR string triggering a parse error TestValidateMeshNetworks covers: - empty mesh list (nil error) - single mesh with valid clusters (nil error) - two meshes with disjoint network plans (nil error) - two meshes with an overlapping CIDR (error naming both meshes) - intra-mesh overlap detected before cross-mesh check Helpers makeCluster and makeMesh follow the makeNode convention from node_test.go. Signed-off-by: Arsolitt <arsolitt@gmail.com>
The original example set identical serviceCIDR (10.96.0.0/12) on both clusters, and both wireguardCIDRs (10.100.x.0/24) fell inside that /12 range — all three overlaps would have caused ValidateClusterNetworks to return an error and suppress all Peer creation. Fix by assigning distinct, non-overlapping ranges to every field: - cluster-a wireguardCIDR: 10.200.0.0/24 - cluster-b wireguardCIDR: 10.200.1.0/24 - cluster-b serviceCIDR: 10.112.0.0/12 Add TestREADMEQuickStartManifestIsValid in internal/validation/mesh_test.go as a regression guard: it reads README.md, extracts the ClusterMesh YAML block, unmarshals it, and asserts ValidateClusterNetworks returns nil. The test fails with the original CIDRs and passes after this fix. Signed-off-by: Arsolitt <arsolitt@gmail.com>
The previous comment said "wireguard-ip annotation" without explaining that only the host IP is extracted and then normalised to a /32 (IPv4) or /128 (IPv6) host route. The updated comment makes the normalisation step explicit, which is important context given that the annotation may carry a wider subnet mask in cozystack-patched Kilo. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Introduces an optional WireguardPort field on ClusterEntry, defaulting to 51820. The operator uses this port when synthesising a peer endpoint from Node.Status.Addresses (i.e. neither clustermesh-endpoint nor force-endpoint annotation is set on the node). Signed-off-by: Arsolitt <arsolitt@gmail.com>
Adds AnnotationClustermeshEndpoint constant for an operator-specific node annotation that conveys the cross-cluster WireGuard endpoint independently of Kilo's own kilo.squat.ai/force-endpoint. Decoupling the two prevents the operator's endpoint configuration from affecting intra-cluster Kilo behaviour (notably the "cross" mesh granularity). Signed-off-by: Arsolitt <arsolitt@gmail.com>
12 tests covering ResolveEndpoint priority order, ExternalIP fallback (IPv4/IPv6 preference, default port), strict error on malformed annotations, and ignoring non-ExternalIP address types. Tests intentionally fail to compile until ResolveEndpoint is implemented. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Implements ResolveEndpoint which determines a node's WireGuard endpoint via a three-tier priority chain: the operator-specific kilo.squat.ai/clustermesh-endpoint annotation wins; otherwise the legacy kilo.squat.ai/force-endpoint annotation; otherwise the first ExternalIP in Node.Status.Addresses (IPv4 preferred over IPv6) combined with the fallback port. A present-but-malformed annotation is a hard error rather than a silent fall-through, so misconfiguration surfaces immediately instead of yielding peers without an endpoint. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Replaces the (meshName, sourceCluster string) and (meshName, sourceCluster, entry) signatures with a unified (meshName string, entry *ClusterEntry) shape. The entry carries both the cluster name and the new WireguardPort field, removing the need to plumb individual fields through call sites and preparing the builders to consume kilonode.ResolveEndpoint in the next step. Existing behaviour is preserved; tests pass without modification beyond the signature change. Signed-off-by: Arsolitt <arsolitt@gmail.com>
BuildPeer and BuildAnchorPeer now consume kilonode.ResolveEndpoint instead of reading kilo.squat.ai/force-endpoint directly. The peer endpoint is therefore picked up from the clustermesh-endpoint annotation, the legacy force-endpoint annotation, or Node.Status ExternalIPs in that priority order. Two behaviour changes flow from this: - A node with no endpoint source no longer produces a Peer without an endpoint silently — BuildPeer now returns an error and BuildAnchorPeer returns nil. Validation should already filter such nodes; the new behaviour is a defensive surface for missed cases. - A present-but-malformed endpoint annotation (clustermesh-endpoint or force-endpoint) is a hard error, not a silent fall-through. The affected unit test was rewritten accordingly. baseAnnotations() in builder_test.go now includes a valid force-endpoint so existing tests succeed by default; tests that exercise specific chain layers override or delete the relevant key. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Adds four ValidateNode test cases that the current implementation does not satisfy yet: a node missing every endpoint source must be skipped with ReasonNoEndpoint, a malformed clustermesh-endpoint or force-endpoint annotation must be skipped with ReasonEndpointInvalid, and a node whose only endpoint source is an ExternalIP must be accepted. The pre-existing baseAnnotations() now includes a valid force-endpoint so that all current "valid node" cases continue to pass once endpoint validation lands. Tests fail at compile time because ReasonNoEndpoint and ReasonEndpointInvalid do not exist yet; the Green commit will add them alongside the validateEndpoint helper. Signed-off-by: Arsolitt <arsolitt@gmail.com>
…point ValidateNode now exercises the endpoint fallback chain after the WireGuard IP and public key checks. A node with no source is skipped with ReasonNoEndpoint; a node whose clustermesh-endpoint or force-endpoint annotation is present but malformed is skipped with ReasonEndpointInvalid. Filtering out such nodes here prevents BuildPeer from later failing on a per-node basis and surfaces the misconfiguration as a clear SkippedNodes entry in ClusterMesh status. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Regenerates the ClusterMesh CRD via controller-gen to expose the new wireguardPort field with default 51820 and bounds [1, 65535], and mirrors the result into the embedded copy at internal/crd/ which the operator applies on startup. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Updates the per-node configuration and reconciliation-flow sections to describe the new three-tier endpoint chain (kilo.squat.ai/clustermesh-endpoint → kilo.squat.ai/force-endpoint → first ExternalIP), the strict treatment of malformed annotations (NodeEndpointInvalid skip reason), and the per-cluster wireguardPort field used by the ExternalIP fallback. Removes the now-implemented "dedicated cross-cluster endpoint annotation" item from the roadmap and updates the prerequisites note about the WireGuard UDP port to reflect that it is configurable. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Restructure README as a landing page (overview, requirements, quick
start) and split deep content into a flat English-only docs/ tree:
- docs/architecture.md — components, reconciliation flow, anchor
peer, manager cache scoping, CRD bootstrap,
restart watcher
- docs/installation.md — Helm install, embedded CRD bootstrap,
remote-cluster kubeconfig setup, RBAC,
verification, uninstall
- docs/configuration.md — ClusterMesh CRD reference: Spec, Status,
ClusterEntry fields, conditions, CIDR
validation rules
- docs/per-node-setup.md — required Node annotations, three-tier
endpoint resolution chain, strict-invalid
semantics, migration from force-endpoint
- docs/troubleshooting.md — full NodeSkipReason table, mesh-level
validation errors, status conditions,
common pitfalls
Signed-off-by: Arsolitt <arsolitt@gmail.com>
Add docs/known-gaps.md tracking outstanding work, divergences from the upstream proposal (cozystack/community#7), and settled design decisions that should not be re-litigated. Link from README documentation table and project-status note. Captures one blocker gap (no Node watches), one operational risk (silent anchor-peer suppression), and three proposal text corrections (annotation name, prefix rule, flat-vs-typed CRD shape). Signed-off-by: Arsolitt <arsolitt@gmail.com>
The endpoint resolution chain introduced in f8e0ab2 now rejects nodes that have neither a clustermesh-endpoint, a force-endpoint, nor an ExternalIP. The existing fixtures created nodes with no endpoint source at all, so ValidateNode skipped them with NodeNoEndpoint and the integration tests that asserted on resulting peer counts started failing in CI. Attach a default ExternalIP to every node built by makeNode so that the ExternalIP fallback succeeds, and persist Status.Addresses via the status subresource (Create does not). The TestHappyPath endpoint assertion is tightened to target the peer for the node that carries the explicit force-endpoint, since other peers now legitimately resolve via the ExternalIP fallback. Signed-off-by: Arsolitt <arsolitt@gmail.com>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR introduces configurable WireGuard port support, refactors endpoint resolution to support multiple annotation precedence levels with fallback to Kubernetes ExternalIP, and relaxes WireGuard CIDR validation to accept any prefix length. The operator startup is refactored into modular helpers, and comprehensive documentation is added covering installation, configuration, and troubleshooting. ChangesPort Configuration, Endpoint Resolution, and Node Validation
Operator Startup, Controller Wiring, and Validation
Helm Chart, CI, and Test Infrastructure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Code Review
This pull request refactors the operator to support the Cozystack-patched Kilo fork, introducing prefix-agnostic WireGuard IP validation and a three-tier endpoint resolution chain. Key architectural improvements include embedded CRD bootstrapping, a fingerprint-based change-watcher that triggers controlled pod restarts on configuration changes, and restricted manager cache scoping for better isolation. The review identifies critical compilation errors caused by an incompatible switch from the record package to the events package for event recording, which must be reverted to maintain compatibility with the controller-runtime manager's interfaces and method signatures.
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| "k8s.io/apimachinery/pkg/runtime" | ||
| "k8s.io/client-go/tools/record" | ||
| "k8s.io/client-go/tools/events" |
There was a problem hiding this comment.
The switch from k8s.io/client-go/tools/record to k8s.io/client-go/tools/events will cause a compilation error. The controller-runtime manager's GetEventRecorder method returns a record.EventRecorder, which is incompatible with the events.EventRecorder interface. Revert this import to use the record package.
| "k8s.io/client-go/tools/events" | |
| "k8s.io/client-go/tools/record" |
There was a problem hiding this comment.
This is intentional, not a regression. The controller-runtime manager exposes both GetEventRecorderFor(name) record.EventRecorder (the legacy core/v1 recorder, flagged //nolint:staticcheck upstream) and GetEventRecorder(name) events.EventRecorder (the events/v1 recorder). The reconciler is wired with the latter — see cmd/main.go:323: Recorder: mgr.GetEventRecorder(controllerEventName). The code builds cleanly under controller-runtime v0.23.3 and client-go v0.35.0.
| Registry *multicluster.ClusterRegistry | ||
| Log *slog.Logger | ||
| Recorder record.EventRecorder | ||
| Recorder events.EventRecorder |
There was a problem hiding this comment.
The Recorder events.EventRecorder field is the correct match for the value supplied at cmd/main.go:323 (mgr.GetEventRecorder(...), which returns events.EventRecorder). Reverting to record.EventRecorder would actually introduce the type mismatch. go build ./... passes on the current code.
| slog.String("reason", string(reason)), | ||
| ) | ||
| r.Recorder.Event(mesh, corev1.EventTypeWarning, string(reason), "node "+node.Name+" has duplicate WireGuard IP") | ||
| r.Recorder.Eventf(mesh, nil, corev1.EventTypeWarning, string(reason), "SkipNodePeering", "node %s has duplicate WireGuard IP", node.Name) |
There was a problem hiding this comment.
The Eventf method in record.EventRecorder has a different signature than events.EventRecorder. It does not take a related object or an action string. Update the call to match the record.EventRecorder interface signature: Eventf(object runtime.Object, eventtype, reason, messageFmt string, args ...interface{}).
| r.Recorder.Eventf(mesh, nil, corev1.EventTypeWarning, string(reason), "SkipNodePeering", "node %s has duplicate WireGuard IP", node.Name) | |
| r.Recorder.Eventf(mesh, corev1.EventTypeWarning, string(reason), "node %s has duplicate WireGuard IP", node.Name) |
There was a problem hiding this comment.
This matches the events.EventRecorder interface in k8s.io/client-go/tools/events: Eventf(regarding, related runtime.Object, eventtype, reason, action, note string, args ...interface{}). Here regarding=mesh, related=nil, action="SkipNodePeering", note="node %s has duplicate WireGuard IP". The suggested 5-arg form belongs to the legacy record.EventRecorder, which is not what the reconciler uses.
| slog.String("msg", msg), | ||
| ) | ||
| r.Recorder.Event(mesh, corev1.EventTypeWarning, string(reason), msg) | ||
| r.Recorder.Eventf(mesh, nil, corev1.EventTypeWarning, string(reason), "SkipNodePeering", "%s", msg) |
There was a problem hiding this comment.
There was a problem hiding this comment.
Same as the duplicate-IP call above — the 7-arg signature matches events.EventRecorder.Eventf (regarding, related, eventtype, reason, action, note, args...). The events package is intentional across this reconciler.
| corev1 "k8s.io/api/core/v1" | ||
| "k8s.io/apimachinery/pkg/runtime" | ||
| "k8s.io/client-go/tools/record" | ||
| "k8s.io/client-go/tools/events" |
There was a problem hiding this comment.
The integration test wires Recorder: events.NewFakeRecorder(100) at line 110 to match the reconciler's events.EventRecorder field. The import is correct as-is.
| Registry: registry, | ||
| Log: slog.Default(), | ||
| Recorder: record.NewFakeRecorder(100), | ||
| Recorder: events.NewFakeRecorder(100), |
There was a problem hiding this comment.
events.NewFakeRecorder lives at client-go/tools/events/fake.go:36 and produces an events.EventRecorder, which is what the reconciler field requires. Switching to record.NewFakeRecorder would cause the actual type mismatch this thread is trying to prevent.
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
internal/validation/node_test.go (1)
178-188: ⚡ Quick winMake the out-of-range regression case independent of endpoint-validation order.
This case should include a valid endpoint annotation so it always validates the intended
ReasonWGIPOutOfRangepath, regardless of future validation ordering changes.Minimal test fixture hardening
{ name: "wireguard IP host outside but network overlaps", node: makeNode("node-1", []string{"10.0.1.0/24"}, map[string]string{ // 10.5.0.1/16 → host 10.5.0.1 is outside 10.4.0.0/24 kilonode.AnnotationWireguardIP: "10.5.0.1/16", kilonode.AnnotationPublicKey: "dGVzdGtleQo=", + kilonode.AnnotationForceEndpoint: "203.0.113.1:51820", }), entry: baseEntry, wantSkipped: true, wantReason: validation.ReasonWGIPOutOfRange, },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/validation/node_test.go` around lines 178 - 188, The test case "wireguard IP host outside but network overlaps" is brittle because it lacks a valid endpoint annotation and may hit endpoint-validation first; update the node fixture created by makeNode to include a valid endpoint annotation (e.g., kilonode.AnnotationEndpoint with a plausible value like "1.2.3.4:51820") so endpoint validation passes and the test deterministically exercises validation.ReasonWGIPOutOfRange for the AnnotationWireguardIP ("10.5.0.1/16"); keep baseEntry, wantSkipped and wantReason as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 82-96: Replace mutable action tags with exact commit SHAs and
disable checkout credential persistence: change the uses entries for
actions/checkout, docker/login-action, docker/setup-buildx-action,
docker/metadata-action (id: meta) and docker/build-push-action to fixed commit
SHAs and add persist-credentials: false to the actions/checkout step so the
GITHUB_TOKEN is not left in the workspace before docker/login-action runs; keep
the existing docker/login-action inputs (registry/username/password) unchanged
and verify metadata action (id: meta) tags input still matches your image
tagging needs after pinning.
In `@charts/kilo-clustermesh-operator/tests/deployment_test.yaml`:
- Around line 57-61: The test currently asserts the chart defaults to an
insecure flag by checking spec.template.spec.containers[0].args contains
"--metrics-secure=false" (test title "should pass --metrics-secure=false by
default"); change the test to enforce secure-by-default by removing or replacing
that assertion: either assert the args contain "--metrics-secure=true" or assert
that "--metrics-secure=false" is not present (and/or that no explicit insecure
flag exists), and update the test title accordingly to reflect a secure default.
In `@docs/installation.md`:
- Line 239: The markdown contains shell command examples prefixed with a prompt
("$ kubectl --context remote-cluster apply --filename remote-rbac.yaml") which
triggers markdownlint MD014; remove the leading "$ " from this and the other
affected command examples (the lines containing "kubectl --context
remote-cluster apply --filename remote-rbac.yaml" and the similar commands at
the other mentioned locations) so the examples are plain, prompt-free commands
in the docs/installation.md content.
- Around line 159-164: The fenced RBAC YAML block in the docs is missing a
language identifier causing markdownlint MD040; update the fence surrounding the
lines "apiGroups: [apiextensions.k8s.io] resources: [customresourcedefinitions]
verbs: [get, create, update]" to include the YAML language tag (e.g., change the
opening ``` to ```yaml) so the snippet is correctly recognized as YAML.
In `@docs/per-node-setup.md`:
- Around line 62-66: The fenced code blocks in docs/per-node-setup.md are
missing language identifiers (triggering MD040); update each code fence for the
listed examples— the three-line priority list
(kilo.squat.ai/clustermesh-endpoint, kilo.squat.ai/force-endpoint,
Node.Status.Addresses), the "<host>:<port>" example, the "[2001:db8::1]:51820"
example, and the "[node.example.com]:51820" example—by adding a language tag
such as "text" after the opening backticks so all fences become ```text ... ```
to satisfy markdownlint.
In `@README.md`:
- Around line 176-179: Remove the blank line inside the active blockquote in
README.md so the two quoted lines remain contiguous (both starting with '>'),
fixing markdownlint MD028; locate the blockquote text containing "Warning: Pod
CIDRs, WireGuard CIDRs, and service CIDRs must not overlap..." and "Note: The
CRD is automatically installed..." and delete the empty line between them so the
blockquote is continuous.
---
Nitpick comments:
In `@internal/validation/node_test.go`:
- Around line 178-188: The test case "wireguard IP host outside but network
overlaps" is brittle because it lacks a valid endpoint annotation and may hit
endpoint-validation first; update the node fixture created by makeNode to
include a valid endpoint annotation (e.g., kilonode.AnnotationEndpoint with a
plausible value like "1.2.3.4:51820") so endpoint validation passes and the test
deterministically exercises validation.ReasonWGIPOutOfRange for the
AnnotationWireguardIP ("10.5.0.1/16"); keep baseEntry, wantSkipped and
wantReason as-is.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6d40ec23-4444-47e9-ba79-046a36151b34
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (51)
.github/workflows/ci.yml.gitignore.golangci.ymlContainerfileMakefileREADME.mdapi/v1alpha1/clustermesh_types.goapi/v1alpha1/clustermesh_types_test.goapi/v1alpha1/zz_generated.deepcopy.gocharts/kilo-clustermesh-operator/templates/deployment.yamlcharts/kilo-clustermesh-operator/templates/role.yamlcharts/kilo-clustermesh-operator/templates/serviceaccount.yamlcharts/kilo-clustermesh-operator/tests/deployment_test.yamlcharts/kilo-clustermesh-operator/tests/rbac_test.yamlcharts/kilo-clustermesh-operator/tests/serviceaccount_test.yamlcharts/kilo-clustermesh-operator/values.yamlcmd/main.gocmd/main_test.goconfig/crd/bases/kilo.squat.ai_clustermeshes.yamldocs/architecture.mddocs/configuration.mddocs/installation.mddocs/known-gaps.mddocs/per-node-setup.mddocs/troubleshooting.mdgo.modinternal/citest/workflow_test.gointernal/containerfile/containerfile_test.gointernal/controller/clustermesh_controller.gointernal/crd/clustermeshes.yamlinternal/kilonode/annotations.gointernal/kilonode/endpoint.gointernal/kilonode/endpoint_test.gointernal/netutil/cidr.gointernal/netutil/cidr_test.gointernal/peer/builder.gointernal/peer/builder_test.gointernal/peer/endpoint_test.gointernal/restart/watcher.gointernal/restart/watcher_test.gointernal/validation/mesh_test.gointernal/validation/node.gointernal/validation/node_test.gopkg/kilo/v1alpha1/register.gopkg/kilo/v1alpha1/types_test.gopkg/kilo/v1alpha1/zz_generated.deepcopy.gotest/integration/helpers_test.gotest/integration/labels_test.gotest/integration/reconcile_test.gotest/integration/suite_test.gotest/integration/validation_test.go
💤 Files with no reviewable changes (2)
- charts/kilo-clustermesh-operator/templates/serviceaccount.yaml
- go.mod
| - it: should pass --metrics-secure=false by default | ||
| asserts: | ||
| - contains: | ||
| path: spec.template.spec.containers[0].args | ||
| content: "--metrics-secure=false" |
There was a problem hiding this comment.
Avoid asserting insecure metrics as the default chart behavior.
Defaulting to --metrics-secure=false leaves metrics exposed unless users explicitly harden it. Prefer secure-by-default and require explicit opt-out.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@charts/kilo-clustermesh-operator/tests/deployment_test.yaml` around lines 57
- 61, The test currently asserts the chart defaults to an insecure flag by
checking spec.template.spec.containers[0].args contains "--metrics-secure=false"
(test title "should pass --metrics-secure=false by default"); change the test to
enforce secure-by-default by removing or replacing that assertion: either assert
the args contain "--metrics-secure=true" or assert that "--metrics-secure=false"
is not present (and/or that no explicit insecure flag exists), and update the
test title accordingly to reflect a secure default.
There was a problem hiding this comment.
Keeping metricsSecure=false as the chart default and the matching test assertion. This mirrors the kubebuilder operator-sdk template — TLS for the metrics endpoint requires the user to mount certificates or run cert-manager, which is out of scope for the chart's defaults. The dedicated test at lines 63-69 already covers the metricsSecure=true path for users who opt in.
There was a problem hiding this comment.
@Arsolitt, understood — that's a reasonable trade-off. Matching the kubebuilder operator-sdk template behavior keeps the chart predictable for users already familiar with that scaffold, and requiring cert-manager as a prerequisite for the default install path would be a significant barrier. The opt-in test at lines 63–69 provides sufficient coverage for users who do configure TLS. I'll withdraw the suggestion.
🐇 。゚(゚´ω`゚)゚。
✏️ Learnings added
Learnt from: Arsolitt
Repo: cozystack/kilo-clustermesh-operator PR: 2
File: charts/kilo-clustermesh-operator/tests/deployment_test.yaml:57-61
Timestamp: 2026-05-21T07:55:01.070Z
Learning: In the `charts/kilo-clustermesh-operator` Helm chart, `metricsSecure` intentionally defaults to `false` (matching the kubebuilder operator-sdk scaffold template). TLS for the metrics endpoint requires cert-manager or manual certificate mounting, which is considered out of scope for the chart's default install. Do not flag this as a security issue in future reviews.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Add persist-credentials: false to every actions/checkout step so the GITHUB_TOKEN is not retained in the workspace after the action exits. This is defense-in-depth: jobs that need authenticated access (image publish) authenticate explicitly via docker/login-action, and the remaining jobs are read-only consumers of the checked-out tree. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Address markdownlint MD040 on the ClusterRole snippet. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Remove the leading "$ " from single-command shell examples that show no command output. Addresses markdownlint MD014. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Add text language identifier to the four plain code fences (priority list, host:port template, IPv6 example, bracketed DNS example). Addresses markdownlint MD040. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Replace the blank separator between the CIDR-overlap warning and the CRD self-install note with a quoted blank line so both callouts live inside one continuous blockquote. Addresses markdownlint MD028. Signed-off-by: Arsolitt <arsolitt@gmail.com>
Summary
Initial POC implementation of the
kilo-clustermesh-operator— a Kubernetes operator that builds and maintains WireGuard cluster-to-cluster mesh peers via Kilo (cozystack-patched fork). Targets the design proposed in cozystack/community#7.What's Included
ClusterMeshCRD with typed CIDR fields (podCIDRs,wireguardCIDR,serviceCIDR,additionalCIDRs) and per-clusterwireguardPortkilo.squat.ai/clustermesh-endpoint→kilo.squat.ai/force-endpoint→Node.Status.AddressesExternalIPserviceCIDR,additionalCIDRs)/32and prefix-masked WireGuard IP annotation formats)docs/(architecture, installation, configuration, per-node setup, troubleshooting, known gaps)Scope and Limitations
This is a POC, not a complete implementation of the upstream proposal. Known divergences and outstanding work are tracked in
docs/known-gaps.md. Key gaps:ClusterMeshchanges; node annotation changes require a manual reconcile triggernodes[0]lacks a resolvable endpointTest Plan
make lintpassesmake testpasses (unit tests)helm template charts/kilo-clustermesh-operatorhelm unittest charts/kilo-clustermesh-operatorClusterMeshCR, confirm Peer objects appear in both clusters and cross-cluster pod connectivity worksSummary by CodeRabbit
Release Notes
New Features
Documentation
Security
Infrastructure