Add CRD-runtime drift detection test framework#5209
Add CRD-runtime drift detection test framework#5209ChrisJBurns wants to merge 13 commits intomainfrom
Conversation
Silent drift between CRD types and their runtime config counterparts has caused user-facing bugs (e.g. PR #3118, issue #3125) where new fields appeared to work in tests but were not wired through the CRD conversion path. The only existing safety nets — code review and end-to-end tests — do not scale. Add a reusable reflection-based field walker and a bidirectional drift test pattern. The pattern requires every leaf JSON field on either side of a CRD/runtime boundary to be either explicitly mapped to the other side or explicitly declared as intentionally unmapped, with a justification string. Adding a field to either type without classifying it fails the test with a clear action-required message. Apply the pattern to MCPTelemetryConfig <-> telemetry.Config as the first real-world exercise. The drift test passes against the current codebase and surfaced one previously-undocumented leaf (openTelemetry.caBundleRef.configMapRef.optional) which is now explicitly declared. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5209 +/- ##
==========================================
+ Coverage 67.87% 67.92% +0.05%
==========================================
Files 610 612 +2
Lines 62405 62682 +277
==========================================
+ Hits 42356 42576 +220
- Misses 16869 16917 +48
- Partials 3180 3189 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Fix lint failures from CI: rename reflect.Ptr to reflect.Pointer,
suppress exhaustive on the kind switch with rationale, and tag the
,inline json fixtures as a known revive false positive.
Replace the explicit leafTypes allowlist with a json.Marshaler interface
check. Every entry in the previous map shared one property — a custom
MarshalJSON whose output bears no relation to the Go field layout — so
asking the type how it serializes is more general than maintaining a
hand-rolled list of K8s and project types. The walker is now
self-maintaining for any future custom-marshaled type.
Address reviewer findings on telemetry_drift_test.go:
- delete dead sortedKeys helper and its misleading comment
- upgrade per-entry empty-field checks to require to avoid empty-string
pollution of the duplicate-detection maps
- assert no path appears in both ignore maps (cross-pollination)
- assert every mapping/ignore entry is still a live leaf on its type
so renames and deletions surface instead of leaving stale entries
- rewrite caCertPath, sensitiveHeaders, and serviceName justifications
to name the actual wiring symbols a reviewer can grep for
Tighten the FlattenJSONLeafFields godoc to describe the one-level
self-reference expansion that maxStructRevisits actually permits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simplify cycle detection: stop on first revisit instead of allowing one
self-referential expansion. The "expand once" gold-plating produced a
"next.<field>" entry on linked-list-shaped types, but no real CRD or
runtime config has cyclic shape, and stop-on-revisit is the simpler
contract for drift detection. The visited tracker becomes a
map[reflect.Type]struct{} and maxStructRevisits goes away.
Compress the FlattenJSONLeafFields godoc from a 9-bullet semantics list
into one paragraph. Most of the bullets are now subsumed by the single
"json.Marshaler => leaf, otherwise recurse on Struct/Slice/Array/Map"
rule. The detailed encoding/json reference belongs in the standard
library, not here.
Drop redundant subtests in TestFlattenJSONLeafFields. After the
json.Marshaler refactor, the json.RawMessage / thvjson.Map+Any /
vmcpconfig.Duration cases all exercise the same short-circuit branch as
metav1.Duration, so one Marshaler example is enough. Drop slice-of-
pointer-to-struct (covered by combining the pointer-deref and slice-of-
struct cases). Fold the dedicated PointerInputDereferenced test into the
table. Remove TestFlattenJSONLeafFields_OutputIsSorted: the table-
driven test already pins exact sorted-string equality on every case.
Update the recursive-type expectation to ["name"] to lock in the new
stop-on-revisit semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
golangci-lint v2 doesn't honor the //exhaustive:ignore source directive in this configuration; the project-wide nolint convention works. The switch genuinely falls through to a default leaf branch for every Kind not listed (primitives, interfaces, channels, etc.) — the suppression is a confirmed false positive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jhrozek
left a comment
There was a problem hiding this comment.
Framework is well-conceived, walker semantics are sound, and the drift pattern is a good safety net for future CRD↔runtime work. Two non-blocking notes inline — approving.
jerm-dro
left a comment
There was a problem hiding this comment.
I like this direction. A few suggestions to make this more general purpose and require less boilerplate.
| var telemetryIgnoredOnCRDOnly = map[string]string{ | ||
| "openTelemetry.enabled": "CRD-only gate; controls whether the converter populates runtime fields at all", | ||
| "openTelemetry.sensitiveHeaders.name": "K8s-secret-backed header value; injected as TOOLHIVE_OTEL_HEADER_* env vars on the proxyrunner pod " + | ||
| "(controllerutil.GenerateOpenTelemetryEnvVarsFromRef) and merged into OTLP headers at runtime; not written into telemetry.Config.Headers by the converter", | ||
| "openTelemetry.sensitiveHeaders.secretKeyRef.name": "K8s-secret-backed header value; injected as TOOLHIVE_OTEL_HEADER_* env vars on the proxyrunner pod " + | ||
| "(controllerutil.GenerateOpenTelemetryEnvVarsFromRef) and merged into OTLP headers at runtime; not written into telemetry.Config.Headers by the converter", | ||
| "openTelemetry.sensitiveHeaders.secretKeyRef.key": "K8s-secret-backed header value; injected as TOOLHIVE_OTEL_HEADER_* env vars on the proxyrunner pod " + | ||
| "(controllerutil.GenerateOpenTelemetryEnvVarsFromRef) and merged into OTLP headers at runtime; not written into telemetry.Config.Headers by the converter", | ||
| "openTelemetry.caBundleRef.configMapRef.name": "K8s ConfigMap reference; resolved by operator into runtime CACertPath", | ||
| "openTelemetry.caBundleRef.configMapRef.key": "K8s ConfigMap reference; resolved by operator into runtime CACertPath", | ||
| "openTelemetry.caBundleRef.configMapRef.optional": "K8s ConfigMap reference flag promoted from corev1.ConfigMapKeySelector; not part of runtime config", | ||
| } | ||
|
|
||
| // telemetryIgnoredOnRuntimeOnly lists runtime leaf fields that intentionally | ||
| // have no CRD counterpart. Each entry MUST include a justification. | ||
| var telemetryIgnoredOnRuntimeOnly = map[string]string{ | ||
| "serviceName": "per-server, set from MCPTelemetryConfigReference.ServiceName and defaulted at runtime by " + | ||
| "telemetry.ResolveServiceName; intentionally absent from the shared MCPTelemetryConfig", | ||
| "serviceVersion": "resolved at runtime from binary version (issue #2296)", | ||
| "environmentVariables": "CLI-only, not applicable to CRD-managed telemetry", | ||
| "caCertPath": "filesystem path injected by runconfig.AppendTelemetryRunnerOption after the operator computes the volume mount " + |
There was a problem hiding this comment.
suggestion: can telemetryIgnoredOnRuntimeOnly and telemetryIgnoredOnCRDOnly be specified as annotations/tags on the actual struct itself? That keeps the type's documentation in one spot. Annotations are accessible via the reflect package.
There was a problem hiding this comment.
Went back and forth with CC and it suggests keeping the maps. An ignore entry is really a cross-type fact ("no peer on the other side"), not an intrinsic property of one field. Putting it on the type also pushes edits into stable v1beta1 API code or into pkg/telemetry, which is shared with the standalone thv vmcp serve CLI and shouldn't carry operator-test metadata. Justification strings with parens, commas, and concatenation flatten awkwardly into tag values too. Since the Mappings table has to stay relational regardless, the savings would only be ~30 lines of strings. Happy to revisit if I'm missing something.
Three changes from PR review:
1. Fix bogus symbol name in caCertPath justification
(telemetry_drift_test.go). The string referenced
runconfig.AppendTelemetryRunnerOption, which does not exist anywhere
in the tree. The actual wiring lives at
cmd/thv-operator/pkg/runconfig/telemetry.go:33 inside
AddMCPTelemetryConfigRefOptions, where config.CACertPath is assigned
from the operator-computed volume-mount path.
2. Drop the standalone "if inline { return \"\" }" branch in the path-
segment builder (reflect.go). encoding/json does not recognize the
",inline" tag option — only ",omitempty" and ",string" are parsed;
any other token is silently discarded. Flattening is driven entirely
by StructField.Anonymous combined with the absence of an explicit
JSON name. Today every ",inline" usage in the codebase happens to
also be on an anonymous field, so the bug was masked, but a future
named field tagged json:"foo,inline" would have been incorrectly
flattened by the walker while encoding/json would emit "foo":{...}.
Add a regression test (named_field_with_,inline_tag_stays_nested) to
pin the corrected behavior, and simplify parseJSONTag to match.
3. Extract a generic AssertNoDrift[Runtime, CRD] helper
(internal/testutil/drift.go). The previous three top-level test
functions (CRDFieldsCovered, RuntimeFieldsCovered,
MappingTableSanity) are now subtests run inside the helper, so a new
domain (OIDC, vMCP, etc.) drops to a single TestXxxDrift function
that supplies its three tables. The helper carries Domain through
into failure messages so output stays as actionable as before. The
FieldMapping type and liveLeafSet helper move to the testutil
package to live alongside the harness.
Net: ~130 LoC removed from telemetry_drift_test.go, ~220 LoC of
reusable harness added in drift.go.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI flagged two issues on the previous push: 1. cyclomatic complexity 19 of `assertTableSanity` (limit 15). Extract the five independent invariants — empty/duplicate mappings, mapped- and-ignored overlap, justification non-emptiness, cross-pollination, and stale-entry detection — into named helpers. The orchestrator becomes a six-line dispatch and each helper is small enough to read in one screen. The stale-entry helper is parameterized by side so the CRD and runtime passes share one body. 2. paralleltest on TestTelemetryConfigDrift. Add t.Parallel() to the outer test; subtests inside AssertNoDrift already call it. Also unrelated to this PR: TestGetOverlayMounts in pkg/ignore is a known intermittent flake (race), not introduced by these changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jerm-dro
left a comment
There was a problem hiding this comment.
Approving since this is a net-positive but I have one suggestion on how to make this more useful beyond testing.
| // SPDX-FileCopyrightText: Copyright 2025 Stacklok, Inc. | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| package testutil |
There was a problem hiding this comment.
suggestion: I don't think this should be entirely a testutil.
Ideally, the mapping defined in telemetry_drift_test.go is what is used in the production code conversion. Then, in a test, we assert on the mappings and types that there has been no drift. In other words, the only test part of this code should be:
func TestTelemetryConfigDrift(t *testing.T) {
t.Parallel()
testutil.AssertNoDrift[telemetry.Config, v1beta1.MCPTelemetryConfigSpec](t, testutil.DriftSpec{
Domain: "telemetry",
// The following values are defined in the production code,
// since they are used to drive the conversion.
Mappings: telemetryFieldMappings,
IgnoredOnCRDOnly: telemetryIgnoredOnCRDOnly,
IgnoredOnRuntimeOnly: telemetryIgnoredOnRuntimeOnly,
})
}
WDYT?
Summary
MCPTelemetryConfig↔telemetry.Configas the first real-world exercise. The drift test passes against the current codebase.Type of change
Test plan
task test)task test-e2e)task lint-fix)Tests added (all passing): walker unit tests in
cmd/thv-operator/internal/testutil/reflect_test.go; drift tests incmd/thv-operator/pkg/spectoconfig/telemetry_drift_test.go(TestTelemetryConfigDrift_CRDFieldsCovered,TestTelemetryConfigDrift_RuntimeFieldsCovered,TestTelemetryConfigDrift_MappingTableSanity).task lint-fixcould not be run locally — the installedgolangci-lintv2.12.1 is built with go1.25 but the project targets go1.26.go vetandgofmtare clean on the new files. CI will run the project's pinned linter.API Compatibility
v1beta1API, OR theapi-break-allowedlabel is applied and the migration guidance is described above.Test-only change. No production code, CRD schemas, or APIs are modified.
Changes
cmd/thv-operator/internal/testutil/reflect.goFlattenJSONLeafFieldshelper that returns sorted dot-delimited JSON leaf paths for a struct type. Recurses into nested structs, deref'd pointers, and slice/map element types. Stops at primitives and a small allowlist (time.Duration,metav1.Duration,json.RawMessage,thvjson.Map/Any,vmcpconfig.Duration). Skipsmetav1.TypeMeta/ObjectMeta/ListMetaand unexported /json:\"-\"fields. Honors,inlineand Go's anonymous-field promotion. Cycle-protected.cmd/thv-operator/internal/testutil/reflect_test.go,inline),json:\"-\", missing tags, omitempty, unexported fields, leaf allowlist, sorted output, nil/non-struct input.cmd/thv-operator/pkg/spectoconfig/telemetry_drift_test.goMCPTelemetryConfigSpec↔telemetry.Config. Two coverage tests (one per direction) sharing a singletelemetryFieldMappingssource-of-truth table, plus per-sideIgnoredOnCRDOnly/IgnoredOnRuntimeOnlymaps with justification strings. A third test asserts mapping-table sanity (no duplicates, no overlap with ignore lists, non-empty justifications).Does this introduce a user-facing change?
No.
Special notes for reviewers
The pattern is bidirectional by design — it does not mandate parity between CRD and runtime types. Either side may have fields the other lacks; the test simply forces every divergence to be explicitly classified. New entries in
IgnoredOnCRDOnly/IgnoredOnRuntimeOnlyrequire a justification string, which makes intentional decoupling visible in code review.The walker uncovered one CRD leaf that prior manual analysis missed:
openTelemetry.caBundleRef.configMapRef.optional, a*boolpromoted fromcorev1.ConfigMapKeySelectorviaLocalObjectReference. It is now explicitly declared intelemetryIgnoredOnCRDOnlywith a justification — exactly the pattern working as intended.This is the first PR in a series. Follow-ups will (1) introduce a CRD-owned
v1beta1.VirtualMCPConfigmirror to decoupleVirtualMCPServerSpec.Configfrompkg/vmcp/config.Configwithout changing the user-facing CRD schema, and (2) extend the drift pattern to other converter boundaries (OIDC, external auth strategies, embedded auth server config).Implementation plan
Approved drift detection plan
PR 1 (this PR): build the reusable reflection walker + bidirectional drift test machinery, exercise it against an already-decoupled boundary (telemetry) to validate the pattern.
PR 2 (next): mirror
vmcpconfig.Configtop-level fields intov1beta1.VirtualMCPConfig, with nested fields still pointing at runtime types. CRD JSON schema unchanged. Replacevmcp.Spec.Config.DeepCopy()in the converter with explicit field-by-field copy at the top level.PR 3+: add bidirectional drift tests for
VirtualMCPConfigand incrementally mirror nested types (AggregationConfig,OperationalConfig, …), one per PR, with the drift test guiding each mirror.🤖 Generated with Claude Code