Skip to content

Optional self-hosted Smallstep step-ca on the RADIUS VMs (ACME + SCEP)#1

Open
robbiet480 wants to merge 34 commits into
mainfrom
claude/smallstep-ca-infra
Open

Optional self-hosted Smallstep step-ca on the RADIUS VMs (ACME + SCEP)#1
robbiet480 wants to merge 34 commits into
mainfrom
claude/smallstep-ca-infra

Conversation

@robbiet480
Copy link
Copy Markdown
Member

@robbiet480 robbiet480 commented Jun 3, 2026

Summary

Adds an optional, off-by-default self-hosted Smallstep step-ca, co-located on the two RADIUS VMs, gated by enable_smallstep_ca. It issues EAP-TLS Wi-Fi client certs so Campus can decouple Wi-Fi from Okta's rate-limited SCEP CA. The repo stays MDM-agnostic (CA + RADIUS trust + outputs + example profiles); MDM wiring lives elsewhere.

  • KMS-backed CA (FIPS posture): local EC P-256 root (key discarded after one use) signs a Cloud KMS HSM intermediate; the HSM key is the only live private key. SCEP decrypter is a separate KMS RSA key.
  • ACME (/acme/wifi-acme/directory, Apple device-attest-01) + SCEP (/scep/wifi-scep), fronted by a GCLB + Cloud Armor with a managed TLS cert.
  • Active-active HA: both VMs run step-ca against shared Cloud SQL Postgres (ACME state) and restore the CA from Secret Manager — verified serving identical CA on both nodes.
  • radius_trust_mode gates the FreeRADIUS client-cert trust swap (okta default → smallstep) independently of the CA being up, so the CA can run during pre-stage without touching Wi-Fi.
  • Off-by-default is a true no-op (terraform plan adds 0 resources when disabled).

⚠️ ACME issuance must be gated by an authorizing webhook (Plan 2) before enrolling real devices — Apple attestation proves "a real Apple device", not "one of ours". acme_authorizing_webhook_url is currently empty by design.

Implemented via subagent-driven execution of docs/superpowers/plans/2026-06-02-smallstep-ca-infra.md (design: fleet-gitops docs/superpowers/specs/2026-06-02-smallstep-wifi-ca-decoupling-design.md). The plan doc's "Deviations" section records six bugs found + fixed during execution (DB tier/HA, least-privilege IAM, templatefile escaping, KMS init topology, provisioner-name collision, deb install URL).

Test Plan

  • terraform validate clean
  • terraform plan -var enable_smallstep_ca=false → 0 to add (true no-op)
  • Applied to campus-cloud-8021x-42e6 (radius_trust_mode=okta, webhook empty) — RADIUS/Wi-Fi untouched
  • step-ca healthy active-active on both VMs; KMS-backed signer confirmed; ACME directory + SCEP GetCACaps/GetCACert serving; primary & secondary return identical GetCACert fingerprint
  • Public path through LB once the Google-managed TLS cert goes ACTIVE (DNS ca.campusgroup.co8.233.40.66 created; cert was provisioning at PR time)
  • Wire acme_authorizing_webhook_url (Plan 2) before any real-device ACME enrollment

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Optional, off-by-default self-hosted CA for RADIUS providing ACME and SCEP endpoints, configurable trust mode (keep Okta or switch to Smallstep)
    • ACME issuance can be gated by a new authorizing webhook (Cloud Run) that verifies device attestations
    • Module now exposes ACME/SCEP URLs, CA certificate secret ID, and CA load‑balancer IP; startup can install and run the CA
  • Documentation

    • Deployment plans, README guidance, and example EAP‑TLS client profiles added
  • Tests

    • Unit tests for webhook, signature, server, authorizer, and Fleet lookup logic added

robbiet480 and others added 17 commits June 2, 2026 20:33
Terraform + startup-script plan to stand up optional KMS-backed step-ca
(ACME device-attest-01 + SCEP) on the RADIUS VMs, Cloud SQL for ACME HA,
GCLB + Cloud Armor front door, and an independently-gated RADIUS trust
swap. Verified by validate/plan + post-apply step-ca/curl health checks.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
db-f1-micro is shared-core and cannot use availability_type=REGIONAL;
it would fail at apply. Use the smallest standard tier that supports HA.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…ewall

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…+accessor)

Replace roles/secretmanager.admin on the VM SA with the narrower
secretVersionManager + secretAccessor; the VM only needs to add the
initial CA cert version and read it back, not delete/re-IAM the secret.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…/SCEP

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…template

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…file swap

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…on-box)

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…s/2nd VM

step-ca 0.30.2 can't init against a pre-created KMS key, so a local root
key signs a KMS-backed intermediate. Persist BOTH certs in Secret Manager
(new smallstep-intermediate-cert), discard the local root key after use,
and restore both on reuse. The KMS HSM key is the only live private key.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
step-ca 0.30.2 rejects duplicate provisioner names even across types, so
the ACME and SCEP provisioners can't both be 'wifi' (step-ca refused to
start). Default ACME->wifi-acme, SCEP->wifi-scep. Also add --force to the
root/intermediate 'step certificate create' calls so a re-run over stale
cert files doesn't trip an interactive overwrite prompt (which --no-tty
turned into an empty cert + crash loop).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The pinned 0.28.1 step-ca deb 404s and the asset name omits the Debian
revision suffix, so the bare step-ca_<ver>_amd64.deb never matched —
step-ca was never installed on a fresh boot (only worked because it was
hand-installed during bring-up). Pin 0.30.2 with the real -1_amd64.deb
asset names (both URLs verified HTTP 200) and fail loudly if install
fails. Without this, a VM reboot/reimage loses the CA install.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Captures the LB IP, ACME/SCEP URLs, CAThumbprint + root SHA-1, GetCACert
DER format, and the six fixes made during execution — inputs for Plans 2 & 3.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd281dc6-0247-4239-a9e1-4948fc8e5d34

📥 Commits

Reviewing files that changed from the base of the PR and between b7ce747 and 031990a.

📒 Files selected for processing (5)
  • scripts/startup.sh
  • variables.tf
  • webhook.tf
  • webhook/README.md
  • webhook/internal/authorize/authorize.go
✅ Files skipped from review due to trivial changes (1)
  • webhook/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • scripts/startup.sh
  • webhook.tf
  • variables.tf

Walkthrough

This PR adds optional, off-by-default self-hosted Smallstep step-ca deployment for RADIUS VMs. It introduces Terraform variables, KMS and Cloud SQL infrastructure, Secret Manager state, HTTPS load balancer with Cloud Armor, VM startup integration, and module outputs enabling device-attestation-gated certificate issuance via ACME and SCEP.

Changes

Smallstep CA Deployment

Layer / File(s) Summary
Configuration inputs and variables
variables.tf, terraform.tfvars.example, compute.tf, outputs.tf, README.md, examples/README.md
Introduces enable_smallstep_ca feature toggle, radius_trust_mode to select FreeRADIUS trust end-state (okta or smallstep), DNS name and provisioner naming for ACME/SCEP endpoints, ACME webhook URL, Cloud SQL DB tier, example tfvars, passes step-ca inputs into the VM startup template, and adds conditional outputs for ACME/SCEP/CA secret/LB IP.
KMS, Cloud SQL, and Secret Manager
smallstep.tf
Provisions Cloud KMS keyring and HSM-protected keys for CA signing and SCEP decryption, Cloud SQL Postgres with Private Service Access for ACME state, Secret Manager secrets for DB password, SCEP static challenge, CA root and intermediate certificates, and VM IAM bindings.
Network access and HTTPS load balancer
smallstep.tf
Creates firewall rule for step-ca listener, external managed HTTPS load balancer (global IP, managed TLS certificate), per-zone instance groups mapping VMs to port 8443, HTTPS health check on /health, Cloud Armor rate-limiting policy, backend service, URL map, HTTPS proxy, and global forwarding rule on port 443.
VM startup template and step-ca bootstrap
compute.tf, scripts/startup.sh
Extends Terraform template inputs to pass step-ca configuration to the startup script; startup script conditionally installs step-ca, retrieves secrets, restores or initializes a KMS-backed CA chain persisted to Secret Manager, renders ca.json for ACME (device-attest-01 with optional webhook) and SCEP, creates a systemd unit, and stages the Smallstep CA cert into FreeRADIUS when radius_trust_mode=smallstep.
Terraform module outputs
outputs.tf
Adds conditional outputs for the Smallstep ACME directory URL, SCEP URL, Smallstep CA cert Secret Manager ID, and the step-ca load balancer public IP when enable_smallstep_ca is true.
Documentation and examples
README.md, examples/README.md, docs/superpowers/plans/*
Adds README section for optional self-hosted Smallstep, an examples README describing EAP‑TLS client profile templates and token mappings, and a detailed implementation and verification plan including tasks, deviations, and post-apply checks.
ACME authorizing webhook (service code & tests)
webhook/main.go, webhook/internal/*, webhook/go.mod
Implements the webhook service: Cobra CLI, config loader, HMAC signature verifier, Fleet client, authorization logic (fail-closed), HTTP handler for POST /authorize and GET /healthz, request/response types, and unit tests for the core components.
ACME authorizing webhook (infra & deploy)
webhook.tf, webhook/Dockerfile, webhook/README.md
Gates an ACME authorizing webhook on enable_acme_webhook: creates Cloud Run service account, generates signing secret in Secret Manager, references Fleet API token, grants IAM, deploys Cloud Run v2 service with secret env wiring, allows public invoker, and outputs the /authorize URL; adds Dockerfile and README.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through clouds so high,
With step-ca keys and SCEP reply,
HTTPS blooms on port four-four-three,
Device attestation keeps it free,
Okta or Smallstep—pick your key. 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding an optional self-hosted Smallstep step-ca to RADIUS VMs with ACME and SCEP support, which aligns with the PR objectives and the substantial changeset across multiple infrastructure files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/smallstep-ca-infra

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@compute.tf`:
- Around line 128-139: The VM boot race happens because the
google_compute_instance resources don't explicitly depend on the new Smallstep
prerequisites (Smallstep secrets, KMS IAM bindings, Cloud SQL), so add explicit
depends_on entries to the google_compute_instance resources referencing the
Smallstep secret manager secrets, the KMS IAM bindings for the Smallstep keys,
and google_sql_database_instance.smallstep (the resource referenced as
google_sql_database_instance.smallstep[0] in the diff); update the compute
instances' blocks (google_compute_instance) to include depends_on = [<smallstep
secret resources>, <google_kms_crypto_key_iam_binding or
google_kms_crypto_key_iam_member resources for the smallstep signing/scep keys>,
google_sql_database_instance.smallstep] so the VMs wait for Smallstep secrets,
KMS IAM and Cloud SQL to be ready before startup.

In `@docs/superpowers/plans/2026-06-02-smallstep-ca-infra.md`:
- Around line 1056-1069: Update Step 4/5 verification commands to use the
deployed paths `wifi-acme` and `wifi-scep` (replace `/acme/wifi/...` and
`/scep/wifi` with `/acme/wifi-acme/...` and `/scep/wifi-scep`) and change the
GetCACert handling from PKCS#7 to raw DER X.509: curl the new endpoints, save
GetCACert to /tmp as DER, verify it with `openssl x509 -inform DER -noout -text`
(instead of pkcs7), and record the issuer cert SHA-1 using `openssl x509 -in
/tmp/scep-cacert.der -inform DER -noout -fingerprint -sha1` so Plan 3's
`CAThumbprint` gets the correct value.

In `@scripts/startup.sh`:
- Around line 230-266: The two failing calls to step certificate create (the
root and intermediate creation commands) are currently masked by "|| true" which
lets bootstrap continue on error; remove the "|| true" from both step
certificate create invocations, check their exit status (or rely on set -e) so
the script exits immediately on failure, and ensure the publishing block that
runs gcloud secrets versions add and the removal rm -f
"$STEPPATH/secrets/root_ca_key" only executes after verifying both
"$STEPPATH/certs/root_ca.crt" and "$STEPPATH/certs/intermediate_ca.crt" exist
(the existing if [ -f ... ] block is fine but make it the single gate for
publishing/cleanup); in short: eliminate "|| true" from the step certificate
create commands, enforce fail-fast on nonzero exit, and only publish secrets and
delete the local root key when the certificate existence check (files produced
by the step certificate create commands) succeeds.

In `@variables.tf`:
- Around line 197-200: The variable smallstep_ca_dns_name has a default empty
string which allows enable_smallstep_ca=true to proceed with an invalid
hostname; add a validation block on variable "smallstep_ca_dns_name" to require
a non-empty, valid-looking DNS name when enable_smallstep_ca is true (use the
variable name smallstep_ca_dns_name and the flag enable_smallstep_ca to gate the
check), returning a clear error message like "smallstep_ca_dns_name must be set
when enable_smallstep_ca is true" so the plan fails fast instead of producing
runtime/bootstrap errors.
- Around line 203-213: Add a Terraform variable validation to ensure the ACME
and SCEP provisioner names differ: update either the
smallstep_scep_provisioner_name (or smallstep_acme_provisioner_name) variable
block to include a validation { condition = var.smallstep_scep_provisioner_name
!= var.smallstep_acme_provisioner_name error_message =
"smallstep_acme_provisioner_name and smallstep_scep_provisioner_name must be
different and globally unique" } so Terraform fails fast if the two names are
equal; reference the variable names smallstep_acme_provisioner_name and
smallstep_scep_provisioner_name to locate the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a4277c00-0124-4dd8-9214-abf7d56202c7

📥 Commits

Reviewing files that changed from the base of the PR and between 7c3b233 and 62a7472.

📒 Files selected for processing (9)
  • README.md
  • compute.tf
  • docs/superpowers/plans/2026-06-02-smallstep-ca-infra.md
  • examples/README.md
  • outputs.tf
  • scripts/startup.sh
  • smallstep.tf
  • terraform.tfvars.example
  • variables.tf

Comment thread compute.tf
Comment thread docs/superpowers/plans/2026-06-02-smallstep-ca-infra.md Outdated
Comment thread scripts/startup.sh
Comment thread variables.tf
Comment thread variables.tf
robbiet480 and others added 11 commits June 2, 2026 22:56
Go service that gates step-ca ACME/SCEP issuance to Fleet-enrolled device
serials: verifies step-ca's HMAC signature, extracts the attested serial,
looks it up in Fleet, returns allow only for an enrolled host. TDD'd,
fail-closed by construction. Cloud Run via Terraform; wires the webhook URL
into ca.json. Grounded in verified step-ca webhook contract + live Fleet
hosts/identifier endpoint.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
When acme_webhook_url is set, fetch acme-webhook-signing-secret, base64 it,
and emit it as the ca.json webhook "secret" so step-ca signs the
X-Smallstep-Signature the webhook verifies. Gated on acme_webhook_url != "".

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
robbiet480 and others added 4 commits June 2, 2026 23:18
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Review caught that Enrolled was hardcoded true for any serial Fleet had a
record for, including Off/Pending/unenrolled hosts. Parse host.mdm
.enrollment_status and treat only an 'On' status as enrolled, so the
authorizer denies known-but-unenrolled serials.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…band value)

The Fleet API token is a standing credential created out-of-band (not via
tfvars/CI). Terraform references the secret (data source) and grants the
webhook SA access; the value is added directly to Secret Manager.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
variables.tf (1)

232-258: ⚡ Quick win

Add validations to fail fast when webhook inputs are incomplete.

The webhook variables follow the same pattern flagged in earlier reviews for smallstep_ca_dns_name: required dependencies are documented in descriptions but not enforced, so Terraform accepts invalid combinations and defers failure to apply/runtime.

Consider adding validation blocks:

  1. enable_acme_webhook → require enable_smallstep_ca=true
  2. fleet_api_base_url → require non-empty when enable_acme_webhook=true
  3. webhook_image → require non-empty when enable_acme_webhook=true
Suggested validations
 variable "enable_acme_webhook" {
   description = "Deploy the ACME authorizing webhook (Cloud Run) and wire it into step-ca. Requires enable_smallstep_ca=true. MANDATORY before enrolling real devices over ACME."
   type        = bool
   default     = false
+
+  validation {
+    condition     = !var.enable_acme_webhook || var.enable_smallstep_ca
+    error_message = "enable_acme_webhook requires enable_smallstep_ca=true."
+  }
 }

 variable "fleet_api_base_url" {
   description = "Base URL of the Fleet server the webhook queries (e.g. https://fleet.campusgroup.co)."
   type        = string
   default     = ""
+
+  validation {
+    condition     = !var.enable_acme_webhook || trimspace(var.fleet_api_base_url) != ""
+    error_message = "fleet_api_base_url must be set when enable_acme_webhook is true."
+  }
 }

And similarly for webhook_image.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@variables.tf` around lines 232 - 258, Add Terraform validation blocks to fail
fast when ACME webhook inputs are incomplete: in variable "enable_acme_webhook"
add a validation requiring var.enable_smallstep_ca == true when
enable_acme_webhook is true; in variable "fleet_api_base_url" add a validation
requiring the string is not empty when var.enable_acme_webhook == true; and in
variable "webhook_image" add a validation requiring the string is not empty when
var.enable_acme_webhook == true. Reference the variables enable_acme_webhook,
enable_smallstep_ca, fleet_api_base_url, and webhook_image when writing these
validated_conditional rules and include clear error_message texts describing the
missing dependency.
webhook/internal/fleet/fleet.go (1)

66-67: ⚡ Quick win

Update the method contract comment to match the actual return shape.

LookupHostBySerial now returns a non-nil Host for present-but-unenrolled records as well, so “(host, nil) if enrolled” is misleading. That mismatch makes the exported API easy to misuse by future callers that rely on nil to mean “not enrolled.”

Suggested comment tweak
-// LookupHostBySerial returns (host, nil) if enrolled, (nil, nil) if no such
-// host (404), or (nil, error) on any transport/non-2xx/parse failure.
+// LookupHostBySerial returns (host, nil) when Fleet returns a host record
+// (with Host.Enrolled derived from mdm.enrollment_status), (nil, nil) if no
+// such host exists (404), or (nil, error) on any transport/non-2xx/parse
+// failure.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@webhook/internal/fleet/fleet.go` around lines 66 - 67, Update the method
comment for LookupHostBySerial to accurately describe its return values: state
that it returns (host, nil) when a host record exists (regardless of enrolled
status, i.e., present-but-unenrolled will also return a non-nil Host), (nil,
nil) when no host exists (404), and (nil, error) on transport/non-2xx/parse
failures; locate the comment immediately above the LookupHostBySerial function
and replace the misleading “if enrolled” phrasing with wording that clarifies
“if present (enrolled or unenrolled)”.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@webhook.tf`:
- Around line 146-148: The output block acme_webhook_url currently claims
callers should append /authorize but its value already appends "/authorize";
update the output description to accurately state that the returned URL already
includes the /authorize path (or remove the "append /authorize" instruction) so
users won't append it twice—modify the description text in the output
"acme_webhook_url" (which uses var.enable_acme_webhook and
google_cloud_run_v2_service.webhook[0].uri) to reflect the actual value.

In `@webhook/internal/authorize/authorize.go`:
- Around line 35-46: The authorization path currently logs raw device serials
(variable "serial") in the deny/allow branches; change logging in the function
in authorize.go to avoid emitting full serials—either omit the serial entirely
and rely on host.ID and the deny reason, or compute and log a non-reversible
short fingerprint (e.g., SHA-256 then hex-truncate) and use that instead of
"serial". Update the calls that use logrus.WithField/WithFields referencing
"serial" (the deny branches and the final allow log) to use the truncated
fingerprint variable or remove the field, and keep other fields like "host_id"
and "required_label" unchanged; ensure a.allowLabel and host.ID are still logged
where relevant.

In `@webhook/internal/server/server.go`:
- Around line 76-90: The handler only uses
req.AttestationData.PermanentIdentifier, so for SCEP requests the serial stays
empty and h.decider.Allow(serial) will always deny; update the request-parsing
path to extract the SCEP serial before calling h.decider.Allow: if
req.AttestationData is nil, pull the serial from the SCEP-specific field(s) on
types.RequestBody (e.g., the SCEP request wrapper or CSR subject serial number
present in the request structure) and assign it to serial (or fail/deny only if
no serial can be determined). Ensure the extraction happens prior to the Allow
call and adjust types.RequestBody if necessary so the handler can read the SCEP
serial.

In `@webhook/README.md`:
- Around line 45-49: The fenced code block containing the shell commands (lines
showing "go test ./...", "go build .", and "WEBHOOK_SIGNING_SECRET=... ...
./acme-authz-webhook serve") should include a language tag to satisfy
markdownlint MD040; update the opening triple-backtick to include "bash" (i.e.,
```bash) so the block is explicitly a bash/shell snippet.

---

Nitpick comments:
In `@variables.tf`:
- Around line 232-258: Add Terraform validation blocks to fail fast when ACME
webhook inputs are incomplete: in variable "enable_acme_webhook" add a
validation requiring var.enable_smallstep_ca == true when enable_acme_webhook is
true; in variable "fleet_api_base_url" add a validation requiring the string is
not empty when var.enable_acme_webhook == true; and in variable "webhook_image"
add a validation requiring the string is not empty when var.enable_acme_webhook
== true. Reference the variables enable_acme_webhook, enable_smallstep_ca,
fleet_api_base_url, and webhook_image when writing these validated_conditional
rules and include clear error_message texts describing the missing dependency.

In `@webhook/internal/fleet/fleet.go`:
- Around line 66-67: Update the method comment for LookupHostBySerial to
accurately describe its return values: state that it returns (host, nil) when a
host record exists (regardless of enrolled status, i.e., present-but-unenrolled
will also return a non-nil Host), (nil, nil) when no host exists (404), and
(nil, error) on transport/non-2xx/parse failures; locate the comment immediately
above the LookupHostBySerial function and replace the misleading “if enrolled”
phrasing with wording that clarifies “if present (enrolled or unenrolled)”.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ebde7ed7-a343-443c-936c-0b6a6e3ac38d

📥 Commits

Reviewing files that changed from the base of the PR and between 62a7472 and 633e7af.

⛔ Files ignored due to path filters (1)
  • webhook/go.sum is excluded by !**/*.sum
📒 Files selected for processing (20)
  • docs/superpowers/plans/2026-06-02-acme-authorizing-webhook.md
  • scripts/startup.sh
  • variables.tf
  • webhook.tf
  • webhook/Dockerfile
  • webhook/README.md
  • webhook/go.mod
  • webhook/internal/authorize/authorize.go
  • webhook/internal/authorize/authorize_test.go
  • webhook/internal/config/config.go
  • webhook/internal/config/config_test.go
  • webhook/internal/fleet/fleet.go
  • webhook/internal/fleet/fleet_test.go
  • webhook/internal/server/server.go
  • webhook/internal/server/server_test.go
  • webhook/internal/signature/signature.go
  • webhook/internal/signature/signature_test.go
  • webhook/internal/types/types.go
  • webhook/internal/types/types_test.go
  • webhook/main.go
✅ Files skipped from review due to trivial changes (4)
  • webhook/Dockerfile
  • webhook/go.mod
  • webhook/internal/config/config.go
  • webhook/internal/server/server_test.go

Comment thread webhook.tf
Comment thread webhook/internal/authorize/authorize.go Outdated
Comment thread webhook/internal/server/server.go
Comment thread webhook/README.md Outdated
- compute.tf: both RADIUS VMs now depend_on the Smallstep prerequisites
  (secrets, KMS IAM, Cloud SQL) so first-boot bootstrap can't race ahead of
  them. Unindexed count-resource refs => no-op when enable_smallstep_ca=false.
- startup.sh: drop '|| true' from the two 'step certificate create' calls;
  fail-fast with a single non-empty-cert gate before publishing secrets /
  discarding the root key, so a broken KMS binding aborts instead of starting
  step-ca with a partial chain.
- Plan 1 doc: Task 10 verification now uses the deployed wifi-acme/wifi-scep
  paths and parses GetCACert as raw DER X.509 (not PKCS#7).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/startup.sh`:
- Around line 267-279: The current bootstrap can run concurrently on both VMs
and publish differing certs; modify the startup.sh flow to perform a distributed
publish-and-verify: first check Secret Manager for the presence of the
intermediate secret (smallstep-intermediate-cert) and attempt to create/publish
the certs only if the secret does not already exist (use the existing gcloud
secrets ... commands but guarded by a pre-check), use an atomic create-or-fail
approach (fail when creation is denied because another instance created it),
then after publishing always re-read both secrets from Secret Manager and
overwrite the local files in $STEPPATH/certs before proceeding to start step-ca
so both nodes use the exact same certs; reference the variables/commands in the
diff such as STEPPATH, smallstep-intermediate-cert, gcloud secrets versions add,
and step-ca when applying these changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ab8e4374-de92-434a-8497-4c00291f91e3

📥 Commits

Reviewing files that changed from the base of the PR and between 633e7af and b7ce747.

📒 Files selected for processing (3)
  • compute.tf
  • docs/superpowers/plans/2026-06-02-smallstep-ca-infra.md
  • scripts/startup.sh
✅ Files skipped from review due to trivial changes (1)
  • docs/superpowers/plans/2026-06-02-smallstep-ca-infra.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • compute.tf

Comment thread scripts/startup.sh
Comment on lines +267 to +279
# Single gate: only publish secrets + discard the root key once BOTH certs
# genuinely exist and are non-empty. On failure, leave files in place and abort.
[ -s "$STEPPATH/certs/root_ca.crt" ] && [ -s "$STEPPATH/certs/intermediate_ca.crt" ] || {
echo "FATAL: Smallstep CA bootstrap did not produce both certificates" >&2
exit 1
}
# Publish BOTH certs so reboots / the 2nd VM restore a matching chain.
gcloud secrets versions add smallstep-ca-cert --project="${project_id}" \
--data-file="$STEPPATH/certs/root_ca.crt"
gcloud secrets versions add smallstep-intermediate-cert --project="${project_id}" \
--data-file="$STEPPATH/certs/intermediate_ca.crt"
# Stage the ROOT cert for the RADIUS-trust step later in this script.
cp "$STEPPATH/certs/root_ca.crt" /tmp/smallstep-ca.crt
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Serialize the first-time CA bootstrap across both VMs.

This block still allows both RADIUS VMs to enter the init path on a cold rollout before smallstep-intermediate-cert exists. In that case they can mint and publish different root/intermediate pairs, and one node will keep serving its locally generated chain until it is restarted, which breaks the active-active “identical CA on both nodes” requirement.

Please gate initialization with a real distributed lock, or publish once and then re-read the certs from Secret Manager before starting step-ca.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/startup.sh` around lines 267 - 279, The current bootstrap can run
concurrently on both VMs and publish differing certs; modify the startup.sh flow
to perform a distributed publish-and-verify: first check Secret Manager for the
presence of the intermediate secret (smallstep-intermediate-cert) and attempt to
create/publish the certs only if the secret does not already exist (use the
existing gcloud secrets ... commands but guarded by a pre-check), use an atomic
create-or-fail approach (fail when creation is denied because another instance
created it), then after publishing always re-read both secrets from Secret
Manager and overwrite the local files in $STEPPATH/certs before proceeding to
start step-ca so both nodes use the exact same certs; reference the
variables/commands in the diff such as STEPPATH, smallstep-intermediate-cert,
gcloud secrets versions add, and step-ca when applying these changes.

… docs)

- variables.tf: validate smallstep_ca_dns_name is set when enabled, and that
  ACME/SCEP provisioner names differ (fail fast at plan, not boot).
- authorize.go: stop logging raw device serials (long-lived identifiers); log
  a short suffix on deny + host_id on allow.
- startup.sh ca.json: webhook certType X509 (it's attached only to the ACME
  provisioner); document that SCEP is NOT webhook-gated, so the handler never
  sees a serial-less SCEP request.
- webhook.tf: fix acme_webhook_url description (already includes /authorize).
- webhook/README.md: add bash language tag to the build fence (MD040).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant