Skip to content

fix: separate ServiceAccount for router workloads#433

Open
ambient-code[bot] wants to merge 4 commits intomainfrom
fix/351-separate-service-accounts
Open

fix: separate ServiceAccount for router workloads#433
ambient-code[bot] wants to merge 4 commits intomainfrom
fix/351-separate-service-accounts

Conversation

@ambient-code
Copy link
Copy Markdown
Contributor

@ambient-code ambient-code bot commented Apr 8, 2026

Summary

  • Creates a dedicated router-sa ServiceAccount with zero RBAC bindings and automountServiceAccountToken: false for router workloads
  • The controller keeps its existing controller-manager SA with full RBAC (secrets CRUD, CRD access, leader election)
  • Updates both Helm chart templates and operator Go code to ensure consistency across deployment methods

Fixes #351

Changes

Helm chart

  • rbac/service_account.yaml: Split into controller-manager (controller) and router-sa (router)
  • rbac/role_binding.yaml, leader_election_role.yaml, leader_election_role_binding.yaml: Updated labels from jumpstarter-router to jumpstarter-controller
  • router-deployment.yaml, additional-router-deployment.yaml: Use router-sa with automountServiceAccountToken: false

Operator

  • rbac.go: Added createRouterServiceAccount() and reconciliation logic for the router SA
  • jumpstarter_controller.go: Updated createRouterDeployment() to use {name}-router-sa with automountServiceAccountToken: false

Test plan

  • Verify go build ./... passes for the operator (confirmed locally)
  • Deploy with Helm and verify router pods use router-sa with no token mounted
  • Verify controller pods still use controller-manager SA with full RBAC
  • Verify the secrets-job still runs correctly with controller-manager SA
  • Run e2e tests to confirm no regressions

🤖 Generated with Claude Code

The controller and router previously shared the same `controller-manager`
ServiceAccount, giving the router unnecessary cluster-wide secrets CRUD
access. This creates a dedicated `router-sa` ServiceAccount with no RBAC
bindings and `automountServiceAccountToken: false`, following the
principle of least privilege.

Fixes #351

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 8, 2026

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3ae7064c-021c-40ea-842a-33dcbcf63ca4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/351-separate-service-accounts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@netlify
Copy link
Copy Markdown

netlify bot commented Apr 8, 2026

Deploy Preview for jumpstarter-docs ready!

Name Link
🔨 Latest commit 8de3816
🔍 Latest deploy log https://app.netlify.com/projects/jumpstarter-docs/deploys/69d68fb36efdb500083de8f7
😎 Deploy Preview https://deploy-preview-433--jumpstarter-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

kind: Role
metadata:
labels:
app.kubernetes.io/name: jumpstarter-router
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not modify any helm, it has been deprecated.

Helm charts are deprecated; only the operator deployment path
should be modified for the separate router ServiceAccount.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ambient-code
Copy link
Copy Markdown
Contributor Author

ambient-code bot commented Apr 8, 2026

Status Update

Review feedback addressed

  • @mangelajo's comment (Helm charts are deprecated): Reverted all 6 Helm chart template changes in commit 8bc4c6d. Only the operator deployment path (rbac.go, jumpstarter_controller.go, config/rbac/role.yaml) retains the separate router-sa ServiceAccount logic.

CI failures analysis

The following CI jobs failed:

  • deploy-kind (helm): Router pod failed to connect on port 8083 (timed out after 120s). This was likely caused by the Helm chart referencing the new router-sa ServiceAccount which may not have been created correctly. The revert of Helm changes should fix this.
  • deploy-kind (operator): Operator e2e test timed out at 600s waiting for a condition — needs re-run to confirm whether the operator-side changes work correctly after the Helm revert.
  • e2e-tests (ubuntu-24.04, ubuntu-24.04-arm): Router pod jumpstarter-router-0 timed out waiting for ready condition — same root cause as the Helm deploy failure.
  • e2e-compat-old-client: Likely cascading from the same router startup issue.

The Helm chart revert should resolve the helm-path failures. The operator-path changes (separate router-sa with automountServiceAccountToken: false) remain intact and should be validated on the next CI run.

The router process needs Kubernetes API access at startup to load its
configuration from a ConfigMap (via ctrl.GetConfigOrDie() and
LoadRouterConfiguration). Setting AutomountServiceAccountToken: false on
both the ServiceAccount and pod spec prevented the router from
authenticating, causing the pod to crash and never become ready (180s
timeout in CI).

Changes:
- Remove AutomountServiceAccountToken: false from router ServiceAccount
  and pod spec so the token is mounted
- Add a minimal router Role granting read-only access to configmaps and
  secrets (the only resources the router needs)
- Add a RoleBinding to bind the router Role to the router ServiceAccount

This maintains the security goal of separating the router SA from the
controller SA while granting only the minimum permissions needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ambient-code
Copy link
Copy Markdown
Contributor Author

ambient-code bot commented Apr 8, 2026

Root cause of CI failure

The router pod (jumpstarter-router-0) was timing out because it could not access the Kubernetes API at startup.

The router's main.go calls ctrl.GetConfigOrDie() and then LoadRouterConfiguration(), which reads the jumpstarter-controller ConfigMap from the K8s API. With AutomountServiceAccountToken: false set on both the ServiceAccount and pod spec, no service account token was mounted into the pod, so the in-cluster config had no credentials and the router crashed immediately on startup.

Fix

  1. Removed AutomountServiceAccountToken: false from both the router ServiceAccount definition and the router pod spec, so the SA token is mounted and the router can authenticate to the K8s API.

  2. Added a minimal RBAC Role ({name}-router-role) granting read-only access to configmaps and secrets — the only resources the router needs to read its configuration and secret keys.

  3. Added a RoleBinding ({name}-router-rolebinding) binding the router Role to the {name}-router-sa ServiceAccount.

This maintains the security goal of separating the router SA from the controller SA (the router SA has far fewer permissions than the controller SA) while providing the minimum permissions the router actually needs.

@ambient-code
Copy link
Copy Markdown
Contributor Author

ambient-code bot commented Apr 8, 2026

CI Fix: Router ServiceAccount needs K8s API access

Root cause: The router process calls ctrl.GetConfigOrDie() and LoadRouterConfiguration() at startup, which reads the jumpstarter-controller ConfigMap from the Kubernetes API. Setting AutomountServiceAccountToken: false on both the ServiceAccount and pod spec prevented the router from authenticating to the K8s API, causing it to crash immediately (180s ready timeout in CI).

Fix (commit 93ce272):

  • Removed AutomountServiceAccountToken: false from both the router ServiceAccount and pod spec
  • Added a minimal Role granting read-only access to configmaps and secrets (the resources the router actually needs)
  • Added a RoleBinding to bind the new Role to the router ServiceAccount
  • Added reconciliation logic for the new Role and RoleBinding

This maintains the security separation goal — the router SA has far fewer permissions than the controller SA — while granting the minimum permissions the router actually needs to function.

Comment on lines +378 to +382
APIGroups: []string{""},
Resources: []string{"secrets"},
Verbs: []string{"get", "list", "watch"},
},
},
Copy link
Copy Markdown
Member

@mangelajo mangelajo Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the router really need to get/list/watch secrets? why?

Resources: []string{"secrets"},
Verbs: []string{"get", "list", "watch"},
},
},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — you're right, it doesn't. Looking at LoadRouterConfiguration in controller/internal/config/config.go, the router only calls client.Get() on a ConfigMap (jumpstarter-controller), and LoadGrpcConfiguration just parses the in-memory config struct without any K8s API calls. The router never reads secrets.

I'll remove the secrets permission from the router Role so it only has get/list/watch on configmaps.

The router only reads a ConfigMap via LoadRouterConfiguration() and does
not access any secrets. Remove the secrets PolicyRule from the router
Role per review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Shared ServiceAccount across 4 workloads with cluster-wide secrets CRUD

1 participant