Summary
When a registry server needs to discover MCP resources across multiple namespaces (multi-tenant deployments), it requires ClusterRole/ClusterRoleBinding rather than namespace-scoped Role/RoleBinding. The operator currently cannot dynamically create these cluster-scoped RBAC resources because its own ClusterRole only has permissions for roles and rolebindings, not clusterroles and clusterrolebindings.
This creates a gap between the two installation methods:
Registry server Helm chart (from toolhive-registry-server): Installs its own ClusterRoles and ClusterRoleBindings directly, so cross-namespace discovery works out of the box.
Operator Helm chart: The operator dynamically creates only namespace-scoped Roles via ensureRBACResources() in cmd/thv-operator/pkg/registryapi/rbac.go. The static ClusterRole template (registry-api-clusterrole.yaml) only grants read access to mcpservers — it doesn't cover mcpremoteproxies, virtualmcpservers, services, or Gateway API resources needed for full cross-namespace registry functionality.
Related: stacklok/toolhive-registry-server#287 (multi-tenancy & multiple registry server installations)

Current State
What the operator creates dynamically (namespace-scoped)
ServiceAccount, Role, RoleBinding per MCPRegistry — limited to the MCPRegistry's own namespace
Created by cmd/thv-operator/pkg/registryapi/rbac.go using the RBAC utility library (cmd/thv-operator/pkg/kubernetes/rbac/rbac.go), which only supports namespace-scoped resources
What the Helm chart provides statically (cluster-scoped)
A single toolhive-registry-api-role ClusterRole with read access to mcpservers only
A single toolhive-registry-api-rolebinding ClusterRoleBinding bound to the toolhive-registry-api ServiceAccount in the release namespace
A toolhive-registry-api ServiceAccount
These are not sufficient for multi-namespace scenarios and don't cover all required resource types
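For reference, the static template grants roughly the following. This is a sketch reconstructed from the description above, not the literal chart contents; the API group and labels should be checked against registry-api-clusterrole.yaml.

```yaml
# Sketch of the existing static ClusterRole (illustrative).
# API group "toolhive.stacklok.dev" is an assumption.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: toolhive-registry-api-role
rules:
  - apiGroups: ["toolhive.stacklok.dev"]
    resources: ["mcpservers"]   # only mcpservers; no other CRDs, services, or Gateway API
    verbs: ["get", "list", "watch"]
```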
Static Helm templates are orphaned
The static templates (registry-api-clusterrole.yaml, registry-api-clusterrolebinding.yaml, registry-api-serviceaccount.yaml) are effectively unused:
The static ClusterRoleBinding binds to SA name toolhive-registry-api (from Helm values)
The operator dynamically creates a different SA named {registry-name}-registry-api (via GetServiceAccountName() in cmd/thv-operator/pkg/registryapi/types.go)
The operator's deployment code assigns the dynamic SA to registry pods (deployment.go:195)
No Go code in the operator references toolhive-registry-api
Result: the static ClusterRole's permissions never reach any registry API pod because the ClusterRoleBinding targets a ServiceAccount that no pod uses.
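The mismatch is easiest to see with the two names side by side. This is a sketch, not the literal rendered manifests; placeholder values are in angle brackets.

```yaml
# Static ClusterRoleBinding rendered by Helm — targets a SA that no pod uses
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: toolhive-registry-api-rolebinding
subjects:
  - kind: ServiceAccount
    name: toolhive-registry-api        # static SA name from Helm values
    namespace: <release-namespace>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: toolhive-registry-api-role
---
# ServiceAccount actually attached to registry pods, created by the operator
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <registry-name>-registry-api   # from GetServiceAccountName()
  namespace: <registry-namespace>
```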
What the operator's own ClusterRole permits
roles and rolebindings: full CRUD (can create namespace-scoped RBAC)
clusterroles and clusterrolebindings: not listed (cannot create cluster-scoped RBAC)
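In rules form, the operator's ClusterRole contains something like the following (sketch; exact verbs should be checked against deploy/charts/operator/templates/clusterrole/role.yaml):

```yaml
# Present today: namespace-scoped RBAC management
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles", "rolebindings"]
  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
# Absent: no equivalent rule for cluster-scoped RBAC
#   resources: ["clusterroles", "clusterrolebindings"]
```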
Problem
For multi-namespace/tenant registry deployments via the operator, the registry API needs to watch MCP resources, services, and Gateway API resources across namespaces. This requires ClusterRoles, but:
The RBAC utility library (cmd/thv-operator/pkg/kubernetes/rbac/rbac.go) has no functions for creating ClusterRole or ClusterRoleBinding resources
The operator's own ClusterRole doesn't include permissions to manage clusterroles/clusterrolebindings
Even if the code were added, the operator would need escalated permissions in its own ClusterRole — creating a privilege spreading risk (Kubernetes RBAC escalation prevention limits the operator to creating ClusterRoles with permissions it already holds, but it could bind those permissions to arbitrary identities across the cluster)
The existing static Helm ClusterRole/ClusterRoleBinding are orphaned and don't reach the registry pods (see above)
Proposed Solutions
Option A: Expand the static Helm ClusterRole (recommended)
Instead of granting the operator runtime ClusterRole management capabilities, expand the existing static Helm ClusterRole template to include all permissions the registry API needs for cross-namespace discovery:
Expand registry-api-clusterrole.yaml to include mcpremoteproxies, virtualmcpservers, services, and Gateway API resources (matching the rules in registryAPIRBACRules)
Fix the orphaned binding — either have the operator use the static SA name, or have the operator create a ClusterRoleBinding from the dynamic SA to the static ClusterRole
Make this opt-in via a Helm value (e.g., registryAPI.clusterScoped: false) that controls whether the expanded ClusterRole/ClusterRoleBinding are rendered
When cluster-scoped, the operator can skip creating the namespace-scoped Role for permissions already covered by the ClusterRole
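A sketch of the expanded template, gated on the proposed value. The API groups and resource names below are assumptions and must be reconciled with the rules in registryAPIRBACRules:

```yaml
{{- if .Values.registryAPI.clusterScoped }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: toolhive-registry-api-role
rules:
  # ToolHive CRDs (group name assumed)
  - apiGroups: ["toolhive.stacklok.dev"]
    resources: ["mcpservers", "mcpremoteproxies", "virtualmcpservers"]
    verbs: ["get", "list", "watch"]
  # Core resources
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch"]
  # Gateway API resources (exact kinds assumed)
  - apiGroups: ["gateway.networking.k8s.io"]
    resources: ["gateways", "httproutes"]
    verbs: ["get", "list", "watch"]
{{- end }}
```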
Tradeoffs: The operator pod itself never needs clusterroles/clusterrolebindings permissions — the ClusterRole is provisioned at Helm install time (which already requires elevated access). This is strictly more secure than granting runtime ClusterRole creation. The tradeoff is that the static ClusterRole is fixed at install time and can't adapt per-MCPRegistry.
Option B: Feature-toggled runtime ClusterRole creation
Add a Helm value (e.g., operator.rbac.enableClusterScopedWorkloadRBAC: false) that, when enabled:
Adds clusterroles and clusterrolebindings to the operator's own ClusterRole
Extends the RBAC utility library to support ClusterRole/ClusterRoleBinding creation
Allows the MCPRegistry controller to create cluster-scoped RBAC when the registry needs cross-namespace access (possibly driven by a field on the MCPRegistry spec)
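Under this option, the operator's own ClusterRole gains an extra rule when the flag is set. A sketch, using the hypothetical value name from above; whether bind or escalate verbs are also needed would have to be confirmed:

```yaml
# Helm values (hypothetical key from the proposal):
#   operator:
#     rbac:
#       enableClusterScopedWorkloadRBAC: true
#
# Rule added to the operator's ClusterRole when enabled:
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterroles", "clusterrolebindings"]
  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
```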
Tradeoffs: More flexible than Option A (per-MCPRegistry ClusterRoles), but increases the operator's attack surface. An attacker who compromises the operator pod could create ClusterRoles (limited to the operator's own permission level by Kubernetes RBAC escalation prevention) and bind them to arbitrary identities across the cluster, enabling privilege spreading. Opt-in preserves least-privilege by default.
Option C: Grant ClusterRole permissions by default
Always include clusterroles/clusterrolebindings in the operator's ClusterRole and create ClusterRoles for registry API deployments.
Tradeoffs: Simpler implementation, but increases the operator's default attack surface. Conflicts with the least-privilege approach that motivated #4030.
Option D: Extend the disableWorkloadRBAC pattern from #4030
Lean into the external RBAC management model — document the required ClusterRole/ClusterRoleBinding for multi-namespace registries and let platform teams provision them via GitOps, separate Helm charts, or Kustomize overlays.
Tradeoffs: No operator code changes needed. Puts the burden on platform teams. Works well in GitOps-heavy environments but is a poor experience for simpler deployments.
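Under Option D, a platform team would apply a manifest like this alongside the operator install (sketch; all names are hypothetical except the {registry-name}-registry-api convention, which the subject must match):

```yaml
# Externally managed binding for a registry named "my-registry" (hypothetical)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-registry-cross-namespace
subjects:
  - kind: ServiceAccount
    name: my-registry-registry-api   # operator-generated: {registry-name}-registry-api
    namespace: registry-namespace    # hypothetical
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: toolhive-registry-api-role   # or a team-managed ClusterRole
```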
Additional Considerations
Multiple installation methods: Users may deploy registries via the operator OR via the standalone registry Helm chart. RBAC solutions must account for both paths and avoid conflicts when both are present.
Interaction with #4030: The proposed disableWorkloadRBAC flag (#4030, still open) conditionally removes the static ClusterRole/ClusterRoleBinding templates and the operator's roles/rolebindings permissions. Any solution here needs to compose correctly with that flag — e.g., if workload RBAC is disabled, cluster-scoped RBAC should also be disabled.
Kubernetes RBAC escalation prevention: Kubernetes prevents a controller from creating or modifying Role/ClusterRole resources containing permissions the controller doesn't itself hold (unless the escalate verb is granted). For ClusterRoleBindings, a separate bind verb restriction applies (enforced in Kubernetes 1.25+). The operator's ClusterRole must be a superset of any ClusterRole it creates for the registry API.
Testing complexity: This is finicky to test because the behavior differs across installation methods, RBAC scopes, and feature flag combinations. E2E tests should cover: single-namespace registry via operator, multi-namespace registry via operator, registry with disableWorkloadRBAC, and coexistence with standalone registry chart.
Orphaned static templates: The existing static SA/ClusterRole/ClusterRoleBinding should be fixed regardless of which solution is chosen — either by removing them, connecting them to the dynamic SA, or making the operator use the static SA.
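On the escalation-prevention point above: the escalate and bind verbs are what relax these checks, and granting them to the operator is precisely the risk Options B and C weigh. A sketch of what such a grant looks like:

```yaml
# Verbs that relax Kubernetes RBAC escalation prevention (sketch)
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterroles"]
  verbs: ["escalate"]   # may create/update ClusterRoles beyond the holder's own permissions
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterroles"]
  verbs: ["bind"]       # may create bindings to ClusterRoles the holder couldn't create
  resourceNames: ["toolhive-registry-api-role"]  # bind can be scoped to specific roles
```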
References

Add disableWorkloadRBAC flag to skip per-workload RBAC creation (#4030, still open)
stacklok/toolhive-registry-server#287 — Multi-tenancy & multiple registry server installations
cmd/thv-operator/pkg/registryapi/rbac.go — Current namespace-scoped RBAC creation
cmd/thv-operator/pkg/kubernetes/rbac/rbac.go — RBAC utility library (namespace-scoped only)
cmd/thv-operator/pkg/registryapi/types.go:109 — GetServiceAccountName() generates {name}-registry-api
deploy/charts/operator/templates/clusterrole/role.yaml — Operator's own ClusterRole
deploy/charts/operator/templates/registry-api-clusterrole.yaml — Static registry API ClusterRole (orphaned)
deploy/charts/operator/templates/registry-api-serviceaccount.yaml — Static registry API ServiceAccount (orphaned)