Skip to content

feat(authz): swap existing-gate handlers to RequirePermission (PR 2c)#305

Merged
mcharles-square merged 13 commits into
mainfrom
feat/rbac-pr2c-existing-gate-swap
May 27, 2026
Merged

feat(authz): swap existing-gate handlers to RequirePermission (PR 2c)#305
mcharles-square merged 13 commits into
mainfrom
feat/rbac-pr2c-existing-gate-swap

Conversation

@mcharles-square
Copy link
Copy Markdown
Collaborator

@mcharles-square mcharles-square commented May 22, 2026

Third slice of PR 2 from the granular RBAC plan. Mechanical handler swap plus the privilege-parity check and migration that fell out of code review.

PR 2b (rpc_permissions infra, #303) is merged on main; this PR now targets main directly.

What landed

Handler swaps (RequireAdmin → RequirePermission)

  • buildings/handler.go (5 procedures): site:read for reads, site:manage for writes.
  • sites/handler.go (6 procedures): site:read for ListSites, site:manage for the rest.

Inline-check swaps (drop local helpers)

  • apikey/handler.go (3): drop requireAdmin; gate every endpoint on apikey:manage.
  • fleetnodeadmin/handler.go (4 working procedures): drop requireAdminSession; gate by fleetnode:read (List) or fleetnode:manage (Create/Confirm/Revoke). The four Unimplemented stubs stay in ProceduresPendingMigration — they don't reach a gate today and won't until the methods are actually implemented.
  • serverlog/handler.go: drop requireAdmin; gate by serverlog:read.
  • curtailment/handler.go AdminTerminateEvent: drop the legacy requireAdminFromContext defense-in-depth gate and rely on RequirePermission(PermCurtailmentManage) as the authoritative check. Start/Stop/Preview keep their conditional inline gates pending the broader curtailment authz redesign (PR 2d).

Auth domain layer

  • Delete auth/service.go::checkCanManageUser (sole callers were the Connect handlers; no domain-internal callers).
  • Handler-layer gates added in handlers/auth/handler.go:
    • CreateUser, DeactivateUser, ResetUserPasswordRequirePermission(PermUserManage)
    • ListUsersRequirePermission(PermUserRead). This closes the P0 surfaced in PR 2b's reviewListUsers previously had no role check at all, so any authenticated org member could enumerate users.
  • Classify GetUserRoleName errors: sql.ErrNoRows (target not in caller's org) → InvalidArgument; any other error (transient store failure) → Internal. Avoids treating a backend hiccup as bad input.

ADMIN seed + privilege-parity check

  • ADMIN now seeds with user:read + user:manage; only role:manage is excluded from the formula. role:manage stays SUPER_ADMIN-only because it can grant arbitrary permissions and is the catalog's ultimate authority key.
  • requireCallerCanManageTarget (in auth/service.go) gates every user mutation (CreateUser / ResetUserPassword / DeactivateUser) on two conditions:
    1. Scope-aware subset: every (permission key, scope) the target holds must also be held by the caller. New EffectivePermissions.IsSubsumedBy reuses the narrowing-aware Has(), so a site-scoped grant cannot launder into apparent org-scope authority.
    2. Strict authority: caller's perms must be strictly larger than target's, OR caller must hold org-scope role:manage. Without this, equality (ADMIN target equals ADMIN caller) would make CreateUser a primitive for ADMIN to mint new ADMINs and walk off with the temp password.
  • Resulting semantics:
    • SUPER_ADMIN manages anyone (org-scope role:manage bypass).
    • ADMIN manages strictly-fewer-perms targets (FIELD_TECH, custom non-elevated roles).
    • ADMIN denied on peer ADMIN, on SUPER_ADMIN, on any target whose role holds a permission ADMIN doesn't, and on CreateUser (which assigns ADMIN to the new account — same perm set as the caller).
  • The check is permission-based, not role-name-based: an operator can authorize a custom role for peer management by explicitly granting it role:manage.

Migration: backfill user:read + user:manage to existing ADMIN rows

  • migrations/000060_backfill_admin_user_manage.up.sql. Migration 000053 seeded ADMIN explicitly excluding both keys; the additive-only reconciler does not re-assert seed permissions onto existing role rows, so upgraded orgs would silently deny ADMIN on every newly-gated user-management endpoint without an explicit backfill.
  • INSERT ... ON CONFLICT DO NOTHING, scoped to WHERE builtin_key = 'ADMIN'. Idempotent; operator-created custom roles untouched. Down migration removes both keys from ADMIN.
  • builtin.go docstring updated to point at this migration as the pattern for future seed-formula changes.

rpc_permissions.go bookkeeping

  • 23 procedures move from ProceduresPendingMigrationProcedurePermissions.
  • The PR 2b contract test catches any drift between the map and the actual handler call.

What this PR does NOT do

  • Site-scoped narrowing on site/building handlers. All RequirePermission callers in this PR pass authz.ResourceContext{}, which makes the check org-scoped. EffectivePermissions.Has() does implement narrowing (a site-scope assignment shadows org-scope at that site), so the migrated handlers preserve the pre-existing RequireAdmin (org-wide) semantics rather than introducing narrowing. Site-scope wiring lands in PR 2d, tracked by a TODO at middleware/permission.go calling out the shared siteResourceForMiner / siteResourceForBuilding helper that should accompany the first site-scoped handler migration so building_id → site_id lookups don't get inlined across handlers.
  • New gates on currently-ungated handlers (fleetmanagement, command, deviceset, collection, pools, schedule, errors, telemetry, activity, networkinfo, pairing, foremanimport, curtailment reads). PR 2d.
  • Conditional curtailment gates (Start/Stop/Preview, UpdateCurtailmentEvent stub). PR 2d.
  • Rename user_organization.role_id or add the raising trigger. PR 2d.
  • Delete middleware/admin.go. PR 2d once ProceduresPendingMigration empties.

Test plan

  • go build ./... clean
  • go vet ./... clean
  • go test ./internal/handlers/middleware/... green (contract test still passes; the map matches every swapped handler).
  • go test ./internal/handlers/buildings/... ./internal/handlers/sites/... ./internal/handlers/curtailment/... green after switching handler-level gate tests from role-based to permission-based assertions.
  • go test ./internal/handlers/apikey/... -run TestAPIKeyHandler_SanitizesInternalErrors green after wiring EffectivePermissions through the test's adminAuthInjector.
  • go test ./internal/domain/auth/... green. Privilege-parity helper covered by TestRequireCallerCanManageTarget (super_admin/admin/field_tech/custom-role combos, including the equality block, the site-scope laundering case, and the operator-granted-role:manage-custom-role bypass).
  • go test ./internal/domain/authz/... green. TestEffective_IsSubsumedBy covers scope-aware subset semantics directly. TestReconcile_NewCatalogPermissionDoesNotPropagateToExistingBuiltins confirms operator-revoked permissions still survive restart (additive contract intact); the migration is what propagates seed-formula changes to upgraded orgs.
  • Full go test ./... green.

Rollback

Reverting this PR reinstates RequireAdmin, checkCanManageUser, and the local helper functions. Migration 000060_backfill_admin_user_manage has a .down.sql that removes user:read and user:manage from every ADMIN row; note this also removes the keys from orgs that already held them pre-backfill (provenance isn't tracked, so the down migration over-corrects on first-install orgs — acceptable in dev rollback scenarios but not for prod recovery).

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (af8e413a7177df2121ba8d04731f2a1b0197659d...f72cbdcc6030ebaa973cbadaf574945de0af516d, exact PR three-dot diff)
  • Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: NONE

Findings

No findings in the scoped diff.

Notes

Reviewed .git/codex-review.diff only, covering the auth/RBAC changes, handler permission migrations, effective permission comparison logic, and the ADMIN user-management backfill migration. I did not find actionable security, correctness, reliability, cryptostealing/pool hijack, or protobuf/codegen issues in the changed hunks.


Generated by Codex Security Review |
Triggered by: @mcharles-square |
Review workflow run

@mcharles-square mcharles-square force-pushed the feat/rbac-pr2b-rpc-permissions-infra branch from 4c427ca to 6bec15e Compare May 22, 2026 19:47
@mcharles-square mcharles-square force-pushed the feat/rbac-pr2c-existing-gate-swap branch 2 times, most recently from cd8a669 to b730354 Compare May 22, 2026 19:51
@mcharles-square mcharles-square force-pushed the feat/rbac-pr2c-existing-gate-swap branch from ef17a98 to 394d3c1 Compare May 22, 2026 21:54
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 394d3c1898

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/internal/handlers/sites/handler.go
Base automatically changed from feat/rbac-pr2b-rpc-permissions-infra to main May 26, 2026 16:52
@mcharles-square mcharles-square force-pushed the feat/rbac-pr2c-existing-gate-swap branch from 394d3c1 to 052ecd4 Compare May 26, 2026 18:29
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 052ecd474d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/internal/handlers/buildings/handler.go
Copilot AI review requested due to automatic review settings May 26, 2026 18:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR advances the server-side granular RBAC migration by replacing role-based “admin” gates with permission-based gates (RequirePermission) across several Connect-RPC handlers, and updates the RPC permission declaration map and associated unit tests to match.

Changes:

  • Swapped multiple handler entry gates from RequireAdmin / inline role checks to middleware.RequirePermission(...) for sites, buildings, API keys, fleet-node admin, server logs, curtailment termination, and auth user-management endpoints.
  • Expanded ProcedurePermissions in rpc_permissions.go and reduced ProceduresPendingMigration accordingly.
  • Updated/added handler-layer tests, including a shared handlerstest.CtxWithPermissions helper to construct contexts with EffectivePermissions.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
server/internal/handlers/sites/handler.go Swap site RPC gates to RequirePermission (site:read/manage).
server/internal/handlers/sites/handler_test.go Update tests to inject permission-based contexts and assert permission gating.
server/internal/handlers/serverlog/handler.go Replace inline role check with RequirePermission(serverlog:read).
server/internal/handlers/middleware/rpc_permissions.go Move migrated procedures into ProcedurePermissions; update migration bookkeeping.
server/internal/handlers/handlerstest/permissions.go New shared helper for building permission-bearing request contexts in handler tests.
server/internal/handlers/fleetnodeadmin/handler.go Replace inline admin-session gate with RequirePermission(fleetnode:read/manage).
server/internal/handlers/curtailment/handler.go Gate AdminTerminateEvent with RequirePermission(curtailment:manage).
server/internal/handlers/curtailment/handler_test.go Update terminate-event gate test from role-based to permission-based.
server/internal/handlers/buildings/handler.go Swap building RPC gates to RequirePermission (site:read/manage).
server/internal/handlers/buildings/handler_test.go Update tests to inject permission-based contexts and assert permission gating.
server/internal/handlers/auth/handler.go Add handler-layer permission gates for user-management RPCs.
server/internal/handlers/auth/handler_gate_test.go New fast-path tests asserting user-management RPCs deny without required permissions.
server/internal/handlers/apikey/handler.go Replace local admin helper with RequirePermission(apikey:manage).
server/internal/handlers/apikey/handler_error_sanitization_test.go Update interceptor test injector to provide EffectivePermissions.
server/internal/domain/authz/reconcile_test.go Update admin seed expectations around user/role permissions.
server/internal/domain/authz/builtin.go Update ADMIN seed-permission documentation and exclusions.
server/internal/domain/auth/service.go Remove checkCanManageUser; add role-hierarchy check for user mutations and cross-org guards.
server/internal/domain/auth/service_test.go Add unit test coverage for the new role-hierarchy helper.

Comment thread server/internal/handlers/sites/handler.go
Comment thread server/internal/handlers/sites/handler.go
Comment thread server/internal/handlers/buildings/handler.go
Comment thread server/internal/handlers/buildings/handler.go
Comment thread server/internal/domain/authz/reconcile_test.go
Comment thread server/internal/handlers/buildings/handler.go
Comment thread server/internal/handlers/buildings/handler.go
Comment thread server/internal/handlers/buildings/handler.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e4bba4dd1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/internal/domain/auth/service.go Outdated
mcharles-square added a commit that referenced this pull request May 26, 2026
Addresses PR review feedback (#305) — Codex P2.

ResetUserPassword and DeactivateUser previously mapped every error
from GetUserRoleName to InvalidArgument("invalid user_id"). A transient
DB failure (timeout, connection drop, etc.) is not bad input; the
caller should retry, not assume they typed the wrong ID.

Distinguish: sql.ErrNoRows → InvalidArgument (target not in caller's
org, preserves the existence-leak guard). Any other error → Internal,
matching the rest of the codebase's error-classification pattern.
@mcharles-square
Copy link
Copy Markdown
Collaborator Author

Re: Codex Security Review

Both HIGH findings (site mutations + building handlers using authz.ResourceContext{}) were addressed on the inline threads. Summary:

Declined as intentional: the handlers migrated in this PR (sites, buildings, apikey, fleetnodeadmin, serverlog, curtailment Admin*, auth user-management) were previously gated by RequireAdmin (org-wide). The swap to RequirePermission preserves org-scope semantics by design. Site-scope narrowing on site-targeted handlers is deferred to PR 2d, captured by a TODO at server/internal/handlers/middleware/permission.go for the shared siteResourceForMiner/siteResourceForBuilding-style helper that should land with the first site-scoped migration so the building_id → site_id lookup doesn't get inlined across handlers.

Also worth noting: today's authz model is additive grants (no "deny-at-site"); the "org grant bypasses site-scoped restriction" scenario only matters when a user holds only site-scoped grants without an org-scope grant, which doesn't apply to org admins. The PR description's "What this PR does NOT do" section now documents the deferral explicitly.

mcharles-square added a commit that referenced this pull request May 26, 2026
Addresses PR review feedback (#305) — Codex P2.

ResetUserPassword and DeactivateUser previously mapped every error
from GetUserRoleName to InvalidArgument("invalid user_id"). A transient
DB failure (timeout, connection drop, etc.) is not bad input; the
caller should retry, not assume they typed the wrong ID.

Distinguish: sql.ErrNoRows → InvalidArgument (target not in caller's
org, preserves the existence-leak guard). Any other error → Internal,
matching the rest of the codebase's error-classification pattern.
@mcharles-square mcharles-square force-pushed the feat/rbac-pr2c-existing-gate-swap branch from ab78311 to a4fe6f9 Compare May 26, 2026 19:48
mcharles-square added a commit that referenced this pull request May 26, 2026
…tion

Addresses PR review feedback (#305) — Codex security review.

**Finding 1 — custom-role hijack via password reset (HIGH).**
The prior role-name-based hierarchy check treated only ADMIN /
SUPER_ADMIN as elevated. An ADMIN with user:manage could reset the
password of a user whose custom role grants role:manage (or any other
sensitive permission), log in, and escalate.

Replace requireCanManageTargetRole(string, string) with
requireCallerCanManageTarget(callerKeys, targetKeys []string): caller's
effective permissions must be a superset of target's. Role names are no
longer consulted; operator-created custom roles get the right treatment
automatically.

- Plumb *authz.PermissionResolver through auth.NewService so the
  service can load the target's effective permissions (and the
  caller's) independent of the request middleware.
- Add UserManagementStore.ListPermissionKeysByRoleID for CreateUser,
  which checks against the role being assigned (currently always ADMIN).
- TestRequireCallerCanManageTarget covers super-admin/admin/field-tech/
  custom-role combinations including the escalation primitive
  (custom role with role:manage).

**Finding 3 — ADMIN user:manage backfill on upgrade (MEDIUM).**
The additive-only reconciler is correct: it preserves operator-revoked
permissions across restarts (covered by
TestReconcile_OperatorRevokedSensitivePermStaysRevokedOnRestart).
Re-asserting the seed every boot would silently restore deliberate
revocations — that's the wrong fix.

Ship a one-off migration (000060_backfill_admin_user_manage) that
inserts user:read and user:manage into every existing ADMIN role row.
The INSERT ... ON CONFLICT DO NOTHING makes it idempotent and safe on
orgs already holding either key. Custom roles aren't touched
(WHERE builtin_key = 'ADMIN').

Update builtin.go to point at the migration for future seed-formula
changes that need to propagate to upgraded orgs.
@mcharles-square
Copy link
Copy Markdown
Collaborator Author

Follow-up to the security review

Three findings re-assessed; pushed 0cf6a71 with code changes for #1 and #3 and a correction on #2.

Finding 1 (HIGH custom-role hijack) — fixed

Replaced the role-name-based hierarchy check (isElevatedRole(ADMIN|SUPER_ADMIN)) with a privilege-parity check on effective permissions: caller's permission set must be a superset of the target's. An ADMIN with user:manage can no longer reset/deactivate a user whose custom role grants role:manage (or any other permission the caller doesn't hold). requireCallerCanManageTarget(callerKeys, targetKeys) operates on the permission keys directly — role names are no longer consulted. Test coverage in TestRequireCallerCanManageTarget includes the escalation primitive (custom role with role:manage).

*authz.PermissionResolver is now plumbed into auth.NewService so the service can load the target's effective permissions independently of request middleware. UserManagementStore.ListPermissionKeysByRoleID covers the CreateUser path, where the new user gets the ADMIN role's permission set.

Finding 3 (MEDIUM ADMIN backfill) — fixed via migration

Considered changing ReconcileAdditive to re-assert seed permissions on every boot, but that breaks TestReconcile_OperatorRevokedSensitivePermStaysRevokedOnRestart — the additive contract intentionally preserves operator revocations. Re-asserting would silently restore permissions an operator deliberately removed.

Shipping a one-off backfill migration instead: migrations/000060_backfill_admin_user_manage.up.sql inserts user:read and user:manage into every existing ADMIN role row. ON CONFLICT DO NOTHING makes it idempotent; WHERE builtin_key = 'ADMIN' leaves operator-created custom roles untouched. Updated builtin.go to document the pattern for future seed-formula changes.

Finding 2 (HIGH site/building scope) — correction posted

Acknowledged on each declined thread: my earlier rationale ("the model is additive grants, no deny-at-site") was wrong. effective.go:102-113 does implement narrowing — passing ResourceContext{} on a site-targeted op genuinely bypasses it. The migration decision still stands (PR 2c preserves the pre-existing RequireAdmin org-wide semantics; site-scoped handlers + shared helper land in PR 2d, tracked by the TODO at middleware/permission.go), but the reasoning on those threads is now correct.

cc the inline reviewers.

… ListUsers gap

Migrates the handlers that already had an admin gate to the catalog-permission
model introduced in PR 2a. Behavior preserved for the existing gates; the
ListUsers handler picks up a real gate for the first time (PR 2b's review
flagged this as a P0 — any authenticated org member could enumerate users).

Swaps:
- buildings/handler.go (5 procedures): RequireAdmin -> RequirePermission with
  PermSiteRead for reads, PermSiteManage for writes.
- sites/handler.go (6 procedures): RequireAdmin -> RequirePermission with
  PermSiteRead for ListSites, PermSiteManage for mutations.
- apikey/handler.go (3 procedures): drop the local requireAdmin helper, gate
  every endpoint on PermAPIKeyManage.
- fleetnodeadmin/handler.go (4 working procedures): drop the local
  requireAdminSession helper, gate by PermFleetnodeRead (List) or
  PermFleetnodeManage (Create/Confirm/Revoke). The four Unimplemented stubs
  stay in ProceduresPendingMigration.
- serverlog/handler.go: drop the local requireAdmin helper, gate by
  PermServerlogRead.
- curtailment/handler.go AdminTerminateEvent: swap to
  RequirePermission(PermCurtailmentManage). Start/Stop/Preview keep their
  conditional inline gates pending the broader curtailment authz redesign.
- auth/service.go: delete checkCanManageUser (only caller was the connect
  handler). Gate moves to the handler layer:
  - CreateUser/DeactivateUser/ResetUserPassword: PermUserManage.
  - ListUsers: PermUserRead (closes the P0 from PR 2b review).

rpc_permissions.go now reflects the swap: 23 procedures move from
ProceduresPendingMigration to ProcedurePermissions; the contract test
catches drift between the map and the actual handler call.

Tests:
- Handler-level gate tests in buildings, sites, and curtailment switched
  from role-based assertions ("VIEWER", "ADMIN", "SUPER_ADMIN") to
  permission-set assertions wired through middleware.WithEffectivePermissions.
- The adminAuthInjector in the apikey sanitization test now stuffs an
  org-scope EffectivePermissions value carrying PermAPIKeyManage onto the
  request context.
… harden tests

Code review of PR 2c surfaced one P1 security issue and several testing /
maintainability gaps that warranted closing in the same PR:

1. **Cross-org IDOR (security P1):** with checkCanManageUser deleted, the
   service-layer methods resolved the target user via GetUserByExternalID
   (a global lookup) without verifying the target belongs to the caller's
   org. A SUPER_ADMIN in org A who learned an external user_id from org B
   could reset that user's password or deactivate them. Add a
   GetUserRoleName(target, callerOrg) probe in ResetUserPassword and
   DeactivateUser; an error from the probe means the target isn't in the
   caller's org and we return InvalidArgument without leaking existence.

2. **ADMIN locked out of /settings/team (P1 regression):** the new ListUsers
   gate uses PermUserRead, which the previous ADMIN seed formula excluded
   alongside PermUserManage. Result: any default ADMIN user opening the
   Team page got PermissionDenied. Drop user:read from the exclusion set
   so ADMIN holds user:read (view team) but still lacks user:manage
   (mutate users). Reconcile test updated to assert the new shape.

3. **Test helper duplication (P1 maintainability):** ctxWithPermissions
   was copy-pasted between buildings/sites tests. Extract to a new
   internal/handlers/handlerstest package; both call sites now import it.

4. **No fast-path unit tests for the auth gates (P1 testing):** the four
   new RequirePermission gates on Create/Deactivate/ResetPassword/List
   only had DB-backed integration coverage. Add a
   handler_gate_test.go that wires EffectivePermissions through the
   shared handlerstest helper and asserts PermissionDenied on the deny
   path for each gate.

Safe-auto polish applied in the same commit:
- Stale '(Super Admin only)' docstrings on auth service + handler methods
  replaced with notes pointing at the handler-layer RequirePermission gate.
- adminCtx test helper renamed to sitePermsCtx since it now wires
  permission keys, not a role.
- Stale 'requireAdminFromContext' attribution in curtailment validation-
  test comment updated to point at middleware.RequirePermission.
- Dead Role field removed from apikey sanitization-test's
  adminAuthInjector now that gate reads EffectivePermissions, not the
  role string.

Deferred to follow-up:
- Empty ResourceContext{} on every swapped call site (latent gap for
  site-scoped assignments).
- Curtailment Start/Stop/Preview partial-revocation cascade — PR 2d.
- TestHandler_sitePermissionsPassGate PermSiteManage subcase always
  also wires PermSiteRead; the case proves nothing about manage alone.
- buildings authGate denial-loop only exercises List + Create (Get,
  Update, Delete not covered in the gate test).
…in layer

ADMIN previously could not create, reset, or deactivate any user — the
seed formula excluded user:manage outright. The intended design is for
ADMIN to manage non-elevated users (custom roles, FIELD_TECH) while
ADMIN and SUPER_ADMIN targets stay SUPER_ADMIN-only.

Wire that up:

- adminSeedPermissions(): only exclude role:manage. ADMIN now holds
  user:manage at the permission level.
- requireCanManageTargetRole(): pure helper enforcing the hierarchy —
  a non-SUPER_ADMIN caller cannot mutate an ADMIN or SUPER_ADMIN
  target. SUPER_ADMIN is unrestricted.
- CreateUser: gate caller against AdminRoleName (the role assigned to
  every newly-created user today). A non-SUPER_ADMIN caller is blocked
  until the proto grows a role field.
- ResetUserPassword / DeactivateUser: reuse the existing cross-org
  GetUserRoleName probe as the target-role input — no extra DB
  roundtrip — and then check the caller's role.

role:manage stays SUPER_ADMIN-only because role edits can grant
arbitrary permissions and have no comparable per-target hierarchy.

Tests:
- TestRequireCanManageTargetRole: 10 cases covering each caller/target
  combination including custom roles.
- TestReconcile_FreshInstall_AdminExcludesRoleManagement: renamed +
  asserts ADMIN now holds user:manage.
Every current RequirePermission caller passes an empty ResourceContext
because the migrated handlers are all org-scoped. Flag the first
site-scoped migration as the right moment to extract a shared helper
(e.g. siteResourceForMiner) instead of inlining the miner_id → site_id
lookup at each callsite.
Addresses PR review feedback (#305) — Codex P2.

ResetUserPassword and DeactivateUser previously mapped every error
from GetUserRoleName to InvalidArgument("invalid user_id"). A transient
DB failure (timeout, connection drop, etc.) is not bad input; the
caller should retry, not assume they typed the wrong ID.

Distinguish: sql.ErrNoRows → InvalidArgument (target not in caller's
org, preserves the existence-leak guard). Any other error → Internal,
matching the rest of the codebase's error-classification pattern.
Rebase onto main picked up #308 / #311, which added a defense-in-depth
requireAdminFromContext call ahead of the curtailment:manage gate.
PR 2c's design moves RBAC to the authoritative check, so drop the
legacy role gate and reorder so RequirePermission runs before the
nil-service check (auth-before-implementation; preserves Unauthenticated
on missing session instead of leaking Unimplemented).

The TestHandler_AdminTerminateEvent_RejectsNonAdmin case asserted the
legacy role gate's defense-in-depth behavior — it's redundant with the
renamed TestHandler_AdminTerminateEvent_RejectsCallerWithoutCurtailmentManage
which already covers the RBAC denial path. Drop the obsolete test.
…tion

Addresses PR review feedback (#305) — Codex security review.

**Finding 1 — custom-role hijack via password reset (HIGH).**
The prior role-name-based hierarchy check treated only ADMIN /
SUPER_ADMIN as elevated. An ADMIN with user:manage could reset the
password of a user whose custom role grants role:manage (or any other
sensitive permission), log in, and escalate.

Replace requireCanManageTargetRole(string, string) with
requireCallerCanManageTarget(callerKeys, targetKeys []string): caller's
effective permissions must be a superset of target's. Role names are no
longer consulted; operator-created custom roles get the right treatment
automatically.

- Plumb *authz.PermissionResolver through auth.NewService so the
  service can load the target's effective permissions (and the
  caller's) independent of the request middleware.
- Add UserManagementStore.ListPermissionKeysByRoleID for CreateUser,
  which checks against the role being assigned (currently always ADMIN).
- TestRequireCallerCanManageTarget covers super-admin/admin/field-tech/
  custom-role combinations including the escalation primitive
  (custom role with role:manage).

**Finding 3 — ADMIN user:manage backfill on upgrade (MEDIUM).**
The additive-only reconciler is correct: it preserves operator-revoked
permissions across restarts (covered by
TestReconcile_OperatorRevokedSensitivePermStaysRevokedOnRestart).
Re-asserting the seed every boot would silently restore deliberate
revocations — that's the wrong fix.

Ship a one-off migration (000060_backfill_admin_user_manage) that
inserts user:read and user:manage into every existing ADMIN role row.
The INSERT ... ON CONFLICT DO NOTHING makes it idempotent and safe on
orgs already holding either key. Custom roles aren't touched
(WHERE builtin_key = 'ADMIN').

Update builtin.go to point at the migration for future seed-formula
changes that need to propagate to upgraded orgs.
@mcharles-square mcharles-square force-pushed the feat/rbac-pr2c-existing-gate-swap branch from 0cf6a71 to eb7be2c Compare May 27, 2026 05:54
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb7be2c248

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/internal/domain/auth/service.go Outdated
Addresses PR review feedback (#305) — Codex P1.

The prior parity check compared callerEff.FlatKeys() against
targetEff.FlatKeys(), which collapses site-scope and org-scope grants
into one set. That lets a caller launder a site-scoped permission
(e.g. role:manage at one site via a custom assignment) into apparent
authority over an org-scoped target — exactly the escalation primitive
the check was meant to block.

Replace the flat-key comparison with a scope-aware predicate:

- New EffectivePermissions.IsSubsumedBy(other): for every (key, scope)
  the receiver holds, other.Has(key, that scope) must be true. Reuses
  the existing narrowing-aware Has(), so site-scope narrowing applies
  correctly when the caller has any site-scope assignment at that site.
- requireCallerCanManageTarget now takes *EffectivePermissions on both
  sides and delegates to IsSubsumedBy.
- ResetUserPassword / DeactivateUser load both caller and target via
  permResolver.LoadEffective (no more FlatKeys flattening).
- CreateUser synthesizes the target as a single org-scope Assignment
  built from the assigned role's permissions — the new user holds them
  org-wide, so the synthetic snapshot matches reality.

Tests:
- TestEffective_IsSubsumedBy covers the org-vs-site mismatch cases
  directly (including the reviewer's example: caller with site-scoped
  role:manage must NOT subsume a target with org-scoped role:manage).
- TestRequireCallerCanManageTarget refactored to use real Assignment
  values; adds the laundering case + a site-scoped-target case.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d7ea29058

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/internal/domain/auth/service.go Outdated
Addresses PR review feedback (#305) — Codex P1.

After granting ADMIN user:manage, the privilege-parity check passed by
equality for ADMIN ↔ ADMIN: target's permission set equals caller's, so
"target ⊆ caller" trivially held. That turned CreateUser into a way
for any ADMIN to create another ADMIN and walk off with its temp
password, and let ADMIN reset / deactivate peer ADMINs — both contrary
to the design intent that ADMIN/SUPER_ADMIN mutations stay reserved
for SUPER_ADMIN.

Tighten the check: subset still required, plus the caller must hold
org-scope role:manage OR have a *strictly* larger permission set than
the target. Permission-based — no role-name strings — so an operator
can authorize a custom role for peer management by granting it
role:manage explicitly.

Resulting semantics:
- SUPER_ADMIN manages anyone (role:manage bypass).
- ADMIN manages strictly-fewer-perms targets (e.g. FIELD_TECH, custom).
- ADMIN BLOCKED from peer ADMIN, from CreateUser (which assigns ADMIN
  to the new account), and from any target holding a permission ADMIN
  doesn't (e.g. a custom role with role:manage).

builtin.go docstring updated to explain why role:manage is excluded
from the ADMIN seed (it's the peer-management bypass).
Comment thread server/internal/domain/auth/service.go Outdated
Comment thread server/internal/domain/auth/service.go Outdated
Addresses PR review feedback (#305) from @ankitgoswami.

- requireCallerCanManageTarget docstring: collapse 20-line two-bullet
  block into a single sentence that still names the bypass rule and
  why equality without it is dangerous.
- Drop "Privilege parity:" inline preambles at the three call sites
  (CreateUser / ResetUserPassword / DeactivateUser) — they restated
  the helper's docstring. Function name carries the meaning.
- Tighten the cross-org guard comment in ResetUserPassword /
  DeactivateUser to one short paragraph naming the existence-leak
  reason and the Internal-vs-InvalidArgument split.

No behavior change.
Comment thread server/internal/domain/auth/service.go
Addresses PR review feedback (#305) from @ankitgoswami.

ResetUserPassword and DeactivateUser duplicated the same five-step
flow: GetUserByExternalID → cross-org guard via GetUserRoleName →
load target/caller effective permissions → requireCallerCanManageTarget.
CreateUser had an analogous but distinct shape (synthesize target from
the assigned role's permissions). Extract two service methods so each
handler reads as one named step plus its actual business logic.

- authorizeCallerForUser(ctx, callerUserID, orgID, targetExternalID)
  → (stores.User, error): the shared path. Returns the resolved
  target so callers can use its IDs / username downstream.
- authorizeCallerForNewUserWithRole(ctx, callerUserID, orgID, roleName)
  → (stores.Role, error): the CreateUser variant. Returns the resolved
  role so the caller can bind the new user to it.

Net: 148 lines shifted, −76 / +72; ResetUserPassword and DeactivateUser
each lose ~25 lines of duplicated lookup + parity-check scaffolding.
No behavior change; existing tests cover the paths.
Addresses PR review feedback (#305) from @ankitgoswami.

The privilege-parity check expressed "subset AND (role:manage OR
strict subset)" via three sequential conditionals reading
IsSubsumedBy twice in opposite directions. Extracting
StrictlyDominates(other) — defined as other.IsSubsumedBy(e) &&
!e.IsSubsumedBy(other) — turns the helper into two readable branches:
role:manage holders need subsumes-only, everyone else needs strict
dominance.

No behavior change; existing TestRequireCallerCanManageTarget cases
all still pass. Added TestEffective_StrictlyDominates to cover the
predicate directly (proper superset / subset / equality).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8baf1a7889

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/internal/domain/authz/effective.go
Addresses PR review feedback (#305) — Codex P1.

The prior implementation iterated the receiver's keys and asked
other.Has(key, scope) at each one. For receiver org-scope keys it only
asked other.Has(key, ResourceContext{}), which ignored sites where
other has a site-scope assignment that narrows away the key. Result:
a caller with org-scope SUPER_ADMIN plus a restrictive FIELD_TECH
assignment at site 7 would falsely subsume an org ADMIN target whose
ADMIN authority is still live at site 7 (no narrowing on the target
side). ResetUserPassword would then hand the caller credentials for an
account with broader site-7 authority than the caller itself wields.

Two-domain check:

  1. Org-scope: every org-scope key the receiver holds must be in
     other's org-scope (unchanged).

  2. Site-scope: at every site where either party has a narrowing
     assignment, verify each key the receiver can perform there is
     also performable by other at that site. "Each key the receiver
     can perform at site s" = receiver.bySite[s] if it has one
     (narrowing applies), otherwise receiver.orgScope (org-scope
     applies). Uses the narrowing-aware Has() on the other side, so
     fall-through to org-scope is handled automatically.

Sites where neither party narrows remain covered by the org-scope pass
alone — there's no per-site differentiation to detect.

Tests:
- TestEffective_IsSubsumedBy: new case for the caller-narrowing
  scenario (broad org-scope caller with site-7 narrowing must NOT
  subsume an org-scope target whose authority is live at site 7).
- TestRequireCallerCanManageTarget: end-to-end case mirroring the
  reviewer's exact framing (org-scope SUPER_ADMIN + FIELD_TECH site-7
  narrowing → cannot reset an org-ADMIN target's password).
@mcharles-square
Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mcharles-square mcharles-square merged commit ba6177e into main May 27, 2026
70 checks passed
@mcharles-square mcharles-square deleted the feat/rbac-pr2c-existing-gate-swap branch May 27, 2026 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants