Skip to content

Leaky permissions: principal lifecycle events do not consistently revoke access #1585

@AmanGIT07

Description

@AmanGIT07

Summary

Frontier supports several principal types (User, ServiceUser, PAT, Session) and several lifecycle events that should result in revoked access (Delete, Disable, expiry). The matrix below shows where revocation
works correctly today and where it does not. Multiple cells fail — credentials continue to authenticate after the operation that was supposed to stop them.

This issue is a tracker. The actual work is split into per-fix sub-issues to keep PRs reviewable. Most of the bug surface collapses into a small number of changes (an auth-layer gate plus a deleter cleanup); the rest are
hygiene/defense-in-depth follow-ups.

Current behavior matrix

For each row: does the principal still authenticate after this event?

Event User session PAT ServiceUser
User deleted No (FK CASCADE on sessions.user_id) No (FK CASCADE on user_pats.user_id) n/a
User disabled No (UserRepository.GetByID filters notDisabledUserExp) No (same user filter) n/a
Org deleted n/a (sessions not org-scoped) No (FK CASCADE on user_pats.org_id) Yes — bug
Org disabled n/a Yes — bug Yes — bug

The user-disabled column is correctly gated by a data-layer filter (notDisabledUserExp in internal/store/postgres/user_repository.go:50). It's a useful design template for what the org-state gate should look like.

Sub-issues

Bugs (active leaks)

  • #TBD — Auth path does not check parent org state (PAT + ServiceUser)
  • #TBD — DeleteOrganization leaves service users orphaned

Future enhancements

  • #TBD — Actively revoke sessions on user disable
  • #TBD — Plumb serviceuser.state column end-to-end (admin RPCs + auth check)
  • #TBD — Add FK with ON DELETE RESTRICT on serviceusers.org_id

Suggested order

  1. Auth-layer org-state check — single file, single PR, closes the entire org-disabled column for PAT and SU. Also serves as a safety net for orphaned SUs while the deleter fix is being prepared.
  2. Deleter SU cleanup — independent of (1); together they fix all three bug cells in the matrix.
  3. Session revocation on disable — fully independent hygiene win.
  4. serviceuser.state plumbing — new admin RPCs and audit events; coordinate with frontend if it surfaces in any UI.
  5. FK + RESTRICT on serviceusers.org_id — must land after (2) is in production long enough that no new orphans are being created, with a one-time backfill migration first.

Hard dependency: (5) → (2). Everything else can be parallelized.

Acceptance criteria for the epic

  • PAT scoped to a disabled org cannot authenticate.
  • ServiceUser belonging to a disabled org cannot authenticate.
  • After DeleteOrganization, no serviceusers rows reference the deleted org; credentials and SpiceDB relations are gone.
  • Behavior matrix above is fully "No" or n/a in every cell.
  • Admin can disable an individual ServiceUser; auth refuses disabled SUs.
  • serviceusers.org_id has FK with ON DELETE RESTRICT; future broken deleters fail loud.
  • Disabling a user removes their session rows; audit log records the revocation.

Sub-issue 1 — Auth path does not check parent org state (PAT + ServiceUser)

Title: Auth path does not check parent org state — disabled-org PAT and ServiceUser credentials still authenticate

Summary

DisableOrganization flips state = 'disabled' but does not invalidate credentials owned by that org. The authenticator paths for PAT and ServiceUser fetch the principal directly without consulting organizations.state, so
disabling an org has no effect on its outstanding credentials.

Steps to reproduce — PAT

  1. Create org-A and a user with a PAT scoped to org-A.
  2. Verify the PAT authenticates (returns 200).
  3. DisableOrganization(org-A).
  4. Re-attempt with the same PAT → returns 200. Expected: Unauthenticated.

Steps to reproduce — ServiceUser

  1. Create org-A and su-1 with a credential.
  2. Verify auth succeeds.
  3. Disable org-A.
  4. Re-attempt → returns 200. Expected: Unauthenticated.

Where it happens

  • core/authenticate/authenticators.go:58authenticateWithPAT validates the PAT and fetches the owning user; no org-state check.
  • core/authenticate/authenticators.go:198, :178authenticateWithClientCredentials and authenticateWithJWTGrant fetch the SU and return; no org-state check.
  • orgService.Get (core/organization/service.go:146) already returns ErrDisabled for disabled orgs and is honored by most non-auth RPCs. The auth path is the only hole.

Direction

After each PAT and ServiceUser fetch in the authenticator, gate on the parent org's state by calling orgService.Get(orgID). Translate ErrDisabled and ErrNotExist into ErrUnauthenticated. ErrNotExist also defends
against orphaned SUs from the parallel deleter bug.

Place the check at the auth boundary (the authenticator), not inside serviceUserService.Get / userPATService.GetByID. Other callers (admin RPCs, list views) may legitimately need to surface principals from disabled orgs.

Acceptance criteria

  • PAT scoped to a disabled org → ErrUnauthenticated.
  • ServiceUser belonging to a disabled org → ErrUnauthenticated.
  • PAT or SU whose org row no longer exists → ErrUnauthenticated.
  • Existing tests pass; new tests cover both disabled-org and missing-org cases for PAT and SU.
  • Non-auth callers of serviceUserService.Get / userPATService.GetByID are unaffected.

Sub-issue 2 — DeleteOrganization leaves service users orphaned

Title: DeleteOrganization does not delete owned service users; orphaned rows continue to authenticate

Summary

When an organization is deleted, its service users are left behind in serviceusers with org_id pointing at a non-existent row. Their credentials remain valid and the auth path returns a successful Principal for them, so
they continue to make authenticated requests after the parent org is gone.

Steps to reproduce

  1. Create org-A and su-1 under it; mint a credential for su-1.
  2. Verify su-1 authenticates.
  3. DeleteOrganization(org-A).
  4. Re-attempt with su-1's credential → returns 200. Expected: Unauthenticated.

After the delete:

  • serviceusers row for su-1 is intact; org_id references a deleted UUID.
  • serviceuser_credentials rows are intact.

Where it happens

  • Deleter: core/deleter/service.go:159DeleteOrganization walks policies → projects → groups → roles → invitations → billing → orgService.DeleteModel. No service user cleanup step.
  • Schema: internal/store/postgres/migrations/20230507191406_create_base_tables.up.sql:60serviceusers.org_id uuid declared with no FK, so the database does not refuse the delete or cascade.
  • Auth: core/serviceuser/service.go:324, :357GetBySecret / GetByJWT end with repo.GetByID, no relationship awareness.

Direction

Add a service user cleanup loop in core/deleter/service.go::DeleteOrganization alongside the existing entity cleanup loops. List SUs by org and call serviceUserService.Delete(ctx, su.ID) per SU — the SU service's Delete
already removes credentials, org-membership policies, and SpiceDB tuples (core/serviceuser/service.go:142). Inject serviceUserService into the deleter's dependencies.

A complementary auth-layer org-state check (sub-issue #1) provides defense in depth; together they close the row.

Acceptance criteria

  • After DeleteOrganization, all serviceusers rows for that org are gone.
  • Their serviceuser_credentials rows are gone.
  • Their SpiceDB tuples are gone.
  • Tests cover the cleanup path (deleter unit test) and the end-to-end auth-fails-after-delete path.
  • No regression in other deleter cascade steps.

Note

A DB-level FK + ON DELETE RESTRICT is intentionally not part of this PR — tracked as a separate enhancement (sub-issue #FK). This PR ships without a backfill migration so it can land quickly.


Sub-issue 3 — Actively revoke sessions on user disable (enhancement)

Title: User disable should actively revoke sessions instead of relying on lazy auth-time filtering

Context

Sessions for disabled users do fail authentication today: UserRepository.GetByID filters by notDisabledUserExp (internal/store/postgres/user_repository.go:50) and the auth chain treats the resulting ErrNotExist as a
failure. So this is not a security bug.

But the session rows linger up to 7 days until expires_at. If a user is disabled then re-enabled within that window, their pre-disable session resumes — implicit behavior that falls out of the lazy filter rather than being
designed. There is also no audit signal recording the effective session invalidation.

Direction

In core/user/service.go::Disable, after the state flip succeeds, call a new sessionService.DeleteByUserID(ctx, id) to remove session rows for that user. Emit a UserSessionsRevokedEvent (or include the count in
UserDisabledEvent).

Re-enabled users will then need to log in again — the standard expectation after an account suspension.

Acceptance criteria

  • After Disable(userID), no sessions rows remain for that user.
  • Audit event recorded with the revocation.
  • Test covers disable → previously-valid session cookie returns Unauthenticated → re-enable → session cookie still returns Unauthenticated until fresh login.

Out of scope

  • A general session-cleanup cron for expired rows.
  • Cross-cutting "revoke all credentials" admin tool.

Sub-issue 4 — Plumb serviceuser.state column end-to-end (enhancement)

Title: serviceuser.state column is unused; add admin RPCs to enable/disable individual service users and enforce in auth path

Context

serviceusers.state exists with default 'enabled' (internal/store/postgres/migrations/20230507191406_create_base_tables.up.sql:65). The Disabled State = "disabled" constant is defined in
core/serviceuser/serviceuser.go:22. The repo selects the column. Nothing in the codebase ever writes a non-default value — no Service.Disable, no admin RPC, no migration. The column is dead weight.

Operators currently have only "delete the SU" or "delete/disable the parent org" as access-revocation primitives. A per-SU disable is a useful primitive for incident response (compromised credential), staged onboarding, and
parity with how human users are managed.

Direction

  • Add Disable(ctx, id) and Enable(ctx, id) methods on core/serviceuser/service.go, backed by a new SetState repo method.
  • Add admin RPCs (DisableServiceUser, EnableServiceUser) on FrontierService.
  • In the auth path (alongside the org-state check from sub-issue feat: add username and groupname columns #1), reject when su.State == Disabled.
  • Emit audit events for state transitions.

Acceptance criteria

  • Admin RPC can disable / enable a specific SU.
  • Disabled SU → ErrUnauthenticated from any credential.
  • Enabling a disabled SU restores auth (without minting new credentials).
  • Audit events recorded for both transitions.
  • Service-level unit tests, auth-level test, admin-RPC integration test.

Out of scope


Sub-issue 5 — Add FK with ON DELETE RESTRICT on serviceusers.org_id (enhancement)

Title: Add foreign key with ON DELETE RESTRICT on serviceusers.org_id (DB-level safety net)

Depends on: #DELETER (sub-issue #2 — must be in production first)

Context

serviceusers.org_id uuid has no FK constraint at all (internal/store/postgres/migrations/20230507191406_create_base_tables.up.sql:62). The application-level cleanup is being added in sub-issue #2, but the database has no
fallback. If a future change to the deleter regresses the SU cleanup, or an out-of-band deletion occurs, there is no DB-level guard.

Why RESTRICT, not CASCADE

A SQL CASCADE would silently delete serviceusers rows but leave SpiceDB tuples and credential metadata untouched (cleanup of those lives in app code, beyond the DB's reach). That recreates exactly the half-applied-state
shape we are closing. RESTRICT instead forces any future regression in the deleter to fail loudly with ErrForeignKeyViolation.

Direction

Two-step migration:

  1. Backfill / reconciliation — identify and clean up any pre-existing orphaned rows. Either delete them after operator review or write a reconciliation script. Must run before the FK is added; otherwise the constraint
    creation will fail.
  2. ALTER TABLE adding the FK:
ALTER TABLE serviceusers                                                                                                                                                                                                         
  ALTER COLUMN org_id SET NOT NULL,                                                                                                                                                                                            
  ADD CONSTRAINT serviceusers_org_id_fkey                                                                                                                                                                                        
    FOREIGN KEY (org_id) REFERENCES organizations(id) ON DELETE RESTRICT;
                                                                                                                                                                                                                                 
Acceptance criteria                                                                                                                                                                                                            
                                                                                                                                                                                                                                 
- Backfill identifies and removes (or reattaches) all orphaned serviceusers rows.                                                                                                                                                
- FK constraint is in place; verified via \d serviceusers.
- Manual test: attempt to delete an org with active SUs → fails with FK violation; with SUs cleaned up → succeeds.                                                                                                               
                                                                                                                                                                                                                                 
Note                                                                                                                                                                                                                             
                                                                                                                                                                                                                                 
This sub-issue does not change application logic. It is intentionally landed last so that the deleter fix has been in production long enough that no new orphans are being created.                                              
 

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions