feat(AGX1-272): dual-write agent_api_keys to spark-authz behind FGAC flag by dm36 · Pull Request #248 · scaleapi/scale-agentex

dm36 · 2026-05-26T20:24:20Z

Summary

Wires register_resource / deregister_resource into AgentAPIKeysUseCase.create / delete so api_keys get a SpiceDB tuple (with parent_agent edge) on create and have that tuple removed on delete. Companion to the route-migration PR (#252).

scale-agentex is intentionally decoupled from egp-api-backend. The dual-write call is unconditional from this repo. agentex-auth's per-account routing (scaleapi/agentex#353, FGAC_AGENTEX_AUTH_SPARK) decides whether the call lands on Spark AuthZ or falls back to the legacy SGP backend for the caller's account.

Linear ticket: https://linear.app/scale-epd/issue/AGX1-272/fgac-agent-api-key-dual-write-registerderegister-on-createdelete

Cross-repo stack

#	Repo	PR	Purpose	State
1	scaleapi/scaleapi	#144657	sgp-authz 0.6.0 → 0.7.0; `parent_resource` kwarg	✅ Merged
2	scaleapi/agentex	#354	agentex-auth Port + Spark adapter + HTTP routes (`/v1/authz/register`, `/v1/authz/deregister`) + `api_key` mapping	✅ Merged
2b	scaleapi/agentex	#353	agentex-auth per-account routing via `FGAC_AGENTEX_AUTH_SPARK` flag	Open (this PR is decoupled from it — works whether #353 lands first or not)
3 (this PR)	scaleapi/scale-agentex	this	scale-agentex use case calling `register_resource(parent=agent)`	Ready
4	scaleapi/scale-agentex	#252	route-layer FGAC + 404 collapse + two-factor mutations	Stacked on this

Changes

Port + adapter

src/adapters/authorization/port.py — abstract register_resource(resource, parent=None) and deregister_resource(resource) on AuthorizationGateway.
src/adapters/authorization/adapter_agentex_authz_proxy.py — POST /v1/authz/register and /v1/authz/deregister to agentex-auth (endpoints exposed by #354).

Service layer

src/domain/services/authorization_service.py — register_resource / deregister_resource service methods mirror the grant/revoke pattern (principal_context override, _bypass support, parent identity in log line).

Use case

src/domain/use_cases/agent_api_keys_use_case.py:
- create calls _register_api_key_in_auth unconditionally; parent=AgentexResource.agent(agent_id) is load-bearing for the SpiceDB cascade.
- delete (and delete_by_* variants) call _deregister_api_key_from_auth after the Postgres row is gone.
- Register fails closed (re-raises); deregister is best-effort (logs).
- The "no creator resolvable" guard (internal-key path with no principal user/service identity) reads from principal_context at call time — no persisted creator columns.

Tests

tests/integration/services/test_agent_api_key_service_dual_write.py — 6 cases. Mocks the authorization service; asserts the call sequencing inside the use case.
tests/integration/fixtures/integration_client.py + tests/unit/use_cases/test_agents_api_keys_use_case.py updated for the new constructor signature.

Test plan

6 / 6 dual-write integration tests pass locally.
Parent edge contract pinned: test_create_api_key_calls_register_resource_with_parent asserts parent=AgentexResource.agent(agent.id) is passed on every register call.
Fail-closed on register failure preserved.
Best-effort deregister preserved.
No-creator-resolvable skip path preserved (runtime check, no persistence).
4 pre-existing docker-fixture errors in test_agents_api_keys_use_case.py predate this PR (verified via git stash).
CI: ruff, ruff-format, alembic migration lint.
End-to-end against a real SpiceDB once exercised in dev via the agentex-auth routing flip.

Diff scope

10 files changed, 55 insertions(+), 368 deletions(-) — net deletion. Earlier iterations had ~370 lines of OSS-coupling that Harvey's review pruned.

Out of scope / follow-ups

ZedToken plumbing. If a SpiceDB consistency token use case emerges, expose it under a vendor-neutral column name (not spark_authz_zedtoken).
Audit / backfill for legacy api_keys created before this PR. Tracked separately.
Other resource types' grant → register_resource swap (task, build, deployment, schedule). Each owns its own follow-up.

Linked: AGX1-272.

Greptile Summary

This PR wires register_resource / deregister_resource into AgentAPIKeysUseCase.create and all three delete variants, so each agent_api_key gets a SpiceDB tuple with a parent_agent edge on create and has that tuple removed on delete. The dual-write is unconditional from scale-agentex; per-account routing (Spark vs legacy SGP) is handled by agentex-auth's FGAC_AGENTEX_AUTH_SPARK flag in the companion PR.

Port + adapter: register_resource / deregister_resource added to AuthorizationGateway and implemented in AgentexAuthorizationProxy via POST to /v1/authz/register and /v1/authz/deregister, following the same payload pattern as the existing grant/revoke methods.
Use-case semantics: Create is fail-closed (register before DB write; re-raises on failure) and skips when no creator identity is resolvable on principal_context. Delete (all three variants) is best-effort (deregister after DB delete; logs but does not re-raise on failure).
Tests: Six new dual-write integration tests cover parent-edge contract, fail-closed register, best-effort deregister, no-creator skip, and delete_by_name; unit fixture updated to supply AsyncMock for both new methods.

Confidence Score: 5/5

Safe to merge; the dual-write is unconditional but correctly layered so that agentex-auth per-account routing controls whether Spark or the legacy backend is actually hit.

The create path is fail-closed (register before DB write, re-raises on failure), the delete paths are best-effort (swallow errors after the Postgres row is gone), and the no-creator-identity guard prevents writing ownerless SpiceDB tuples. Six dedicated integration tests exercise the key contracts and no logic errors were found.

No files require special attention.

Important Files Changed

Filename	Overview
agentex/src/domain/use_cases/agent_api_keys_use_case.py	Wires dual-write into create (fail-closed, before DB write, with parent-agent edge) and all three delete variants (best-effort, after DB delete); creator-identity guard correctly short-circuits when no user/service principal is resolvable.
agentex/src/domain/services/authorization_service.py	Adds register_resource and deregister_resource service methods with _bypass() guard, principal_context override pattern, and structured logging — mirrors the existing grant/revoke service methods faithfully.
agentex/src/adapters/authorization/adapter_agentex_authz_proxy.py	Implements register_resource and deregister_resource via POST to /v1/authz/register and /v1/authz/deregister; consistent with existing grant/revoke payload patterns including model_dump() for resource/parent.
agentex/src/api/schemas/authorization_types.py	Adds api_key to AgentexResourceType enum and corresponding factory methods; follows the exact same pattern as the existing task and agent types.
agentex/tests/integration/services/test_agent_api_key_service_dual_write.py	Six integration tests covering the core dual-write contracts: parent-edge on create, deregister on delete, fail-closed register, best-effort deregister, no-creator skip, and delete-by-name path.

Sequence Diagram

sequenceDiagram
    participant Route as API Route
    participant UC as AgentAPIKeysUseCase
    participant Auth as AuthorizationService
    participant GW as AgentexAuthorizationProxy
    participant DB as AgentAPIKeyRepository
    participant AX as agentex-auth

    Note over Route,AX: create path (fail-closed)
    Route->>UC: create(name, agent_id, api_key)
    UC->>Auth: check principal_context
    alt no creator identity
        UC-->>DB: skip auth, write row anyway
    else creator resolvable
        Auth->>GW: "register_resource(principal, api_key, parent=agent)"
        GW->>AX: POST /v1/authz/register
        alt register fails
            AX-->>UC: raises - DB row NOT written
        else register succeeds
            UC->>DB: create(AgentAPIKeyEntity)
        end
    end

    Note over Route,AX: delete path (best-effort)
    Route->>UC: delete(id)
    UC->>DB: get(id)
    alt ItemDoesNotExist
        UC-->>Route: no-op
    else row exists
        UC->>DB: delete(id)
        UC->>Auth: deregister_resource(api_key)
        Auth->>GW: deregister_resource(principal, api_key)
        GW->>AX: POST /v1/authz/deregister
        alt deregister fails
            UC->>UC: log warning, swallow
        end
    end

Comments Outside Diff (1)

agentex/tests/integration/services/test_agent_api_key_service_dual_write.py, line 703-727 (link)

Missing test for grant-succeeds-but-DB-create-fails (orphan Spark tuple)

The suite covers grant failure preventing the DB write, but not the reverse: grant succeeding followed by repo.create raising (e.g., a duplicate key or transient DB error). In that case a Spark tuple exists for an api_key_id that never lands in Postgres. Adding a test for this case would make the documented known-limitation explicit and guard against accidental regression if compensating logic is added later.

Prompt To Fix With AI

This is a comment left during a code review.
Path: agentex/tests/integration/services/test_agent_api_key_service_dual_write.py
Line: 703-727

Comment:
**Missing test for grant-succeeds-but-DB-create-fails (orphan Spark tuple)**

The suite covers `grant` failure preventing the DB write, but not the reverse: `grant` succeeding followed by `repo.create` raising (e.g., a duplicate key or transient DB error). In that case a Spark tuple exists for an `api_key_id` that never lands in Postgres. Adding a test for this case would make the documented known-limitation explicit and guard against accidental regression if compensating logic is added later.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (8): Last reviewed commit: "refactor(AGX1-272): early-return on dele..." | Re-trigger Greptile}

Closes the parent_agent cascade gap surfaced on scaleapi/agentex#354. The api_key dual-write (AGX1-272, PR #248) currently calls grant() which writes the owner edge in SpiceDB but NOT the parent_agent edge. The agent_api_key schema requires `read = ... & parent_agent->read & ...`, so every downstream read/update fails closed without that edge. This PR adds register_resource/deregister_resource (Port + adapter + service) and swaps the api_keys use case from grant→register_resource with parent=AgentexResource.agent(agent_id). Now the owner edge and parent_agent edge are written atomically. Stack: - scaleapi/scaleapi#144657 — sgp-authz 0.7.0 (parent_resource kwarg). - scaleapi/agentex#355 — agentex-auth Port + adapter + HTTP routes. - #248 — original AGX1-272 dual-write (this stacks on it). - THIS PR — extends #248 to use the parent-aware path. Changes: - Port: abstract register_resource(resource, parent=None) and deregister_resource(resource). - Adapter proxy: POST /v1/authz/register and /v1/authz/deregister. - Service: mirror existing grant/revoke pattern (principal_context override, _bypass support, parent in log line for cascade debugging). - Use case: swap grant→register_resource passing parent=agent; swap revoke→deregister_resource. except Exception wrappers preserved (fail-closed on register, best-effort on deregister). - Tests: rename mocks to register_resource/deregister_resource; assert the parent edge is passed correctly. Test plan: - pytest agentex/tests/integration/services/test_agent_api_key_service_dual_write.py → 8 / 8 pass. - New test ``test_create_api_key_calls_grant_when_flag_on`` asserts parent.type == AgentexResourceType.agent and parent.selector == agent.id. Other resource types' grant→register_resource swap is out of scope. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…AGENT_API_KEYS_DUAL_WRITE flag Mirrors the AGX1-274 task dual-write pattern (PR #246) for agent_api_keys. - Adds creator_user_id / creator_service_account_id / spark_authz_zedtoken columns to agent_api_keys, with CHECK constraint and concurrent indexes. - On create, when FGAC_AGENT_API_KEYS_DUAL_WRITE is enabled for the caller's account, calls authorization_service.grant(AgentexResource.api_key(id)) BEFORE the Postgres write. Grant failure aborts the create. - On delete, best-effort revoke after the Postgres delete. Failures are logged but do not block the delete. - Adds AgentexResourceType.api_key and AgentexResource.api_key(...) factory. - Creates src/utils/feature_flags.py with both FGAC_TASKS_DUAL_WRITE and FGAC_AGENT_API_KEYS_DUAL_WRITE (file does not exist on main yet; if PR #246 lands first this becomes a rebase concern). Structural divergence from tasks: agent_api_keys have no service layer, so the dual-write logic lives in AgentAPIKeysUseCase rather than a separate service. This keeps the call site simple and avoids inventing a new layer. Route layer (read-side auth checks) is out of scope; that's PR B (AGX1-273). agentex-auth spark_mapping.py update is a sibling-repo concern. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Closes the parent_agent cascade gap surfaced on scaleapi/agentex#354. The api_key dual-write (AGX1-272, PR #248) currently calls grant() which writes the owner edge in SpiceDB but NOT the parent_agent edge. The agent_api_key schema requires `read = ... & parent_agent->read & ...`, so every downstream read/update fails closed without that edge. This PR adds register_resource/deregister_resource (Port + adapter + service) and swaps the api_keys use case from grant→register_resource with parent=AgentexResource.agent(agent_id). Now the owner edge and parent_agent edge are written atomically. Stack: - scaleapi/scaleapi#144657 — sgp-authz 0.7.0 (parent_resource kwarg). - scaleapi/agentex#355 — agentex-auth Port + adapter + HTTP routes. - #248 — original AGX1-272 dual-write (this stacks on it). - THIS PR — extends #248 to use the parent-aware path. Changes: - Port: abstract register_resource(resource, parent=None) and deregister_resource(resource). - Adapter proxy: POST /v1/authz/register and /v1/authz/deregister. - Service: mirror existing grant/revoke pattern (principal_context override, _bypass support, parent in log line for cascade debugging). - Use case: swap grant→register_resource passing parent=agent; swap revoke→deregister_resource. except Exception wrappers preserved (fail-closed on register, best-effort on deregister). - Tests: rename mocks to register_resource/deregister_resource; assert the parent edge is passed correctly. Test plan: - pytest agentex/tests/integration/services/test_agent_api_key_service_dual_write.py → 8 / 8 pass. - New test ``test_create_api_key_calls_grant_when_flag_on`` asserts parent.type == AgentexResourceType.agent and parent.selector == agent.id. Other resource types' grant→register_resource swap is out of scope. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…L_WRITE Per team discussion: rather than maintain a parallel env-var flag system in scale-agentex, route api_key dual-write flag checks through egp-api-backend's existing flag service. One source of truth across services, single flip surface for ops, fewer per-env env-var allowlists to keep in sync. Changes: - EnvVarKeys.EGP_API_BACKEND_URL — new env var for the egp-api-backend base URL. Used by the new HTTP-backed flag provider. - FeatureFlagProvider rewritten as an HTTP client of egp-api-backend's GET /feature-flag/{id} endpoint: * Forwards x-api-key / x-user-id / x-service-account-id / x-selected-account-id from the caller's principal_context so the endpoint's REQUIRE_IDENTITY_AND_OPTIONAL_ACCOUNT policy admits the request. * Coerces the response's `value` field to bool. * Fails closed to False on any error (config missing, no identity, non-2xx, transport failure, JSON parse failure) — the legacy no-Spark code path is the safe default. * `is_enabled` is now async (HTTP call). Signature is `is_enabled(name, *, principal_context, account_id)`. - AgentAPIKeysUseCase: both call sites now await is_enabled and pass principal_context. _deregister grabs principal_context from self.authorization_service. - Test fixtures: mock FeatureFlagProvider directly (Mock with is_enabled = AsyncMock(return_value=flag_on)) so dual-write tests stay hermetic. The pre-existing FeatureFlagProvider() no-arg constructions in test_agents_api_keys_use_case.py and integration_client.py now pass egp_api_backend_url=None (provider returns False without it, matching the prior "flag never enabled in unit tests" behavior). Out of scope: - Migrating Asher's FGAC_TASKS_DUAL_WRITE flag check off env vars. That's task-team-owned and we leave their existing pattern alone per the team discussion (new-work-only). - Caching the flag response. Each is_enabled is a fresh HTTP call. Egp-api-backend's flag endpoint is fast and the caller paths are already crossing the network for the actual register/deregister, so one extra round-trip is acceptable for now. Add caching later if load profiling shows it matters. Test plan: - uv run pytest agentex/tests/integration/services/test_agent_api_key_service_dual_write.py — 8/8 pass. - Existing 4 unrelated test_agents_api_keys_use_case.py docker-fixture errors predate this commit (verified via `git stash`). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Three rounds of Harvey's review feedback together pointed at the same concern: scale-agentex (OSS) should not be coupled to egp-api-backend (proprietary) feature-flag service or Spark-specific schema concepts. Changes: - Drop FeatureFlagProvider and the egp-api-backend HTTP query entirely. scale-agentex now calls register_resource/deregister_resource unconditionally. agentex-auth's per-account routing (FGAC_AGENTEX_AUTH_SPARK, scaleapi/agentex#353) decides whether the call lands on Spark AuthZ or falls back to legacy SGP for the caller's account. - Drop the creator_user_id / creator_service_account_id columns from the agent_api_keys table. They were only used as a runtime guard inside the auth-registration helper. Moved that check inline. - Drop the spark_authz_zedtoken column. The name leaked SpiceDB-specific terminology into the OSS schema, and the column was always None. - Delete the Alembic migration entirely. migration_history.txt re-pointed to head=6c942325c828. - Rename helper methods: _register_api_key_in_spark_authz -> _register_api_key_in_auth, same for deregister. Test plan: 6/6 dual-write integration tests pass. Pre-existing docker-fixture errors in test_agents_api_keys_use_case.py unrelated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@jenniechung

Per @jenniechung review: delete(id) used to call deregister unconditionally, which wastes an auth round-trip when the id doesn't exist. The two delete_by_* methods already use pre-fetch + `if existing is not None`; delete() now matches that pattern via try/except ItemDoesNotExist. Also tightened the unconditional-register comment from 5 lines to 2. Tests: - Fixture now defaults repo.get() to a stub entity so existing delete tests flow through the deregister path; new helper `_stub_api_key`. - New test_delete_api_key_skips_deregister_when_row_does_not_exist asserts deregister.assert_not_called() when repo.get raises ItemDoesNotExist. - 7/7 dual-write tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…t fixture Per reviewer: the agent_api_keys_use_case fixture had `grant` and `revoke` set up as AsyncMocks but not `register_resource` / `deregister_resource`. The deregister path silently swallowed the resulting TypeError via its `except Exception` handler, so delete tests passed without exercising the auth-side call. Matches the integration_client.py fixture shape. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

harvhan · 2026-05-29T23:31:09Z

            api_key_type=api_key_type,
            api_key=api_key,
        )
        return await self.agent_api_key_repo.create(item=agent_api_key)


Should we follow the egp-api-backend's pattern and add compensating deregister if repo.create fails?

followed asher's approach here with tasks with not compensating deregister if repo.create fails for consistency- i can update tho

agentex-auth derives account_id from the authenticated principal on its side, so threading it through routes and use-case calls in OSS scale-agentex is dead weight — and leaks the SGP-specific field name. Drop: - account_id extraction (getattr on principal_context) in all three api_keys route handlers; drop the now-unused DAuthorizationService injection from create_api_key/delete_agent_api_key. - account_id parameter on create, delete, delete_by_agent_id_and_key_name, delete_by_agent_name_and_key_name, _register_api_key_in_auth, and _deregister_api_key_from_auth. - account_id entries from log extras (it was only ever log context — no auth call ever consumed it). - account_id kwargs at use-case call sites in the dual-write test. Keep the _principal(account_id=...) test helper — it models the principal shape and Harvey's feedback is about production OSS surface, not test mocks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Per Harvey's nit: when the pre-fetch raises ItemDoesNotExist, skip both the repo.delete (was issuing a wasted DB round-trip for a no-op) and the deregister. Same HTTP response from the route either way. Test renamed and updated: now asserts neither repo.delete nor deregister is called when the row doesn't exist (previously asserted repo.delete was called). Note: the delete_by_* variants have the symmetric pattern (unconditional repo.delete_by_* even when the pre-fetch returns None), but they predate this PR and Harvey's comment was scoped to delete. Leaving those for now. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

dm36 marked this pull request as ready for review May 26, 2026 21:31

dm36 requested a review from a team as a code owner May 26, 2026 21:31

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

Comment thread agentex/src/domain/use_cases/agent_api_keys_use_case.py Outdated

Comment thread agentex/src/domain/use_cases/agent_api_keys_use_case.py Outdated

dm36 mentioned this pull request May 26, 2026

feat: register_resource with parent edge for agent_api_keys dual-write #253

Closed

4 tasks

dm36 marked this pull request as draft May 27, 2026 17:27

dm36 and others added 2 commits May 28, 2026 10:28

dm36 marked this pull request as ready for review May 28, 2026 17:29

dm36 force-pushed the dhruv/agx1-272-agent-api-keys-dual-write branch from 9668f1a to e72df68 Compare May 28, 2026 17:29

dm36 mentioned this pull request May 28, 2026

feat(AGX1-263): agent_api_keys route migration to FGAC (404 collapse + two-factor) #252

Draft

5 tasks

harvhan reviewed May 29, 2026

View reviewed changes

Comment thread agentex/src/utils/feature_flags.py Outdated

Comment thread agentex/src/domain/entities/agent_api_keys.py Outdated

Comment thread agentex/src/domain/entities/agent_api_keys.py Outdated

dm36 requested review from danielmillerp, declan-scale, harvhan and smoreinis May 29, 2026 19:49

jenniechung reviewed May 29, 2026

View reviewed changes

Comment thread agentex/src/domain/use_cases/agent_api_keys_use_case.py

greptile-apps Bot reviewed May 29, 2026

View reviewed changes

Comment thread agentex/tests/unit/use_cases/test_agents_api_keys_use_case.py

harvhan approved these changes May 29, 2026

View reviewed changes

harvhan mentioned this pull request Jun 1, 2026

feat(schedules): register agent_schedule in authz on create/delete #259

Open

4 tasks

dm36 and others added 2 commits June 1, 2026 11:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(AGX1-272): dual-write agent_api_keys to spark-authz behind FGAC flag#248

feat(AGX1-272): dual-write agent_api_keys to spark-authz behind FGAC flag#248
dm36 wants to merge 8 commits into
mainfrom
dhruv/agx1-272-agent-api-keys-dual-write

dm36 commented May 26, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

harvhan May 29, 2026

Uh oh!

dm36 Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dm36 commented May 26, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Cross-repo stack

Changes

Port + adapter

Service layer

Use case

Tests

Test plan

Diff scope

Out of scope / follow-ups

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (1)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

harvhan May 29, 2026

Choose a reason for hiding this comment

Uh oh!

dm36 Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dm36 commented May 26, 2026 •

edited by greptile-apps Bot

Loading