feat(AGX1-272): dual-write agent_api_keys to spark-authz behind FGAC flag#248
Open
dm36 wants to merge 8 commits into
Open
feat(AGX1-272): dual-write agent_api_keys to spark-authz behind FGAC flag#248dm36 wants to merge 8 commits into
dm36 wants to merge 8 commits into
Conversation
4 tasks
dm36
added a commit
that referenced
this pull request
May 26, 2026
Closes the parent_agent cascade gap surfaced on scaleapi/agentex#354. The api_key dual-write (AGX1-272, PR #248) currently calls grant() which writes the owner edge in SpiceDB but NOT the parent_agent edge. The agent_api_key schema requires `read = ... & parent_agent->read & ...`, so every downstream read/update fails closed without that edge. This PR adds register_resource/deregister_resource (Port + adapter + service) and swaps the api_keys use case from grant→register_resource with parent=AgentexResource.agent(agent_id). Now the owner edge and parent_agent edge are written atomically. Stack: - scaleapi/scaleapi#144657 — sgp-authz 0.7.0 (parent_resource kwarg). - scaleapi/agentex#355 — agentex-auth Port + adapter + HTTP routes. - #248 — original AGX1-272 dual-write (this stacks on it). - THIS PR — extends #248 to use the parent-aware path. Changes: - Port: abstract register_resource(resource, parent=None) and deregister_resource(resource). - Adapter proxy: POST /v1/authz/register and /v1/authz/deregister. - Service: mirror existing grant/revoke pattern (principal_context override, _bypass support, parent in log line for cascade debugging). - Use case: swap grant→register_resource passing parent=agent; swap revoke→deregister_resource. except Exception wrappers preserved (fail-closed on register, best-effort on deregister). - Tests: rename mocks to register_resource/deregister_resource; assert the parent edge is passed correctly. Test plan: - pytest agentex/tests/integration/services/test_agent_api_key_service_dual_write.py → 8 / 8 pass. - New test ``test_create_api_key_calls_grant_when_flag_on`` asserts parent.type == AgentexResourceType.agent and parent.selector == agent.id. Other resource types' grant→register_resource swap is out of scope. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…AGENT_API_KEYS_DUAL_WRITE flag Mirrors the AGX1-274 task dual-write pattern (PR #246) for agent_api_keys. - Adds creator_user_id / creator_service_account_id / spark_authz_zedtoken columns to agent_api_keys, with CHECK constraint and concurrent indexes. - On create, when FGAC_AGENT_API_KEYS_DUAL_WRITE is enabled for the caller's account, calls authorization_service.grant(AgentexResource.api_key(id)) BEFORE the Postgres write. Grant failure aborts the create. - On delete, best-effort revoke after the Postgres delete. Failures are logged but do not block the delete. - Adds AgentexResourceType.api_key and AgentexResource.api_key(...) factory. - Creates src/utils/feature_flags.py with both FGAC_TASKS_DUAL_WRITE and FGAC_AGENT_API_KEYS_DUAL_WRITE (file does not exist on main yet; if PR #246 lands first this becomes a rebase concern). Structural divergence from tasks: agent_api_keys have no service layer, so the dual-write logic lives in AgentAPIKeysUseCase rather than a separate service. This keeps the call site simple and avoids inventing a new layer. Route layer (read-side auth checks) is out of scope; that's PR B (AGX1-273). agentex-auth spark_mapping.py update is a sibling-repo concern. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes the parent_agent cascade gap surfaced on scaleapi/agentex#354. The api_key dual-write (AGX1-272, PR #248) currently calls grant() which writes the owner edge in SpiceDB but NOT the parent_agent edge. The agent_api_key schema requires `read = ... & parent_agent->read & ...`, so every downstream read/update fails closed without that edge. This PR adds register_resource/deregister_resource (Port + adapter + service) and swaps the api_keys use case from grant→register_resource with parent=AgentexResource.agent(agent_id). Now the owner edge and parent_agent edge are written atomically. Stack: - scaleapi/scaleapi#144657 — sgp-authz 0.7.0 (parent_resource kwarg). - scaleapi/agentex#355 — agentex-auth Port + adapter + HTTP routes. - #248 — original AGX1-272 dual-write (this stacks on it). - THIS PR — extends #248 to use the parent-aware path. Changes: - Port: abstract register_resource(resource, parent=None) and deregister_resource(resource). - Adapter proxy: POST /v1/authz/register and /v1/authz/deregister. - Service: mirror existing grant/revoke pattern (principal_context override, _bypass support, parent in log line for cascade debugging). - Use case: swap grant→register_resource passing parent=agent; swap revoke→deregister_resource. except Exception wrappers preserved (fail-closed on register, best-effort on deregister). - Tests: rename mocks to register_resource/deregister_resource; assert the parent edge is passed correctly. Test plan: - pytest agentex/tests/integration/services/test_agent_api_key_service_dual_write.py → 8 / 8 pass. - New test ``test_create_api_key_calls_grant_when_flag_on`` asserts parent.type == AgentexResourceType.agent and parent.selector == agent.id. Other resource types' grant→register_resource swap is out of scope. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
9668f1a to
e72df68
Compare
…L_WRITE
Per team discussion: rather than maintain a parallel env-var flag system
in scale-agentex, route api_key dual-write flag checks through
egp-api-backend's existing flag service. One source of truth across services,
single flip surface for ops, fewer per-env env-var allowlists to keep in sync.
Changes:
- EnvVarKeys.EGP_API_BACKEND_URL — new env var for the egp-api-backend
base URL. Used by the new HTTP-backed flag provider.
- FeatureFlagProvider rewritten as an HTTP client of egp-api-backend's
GET /feature-flag/{id} endpoint:
* Forwards x-api-key / x-user-id / x-service-account-id /
x-selected-account-id from the caller's principal_context so the
endpoint's REQUIRE_IDENTITY_AND_OPTIONAL_ACCOUNT policy admits the
request.
* Coerces the response's `value` field to bool.
* Fails closed to False on any error (config missing, no identity,
non-2xx, transport failure, JSON parse failure) — the legacy
no-Spark code path is the safe default.
* `is_enabled` is now async (HTTP call). Signature is
`is_enabled(name, *, principal_context, account_id)`.
- AgentAPIKeysUseCase: both call sites now await is_enabled and pass
principal_context. _deregister grabs principal_context from
self.authorization_service.
- Test fixtures: mock FeatureFlagProvider directly (Mock with
is_enabled = AsyncMock(return_value=flag_on)) so dual-write tests stay
hermetic. The pre-existing FeatureFlagProvider() no-arg constructions
in test_agents_api_keys_use_case.py and integration_client.py now pass
egp_api_backend_url=None (provider returns False without it, matching
the prior "flag never enabled in unit tests" behavior).
Out of scope:
- Migrating Asher's FGAC_TASKS_DUAL_WRITE flag check off env vars.
That's task-team-owned and we leave their existing pattern alone per
the team discussion (new-work-only).
- Caching the flag response. Each is_enabled is a fresh HTTP call.
Egp-api-backend's flag endpoint is fast and the caller paths are
already crossing the network for the actual register/deregister, so
one extra round-trip is acceptable for now. Add caching later if
load profiling shows it matters.
Test plan:
- uv run pytest agentex/tests/integration/services/test_agent_api_key_service_dual_write.py — 8/8 pass.
- Existing 4 unrelated test_agents_api_keys_use_case.py docker-fixture
errors predate this commit (verified via `git stash`).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5 tasks
harvhan
reviewed
May 29, 2026
Three rounds of Harvey's review feedback together pointed at the same concern: scale-agentex (OSS) should not be coupled to egp-api-backend (proprietary) feature-flag service or Spark-specific schema concepts. Changes: - Drop FeatureFlagProvider and the egp-api-backend HTTP query entirely. scale-agentex now calls register_resource/deregister_resource unconditionally. agentex-auth's per-account routing (FGAC_AGENTEX_AUTH_SPARK, scaleapi/agentex#353) decides whether the call lands on Spark AuthZ or falls back to legacy SGP for the caller's account. - Drop the creator_user_id / creator_service_account_id columns from the agent_api_keys table. They were only used as a runtime guard inside the auth-registration helper. Moved that check inline. - Drop the spark_authz_zedtoken column. The name leaked SpiceDB-specific terminology into the OSS schema, and the column was always None. - Delete the Alembic migration entirely. migration_history.txt re-pointed to head=6c942325c828. - Rename helper methods: _register_api_key_in_spark_authz -> _register_api_key_in_auth, same for deregister. Test plan: 6/6 dual-write integration tests pass. Pre-existing docker-fixture errors in test_agents_api_keys_use_case.py unrelated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jenniechung
reviewed
May 29, 2026
Per @jenniechung review: delete(id) used to call deregister unconditionally, which wastes an auth round-trip when the id doesn't exist. The two delete_by_* methods already use pre-fetch + `if existing is not None`; delete() now matches that pattern via try/except ItemDoesNotExist. Also tightened the unconditional-register comment from 5 lines to 2. Tests: - Fixture now defaults repo.get() to a stub entity so existing delete tests flow through the deregister path; new helper `_stub_api_key`. - New test_delete_api_key_skips_deregister_when_row_does_not_exist asserts deregister.assert_not_called() when repo.get raises ItemDoesNotExist. - 7/7 dual-write tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t fixture Per reviewer: the agent_api_keys_use_case fixture had `grant` and `revoke` set up as AsyncMocks but not `register_resource` / `deregister_resource`. The deregister path silently swallowed the resulting TypeError via its `except Exception` handler, so delete tests passed without exercising the auth-side call. Matches the integration_client.py fixture shape. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
harvhan
approved these changes
May 29, 2026
| api_key_type=api_key_type, | ||
| api_key=api_key, | ||
| ) | ||
| return await self.agent_api_key_repo.create(item=agent_api_key) |
There was a problem hiding this comment.
Should we follow the egp-api-backend's pattern and add compensating deregister if repo.create fails?
Author
There was a problem hiding this comment.
followed asher's approach here with tasks with not compensating deregister if repo.create fails for consistency- i can update tho
4 tasks
agentex-auth derives account_id from the authenticated principal on its side, so threading it through routes and use-case calls in OSS scale-agentex is dead weight — and leaks the SGP-specific field name. Drop: - account_id extraction (getattr on principal_context) in all three api_keys route handlers; drop the now-unused DAuthorizationService injection from create_api_key/delete_agent_api_key. - account_id parameter on create, delete, delete_by_agent_id_and_key_name, delete_by_agent_name_and_key_name, _register_api_key_in_auth, and _deregister_api_key_from_auth. - account_id entries from log extras (it was only ever log context — no auth call ever consumed it). - account_id kwargs at use-case call sites in the dual-write test. Keep the _principal(account_id=...) test helper — it models the principal shape and Harvey's feedback is about production OSS surface, not test mocks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per Harvey's nit: when the pre-fetch raises ItemDoesNotExist, skip both the repo.delete (was issuing a wasted DB round-trip for a no-op) and the deregister. Same HTTP response from the route either way. Test renamed and updated: now asserts neither repo.delete nor deregister is called when the row doesn't exist (previously asserted repo.delete was called). Note: the delete_by_* variants have the symmetric pattern (unconditional repo.delete_by_* even when the pre-fetch returns None), but they predate this PR and Harvey's comment was scoped to delete. Leaving those for now. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wires
register_resource/deregister_resourceintoAgentAPIKeysUseCase.create/deleteso api_keys get a SpiceDB tuple (withparent_agentedge) on create and have that tuple removed on delete. Companion to the route-migration PR (#252).scale-agentex is intentionally decoupled from egp-api-backend. The dual-write call is unconditional from this repo. agentex-auth's per-account routing (scaleapi/agentex#353,
FGAC_AGENTEX_AUTH_SPARK) decides whether the call lands on Spark AuthZ or falls back to the legacy SGP backend for the caller's account.Linear ticket: https://linear.app/scale-epd/issue/AGX1-272/fgac-agent-api-key-dual-write-registerderegister-on-createdelete
Cross-repo stack
parent_resourcekwarg/v1/authz/register,/v1/authz/deregister) +api_keymappingFGAC_AGENTEX_AUTH_SPARKflagregister_resource(parent=agent)Changes
Port + adapter
src/adapters/authorization/port.py— abstractregister_resource(resource, parent=None)andderegister_resource(resource)onAuthorizationGateway.src/adapters/authorization/adapter_agentex_authz_proxy.py— POST/v1/authz/registerand/v1/authz/deregisterto agentex-auth (endpoints exposed by #354).Service layer
src/domain/services/authorization_service.py—register_resource/deregister_resourceservice methods mirror the grant/revoke pattern (principal_contextoverride,_bypasssupport, parent identity in log line).Use case
src/domain/use_cases/agent_api_keys_use_case.py:createcalls_register_api_key_in_authunconditionally;parent=AgentexResource.agent(agent_id)is load-bearing for the SpiceDB cascade.delete(anddelete_by_*variants) call_deregister_api_key_from_authafter the Postgres row is gone.principal_contextat call time — no persisted creator columns.Tests
tests/integration/services/test_agent_api_key_service_dual_write.py— 6 cases. Mocks the authorization service; asserts the call sequencing inside the use case.tests/integration/fixtures/integration_client.py+tests/unit/use_cases/test_agents_api_keys_use_case.pyupdated for the new constructor signature.Test plan
test_create_api_key_calls_register_resource_with_parentassertsparent=AgentexResource.agent(agent.id)is passed on every register call.test_agents_api_keys_use_case.pypredate this PR (verified viagit stash).Diff scope
10 files changed, 55 insertions(+), 368 deletions(-)— net deletion. Earlier iterations had ~370 lines of OSS-coupling that Harvey's review pruned.Out of scope / follow-ups
spark_authz_zedtoken).task,build,deployment,schedule). Each owns its own follow-up.Linked: AGX1-272.
Greptile Summary
This PR wires
register_resource/deregister_resourceintoAgentAPIKeysUseCase.createand all threedeletevariants, so eachagent_api_keygets a SpiceDB tuple with aparent_agentedge on create and has that tuple removed on delete. The dual-write is unconditional from scale-agentex; per-account routing (Spark vs legacy SGP) is handled by agentex-auth'sFGAC_AGENTEX_AUTH_SPARKflag in the companion PR.register_resource/deregister_resourceadded toAuthorizationGatewayand implemented inAgentexAuthorizationProxyvia POST to/v1/authz/registerand/v1/authz/deregister, following the same payload pattern as the existinggrant/revokemethods.principal_context. Delete (all three variants) is best-effort (deregister after DB delete; logs but does not re-raise on failure).delete_by_name; unit fixture updated to supplyAsyncMockfor both new methods.Confidence Score: 5/5
Safe to merge; the dual-write is unconditional but correctly layered so that agentex-auth per-account routing controls whether Spark or the legacy backend is actually hit.
The create path is fail-closed (register before DB write, re-raises on failure), the delete paths are best-effort (swallow errors after the Postgres row is gone), and the no-creator-identity guard prevents writing ownerless SpiceDB tuples. Six dedicated integration tests exercise the key contracts and no logic errors were found.
No files require special attention.
Important Files Changed
Sequence Diagram
sequenceDiagram participant Route as API Route participant UC as AgentAPIKeysUseCase participant Auth as AuthorizationService participant GW as AgentexAuthorizationProxy participant DB as AgentAPIKeyRepository participant AX as agentex-auth Note over Route,AX: create path (fail-closed) Route->>UC: create(name, agent_id, api_key) UC->>Auth: check principal_context alt no creator identity UC-->>DB: skip auth, write row anyway else creator resolvable Auth->>GW: "register_resource(principal, api_key, parent=agent)" GW->>AX: POST /v1/authz/register alt register fails AX-->>UC: raises - DB row NOT written else register succeeds UC->>DB: create(AgentAPIKeyEntity) end end Note over Route,AX: delete path (best-effort) Route->>UC: delete(id) UC->>DB: get(id) alt ItemDoesNotExist UC-->>Route: no-op else row exists UC->>DB: delete(id) UC->>Auth: deregister_resource(api_key) Auth->>GW: deregister_resource(principal, api_key) GW->>AX: POST /v1/authz/deregister alt deregister fails UC->>UC: log warning, swallow end endComments Outside Diff (1)
agentex/tests/integration/services/test_agent_api_key_service_dual_write.py, line 703-727 (link)The suite covers
grantfailure preventing the DB write, but not the reverse:grantsucceeding followed byrepo.createraising (e.g., a duplicate key or transient DB error). In that case a Spark tuple exists for anapi_key_idthat never lands in Postgres. Adding a test for this case would make the documented known-limitation explicit and guard against accidental regression if compensating logic is added later.Prompt To Fix With AI
Reviews (8): Last reviewed commit: "refactor(AGX1-272): early-return on dele..." | Re-trigger Greptile