feat(AGX1-275): per-RPC task permission rewire and 404/403 wrap#249
Open
asherfink wants to merge 1 commit into
Open
feat(AGX1-275): per-RPC task permission rewire and 404/403 wrap#249asherfink wants to merge 1 commit into
asherfink wants to merge 1 commit into
Conversation
5 tasks
5 tasks
dm36
added a commit
that referenced
this pull request
May 27, 2026
…se and two-factor mutations Mirrors AGX1-275 (PR #249) for agent_api_keys. Wires Spark AuthZ checks into every api_key route, collapses denials to 404 (so name/id probes can't distinguish "present in another tenant" from "absent"), and relies on SpiceDB's transitive expansion of api_key.{update,delete} (= editor & parent_agent->update & tenant_gate) for two-factor mutations rather than issuing two explicit checks at the route layer. - src/utils/agent_api_key_authorization.py (new): _check_api_key_or_collapse_to_404 — catches AuthorizationError, raises ItemDoesNotExist. Same shape as Asher's task helper. - src/utils/authorization_shortcuts.py: DAuthorizedId routes AgentexResourceType.api_key through the wrap. (DAuthorizedName isn't used for api_keys; the name lookup is (agent_id, name, api_key_type), not a single globally-unique path param — the route handlers call the collapse helper inline instead.) - src/api/routes/agent_api_keys.py: * POST: explicit agent.update on parent (no api_key resource yet). * GET list: DAuthorizedResourceIds + filter; None passes through. * GET /name/{name}: inline collapse helper. * GET /{id}: DAuthorizedId(api_key, read). * DELETE /{id}: DAuthorizedId(api_key, delete). Two-factor via SpiceDB schema (api_key.delete expands to parent_agent.update); no second route-layer check. * DELETE /name/{api_key_name}: inline collapse helper. - tests/unit/api/test_agent_api_keys_authz.py (new): 12 tests, all pass. Stacked on dhruv/agx1-272-agent-api-keys-dual-write (PR A). Does NOT touch dual-write logic. Does NOT modify agentex-auth. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
b936a08 to
7841696
Compare
dm36
added a commit
that referenced
this pull request
May 28, 2026
…se and two-factor mutations Mirrors AGX1-275 (PR #249) for agent_api_keys. Wires Spark AuthZ checks into every api_key route, collapses denials to 404 (so name/id probes can't distinguish "present in another tenant" from "absent"), and relies on SpiceDB's transitive expansion of api_key.{update,delete} (= editor & parent_agent->update & tenant_gate) for two-factor mutations rather than issuing two explicit checks at the route layer. - src/utils/agent_api_key_authorization.py (new): _check_api_key_or_collapse_to_404 — catches AuthorizationError, raises ItemDoesNotExist. Same shape as Asher's task helper. - src/utils/authorization_shortcuts.py: DAuthorizedId routes AgentexResourceType.api_key through the wrap. (DAuthorizedName isn't used for api_keys; the name lookup is (agent_id, name, api_key_type), not a single globally-unique path param — the route handlers call the collapse helper inline instead.) - src/api/routes/agent_api_keys.py: * POST: explicit agent.update on parent (no api_key resource yet). * GET list: DAuthorizedResourceIds + filter; None passes through. * GET /name/{name}: inline collapse helper. * GET /{id}: DAuthorizedId(api_key, read). * DELETE /{id}: DAuthorizedId(api_key, delete). Two-factor via SpiceDB schema (api_key.delete expands to parent_agent.update); no second route-layer check. * DELETE /name/{api_key_name}: inline collapse helper. - tests/unit/api/test_agent_api_keys_authz.py (new): 12 tests, all pass. Stacked on dhruv/agx1-272-agent-api-keys-dual-write (PR A). Does NOT touch dual-write logic. Does NOT modify agentex-auth. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dm36
added a commit
that referenced
this pull request
May 28, 2026
…se and two-factor mutations Mirrors AGX1-275 (PR #249) for agent_api_keys. Wires Spark AuthZ checks into every api_key route, collapses denials to 404 (so name/id probes can't distinguish "present in another tenant" from "absent"), and relies on SpiceDB's transitive expansion of api_key.{update,delete} (= editor & parent_agent->update & tenant_gate) for two-factor mutations rather than issuing two explicit checks at the route layer. - src/utils/agent_api_key_authorization.py (new): _check_api_key_or_collapse_to_404 — catches AuthorizationError, raises ItemDoesNotExist. Same shape as Asher's task helper. - src/utils/authorization_shortcuts.py: DAuthorizedId routes AgentexResourceType.api_key through the wrap. (DAuthorizedName isn't used for api_keys; the name lookup is (agent_id, name, api_key_type), not a single globally-unique path param — the route handlers call the collapse helper inline instead.) - src/api/routes/agent_api_keys.py: * POST: explicit agent.update on parent (no api_key resource yet). * GET list: DAuthorizedResourceIds + filter; None passes through. * GET /name/{name}: inline collapse helper. * GET /{id}: DAuthorizedId(api_key, read). * DELETE /{id}: DAuthorizedId(api_key, delete). Two-factor via SpiceDB schema (api_key.delete expands to parent_agent.update); no second route-layer check. * DELETE /name/{api_key_name}: inline collapse helper. - tests/unit/api/test_agent_api_keys_authz.py (new): 12 tests, all pass. Stacked on dhruv/agx1-272-agent-api-keys-dual-write (PR A). Does NOT touch dual-write logic. Does NOT modify agentex-auth. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
asherfink
added a commit
that referenced
this pull request
May 28, 2026
…n creation Adds two nullable creator-audit columns to the `tasks` table — `creator_user_id` and `creator_service_account_id` — populated from the request principal in `AgentTaskService.create_task`. A CHECK constraint `ck_tasks_at_most_one_creator` enforces that at most one of the two is set; partial indexes back future "tasks created by X" lookups. Online migration: the CHECK is added `NOT VALID` then `VALIDATE`d separately so the brief ACCESS EXCLUSIVE lock doesn't have to wait on an existence scan. `tasks` is a high-write table; a vanilla CHECK addition would queue behind in-flight transactions and block readers until released. Indexes use `CREATE INDEX CONCURRENTLY` inside `autocommit_block`. Best-effort attribution: tasks created outside an HTTP request context (Temporal activities, background workers, any path that constructs `AgentTaskService` without `request.state.principal_context`) leave both columns NULL. The CHECK constraint allows both-NULL, and an integration test exercises the no-resolvable-creator path. These columns are how the AGX1-291 operator runbook identifies orphan rows for backfill when the dual-write call sites added in the next commit fail under load. Part of the AGX1-264 stack: scaleapi/scaleapi NEW2 (per-account FF endpoint) → scaleapi/agentex#353 (agentex-auth routing + cancel) → this PR → #249 (per-RPC route migration). Two commits land together in #246; this one is the schema/audit change and is independent of the dual-write call sites.
asherfink
added a commit
that referenced
this pull request
May 28, 2026
…TE flag
Wires `register_resource` / `deregister_resource` into
`AgentTaskService.create_task` / `delete_task`, gated by a new
`FGAC_TASKS_DUAL_WRITE` env-var (off by default; resolved at
DI-resolve time so mid-process flips are intentionally invisible —
rollout assumes a redeploy cycles pods).
- `create_task`: after the Postgres row persists, register the
task in the authorization graph with `parent=agent` so the
tenant + owner + parent_agent tuples are written atomically.
- `delete_task`: pre-resolves the task id by name before the
Postgres delete (lookup-after-delete would race), then
deregisters once the row is gone. The name-lookup
`ItemDoesNotExist` is swallowed so the subsequent `delete()`
surfaces its own native error — flipping the flag must not
change the error contract for missing tasks.
- Both call sites share `_dual_write_with_retry(op_name, do_call,
task_id)`, which retries transient
`AuthenticationServiceUnavailableError` /
`AuthenticationGatewayError` with exponential backoff + jitter
(3 retries → 4 attempts max). Mirrors
`agents_acp_use_case.grant_with_retry`, but with no `fail_task`
fallback: the Postgres row is the durable record and orphan auth
tuples are preferable to losing the task. The AGX1-291 operator
runbook covers backfill using the creator-audit columns added in
the parent commit.
- Emits `task_fgac_dual_write.{attempt,success,retry,failure}`
statsd counters (tagged `op:register|deregister` and
`exception_class` on failure) — the rollout signal for the
FGAC_TASKS_DUAL_WRITE flip dashboard.
The `Port` interface gains `register_resource` /
`deregister_resource`, and the agentex-auth proxy adapter calls
`POST /v1/authz/register` and `POST /v1/authz/deregister`. The
endpoints themselves already live on agentex-auth `main` via #354;
per-account routing across them is set by scaleapi/agentex#353.
Part of the AGX1-264 stack: scaleapi/scaleapi NEW2
(per-account FF endpoint) → scaleapi/agentex#353 (agentex-auth
routing + cancel) → this PR → #249 (per-RPC
route migration).
7841696 to
fcce135
Compare
6 tasks
asherfink
added a commit
that referenced
this pull request
May 28, 2026
…n creation Adds two nullable creator-audit columns to the `tasks` table — `creator_user_id` and `creator_service_account_id` — populated from the request principal in `AgentTaskService.create_task`. A CHECK constraint `ck_tasks_at_most_one_creator` enforces that at most one of the two is set; partial indexes back future "tasks created by X" lookups. Online migration: the CHECK is added `NOT VALID` then `VALIDATE`d separately so the brief ACCESS EXCLUSIVE lock doesn't have to wait on an existence scan. `tasks` is a high-write table; a vanilla CHECK addition would queue behind in-flight transactions and block readers until released. Indexes use `CREATE INDEX CONCURRENTLY` inside `autocommit_block`. Best-effort attribution: tasks created outside an HTTP request context (Temporal activities, background workers, any path that constructs `AgentTaskService` without `request.state.principal_context`) leave both columns NULL. The CHECK constraint allows both-NULL, and an integration test exercises the no-resolvable-creator path. These columns are how the AGX1-291 operator runbook identifies orphan rows for backfill when the dual-write call sites added in the next commit fail under load. Part of the AGX1-264 stack: scaleapi/scaleapi#145000 (FGAC_AGENTEX_AUTH_SPARK flag) -> scaleapi/scaleapi sgp-agentex-cancel-enum (cancel enum on SGP backend) -> scaleapi/agentex#353 (agentex-auth per-account routing + cancel) -> this PR -> #249 (per-RPC route migration). Two commits land together in #246; this one is the schema/audit change and is independent of the dual-write call sites in the next commit.
asherfink
added a commit
that referenced
this pull request
May 28, 2026
…TE flag
Wires `register_resource` / `deregister_resource` into
`AgentTaskService.create_task` / `delete_task`, gated by a new
`FGAC_TASKS_DUAL_WRITE` env-var (off by default; resolved at
DI-resolve time so mid-process flips are intentionally invisible —
rollout assumes a redeploy cycles pods).
- `create_task`: after the Postgres row persists, register the
task in the authorization graph with `parent=agent` so the
tenant + owner + parent_agent tuples are written atomically.
- `delete_task`: pre-resolves the task id by name before the
Postgres delete (lookup-after-delete would race), then
deregisters once the row is gone. The name-lookup
`ItemDoesNotExist` is swallowed so the subsequent `delete()`
surfaces its own native error — flipping the flag must not
change the error contract for missing tasks.
- Both call sites share `_dual_write_with_retry(op_name, do_call,
task_id)`, which retries transient
`AuthenticationServiceUnavailableError` /
`AuthenticationGatewayError` with exponential backoff + jitter
(3 retries -> 4 attempts max). Mirrors
`agents_acp_use_case.grant_with_retry`, but with no `fail_task`
fallback: the Postgres row is the durable record and orphan auth
tuples are preferable to losing the task. The AGX1-291 operator
runbook covers backfill using the creator-audit columns added in
the parent commit.
- Emits `task_fgac_dual_write.{attempt,success,retry,failure}`
statsd counters (tagged `op:register|deregister` and
`exception_class` on failure) — the rollout signal for the
FGAC_TASKS_DUAL_WRITE flip dashboard.
The `Port` interface gains `register_resource` /
`deregister_resource`, and the agentex-auth proxy adapter calls
`POST /v1/authz/register` and `POST /v1/authz/deregister`. The
endpoints themselves already live on agentex-auth `main` via #354;
per-account routing across them is set by scaleapi/agentex#353.
Part of the AGX1-264 stack: scaleapi/scaleapi#145000
(FGAC_AGENTEX_AUTH_SPARK flag) -> scaleapi/scaleapi
sgp-agentex-cancel-enum (cancel enum on SGP backend) ->
scaleapi/agentex#353 (agentex-auth per-account routing + cancel) ->
this PR -> #249 (per-RPC route migration).
fcce135 to
3dfd5cb
Compare
dm36
added a commit
that referenced
this pull request
May 29, 2026
…se and two-factor mutations Mirrors AGX1-275 (PR #249) for agent_api_keys. Wires Spark AuthZ checks into every api_key route, collapses denials to 404 (so name/id probes can't distinguish "present in another tenant" from "absent"), and relies on SpiceDB's transitive expansion of api_key.{update,delete} (= editor & parent_agent->update & tenant_gate) for two-factor mutations rather than issuing two explicit checks at the route layer. - src/utils/agent_api_key_authorization.py (new): _check_api_key_or_collapse_to_404 — catches AuthorizationError, raises ItemDoesNotExist. Same shape as Asher's task helper. - src/utils/authorization_shortcuts.py: DAuthorizedId routes AgentexResourceType.api_key through the wrap. (DAuthorizedName isn't used for api_keys; the name lookup is (agent_id, name, api_key_type), not a single globally-unique path param — the route handlers call the collapse helper inline instead.) - src/api/routes/agent_api_keys.py: * POST: explicit agent.update on parent (no api_key resource yet). * GET list: DAuthorizedResourceIds + filter; None passes through. * GET /name/{name}: inline collapse helper. * GET /{id}: DAuthorizedId(api_key, read). * DELETE /{id}: DAuthorizedId(api_key, delete). Two-factor via SpiceDB schema (api_key.delete expands to parent_agent.update); no second route-layer check. * DELETE /name/{api_key_name}: inline collapse helper. - tests/unit/api/test_agent_api_keys_authz.py (new): 12 tests, all pass. Stacked on dhruv/agx1-272-agent-api-keys-dual-write (PR A). Does NOT touch dual-write logic. Does NOT modify agentex-auth. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3 tasks
asherfink
added a commit
that referenced
this pull request
May 29, 2026
…TE flag
Wires `register_resource` / `deregister_resource` into
`AgentTaskService.create_task` / `delete_task`, gated by a new
`FGAC_TASKS_DUAL_WRITE` env-var (off by default; resolved at
DI-resolve time so mid-process flips are intentionally invisible —
rollout assumes a redeploy cycles pods).
- `create_task`: after the Postgres row persists, register the
task in the authorization graph with `parent=agent` so the
tenant + owner + parent_agent tuples are written atomically.
- `delete_task`: pre-resolves the task id by name before the
Postgres delete (lookup-after-delete would race), then
deregisters once the row is gone. The name-lookup
`ItemDoesNotExist` is swallowed so the subsequent `delete()`
surfaces its own native error — flipping the flag must not
change the error contract for missing tasks.
- Both call sites share `_dual_write_with_retry(op_name, do_call,
task_id)`, which retries transient
`AuthenticationServiceUnavailableError` /
`AuthenticationGatewayError` with exponential backoff + jitter
(3 retries -> 4 attempts max). Mirrors
`agents_acp_use_case.grant_with_retry`, but with no `fail_task`
fallback: the Postgres row is the durable record and orphan auth
tuples are preferable to losing the task. The AGX1-291 operator
runbook covers backfill using the creator-audit columns added in
the parent commit.
- Emits `task_fgac_dual_write.{attempt,success,retry,failure}`
statsd counters (tagged `op:register|deregister` and
`exception_class` on failure) — the rollout signal for the
FGAC_TASKS_DUAL_WRITE flip dashboard.
The `Port` interface gains `register_resource` /
`deregister_resource`, and the agentex-auth proxy adapter calls
`POST /v1/authz/register` and `POST /v1/authz/deregister`. The
endpoints themselves already live on agentex-auth `main` via #354;
per-account routing across them is set by scaleapi/agentex#353.
Part of the AGX1-264 stack: scaleapi/scaleapi#145000
(FGAC_AGENTEX_AUTH_SPARK flag) -> scaleapi/scaleapi
sgp-agentex-cancel-enum (cancel enum on SGP backend) ->
scaleapi/agentex#353 (agentex-auth per-account routing + cancel) ->
this PR -> #249 (per-RPC route migration).
dm36
added a commit
that referenced
this pull request
May 29, 2026
Per Asher's review: the previous commit (5deda18) wrapped every direct-resource DAuthorizedId/DAuthorizedQuery check in AuthorizationError -> ItemDoesNotExist, which flipped 403 to 404 for every direct-resource route service-wide. That diverges from #249 (tasks) and #255 (task-children) which keep direct-resource denials as 403. Restore the else branch to a bare authorization.check() in both helpers. Only child-resource branches (TaskChildResourceType, AgentChildResourceType) collapse to 404, which is the intended scope: the parent-resolution path already loads the child to find its parent, so 404 is the natural shape. Update the events list test to assert 403 for an unauthorized agent_id (consistent with the direct-resource convention). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dm36
added a commit
that referenced
this pull request
May 29, 2026
…ceType Per Asher's review: the AgentChildResourceType branch in DAuthorizedId/ DAuthorizedQuery added an extra parent-resolution hop and a new resource- type family compared to #249 (tasks) and #255 (task-children), both of which only special-case task/task-children. Mirror #249's shape exactly: - Add agent_authorization.check_agent_or_collapse_to_404, parallel to task_authorization.check_task_or_collapse_to_404 from #249. Same docstring rationale (no tenant column on AgentORM, denial reason not discriminable, so collapse to 404 to prevent cross-tenant existence leak; TODO references AGX1-290). - Delete AgentChildResourceType enum. - Delete _get_parent_agent_id and the elif branch in both shortcuts. - Drop DEventRepository injection from the shortcuts (was only used by the deleted branch). - Rewrite get_event to do event_use_case.get(...) -> agent check inline, using the helper. Identical wire behavior (single /v1/authz/check on the parent agent, denial -> 404) but no new abstraction. Net: shortcuts file is now structurally identical to main; events.py follows the same pattern Harvey/Asher established for direct/child task routes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dm36
added a commit
that referenced
this pull request
May 29, 2026
Per Asher's review on the task-child branch: same point as the direct- resource else — this PR shouldn't be modifying the task-child collapse behavior either, since #249 owns that change (via check_task_or_collapse_to_404). Revert the task-child branch in DAuthorizedId/DAuthorizedQuery back to main's shape: bare authorization.check(), no inline try/except. Now this PR's diff against main is purely subtractive on authorization_shortcuts.py: remove event from TaskChildResourceType (it no longer fits — events delegate to the parent agent, not the parent task) and the cascading cleanup of DEventRepository injection. No behavioral changes to the task-child path. #249 will apply on top cleanly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dm36
added a commit
that referenced
this pull request
May 29, 2026
Mirror Harvey's #255 inline pattern exactly in the task-child branch of DAuthorizedId/DAuthorizedQuery — same imports, same comment, same try/except shape. Direct-resource else branch stays as plain check (403), matching both #249 and #255. Net diff vs main is consistent with #255's shape: collapse for task-children, plain check for direct resources. Only delta-from-#255 is this PR's own scope (removing event from TaskChildResourceType and DEventRepository injection, since events delegate to the parent agent instead). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
harvhan
approved these changes
May 29, 2026
|
|
||
|
|
||
| async def check_task_or_collapse_to_404( | ||
| authorization, |
Author
There was a problem hiding this comment.
done, typed it as AuthorizationService
Rewires the operation literal sent to agentex-auth on task RPC routes so each RPC checks the permission that actually matches its side effect, instead of using `execute` everywhere: - `MESSAGE_SEND` / `EVENT_SEND` -> `update` - `TASK_CANCEL` -> `cancel` - `TASK_CREATE` stays `create` - Unknown `AgentRPCMethod` values now raise `NotImplementedError` rather than falling through authz-free (defense-in-depth: a new RPC must be explicitly wired before it can dispatch). The same `execute -> update` swap is applied across `messages.py`, `checkpoints.py`, and `states.py` so the editor role can perform routine mutations without needing owner. The task SpiceDB schema defines `permission update = (editor + owner) & internal_tenant_gate`, so leaving these on `execute` (owner-only) would lock editors out of normal flows. Adds `check_task_or_collapse_to_404` in `src/utils/task_authorization.py` and routes every task-resource denial path through it: path id, query id, body id, and the name surface in `authorization_shortcuts.py`. `tasks.name` is globally unique, so a 403/404 split on the name route would let any authenticated caller probe the whole system for task existence — collapsing both denial cases into 404 closes that leak at the cost of an in-tenant UX regression on permission-gap updates (tracked under AGX1-290). The `MESSAGE_SEND` task-name branch is restructured to `try/else`: a denied update on an existing task must surface as 404 and NOT fall through to the create check, which would promote "denied update" into create access. Cross-repo wire dependency: the `update` and `cancel` literals must resolve against the existing OWNER grant in SGP's task permission schema before this deploys. `update` is already in SGP's `AgentexOperation` enum; `cancel` is added by scaleapi/scaleapi sgp-agentex-cancel-enum. Held behind that PR shipping everywhere. Part of the AGX1-264 stack: scaleapi/scaleapi#145000 (FGAC_AGENTEX_AUTH_SPARK flag) -> scaleapi/scaleapi sgp-agentex-cancel-enum (cancel enum on SGP backend) -> scaleapi/agentex#353 (agentex-auth per-account routing + cancel) -> #246 (task FGAC dual-write + audit columns) -> this PR.
3dfd5cb to
3009275
Compare
jenniechung
approved these changes
Jun 1, 2026
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Related work
Parent epic: AGX1-264 — per-task FGAC. Follow-ups bundled in AGX1-291.
This change is part of a 5-PR stack across 3 repos. Merge order:
scaleapi/scaleapi#144783 (sgp-authz 0.7.1)✅ already merged → scaleapi/scaleapi#145000 + scaleapi/scaleapi#145044 + scaleapi/agentex#353 → #246 → this PR.scaleapi/scaleapi#144783✅ mergedAction.CANCELFGAC_AGENTEX_AUTH_SPARKflagcancelto SGP'sAgentexOperationenum + role mapcancelopLast in the stack — this is the route-side change that actually exercises the permissions written by #246.
Summary
AuthorizedOperationTypeinstead of usingexecuteeverywhere:MESSAGE_SEND/EVENT_SEND→update,TASK_CANCEL→cancel,TASK_CREATEstayscreate.execute → updateswap acrossmessages.py,checkpoints.py, andstates.pyso the editor role can send messages and update checkpoints/state without needing owner. The task SpiceDB schema definespermission update = (editor + owner) & internal_tenant_gate, so leaving these onexecute(owner-only) would lock editors out of routine mutations.ItemDoesNotExist) across all surfaces — path id, query id, body id, and name routes — so callers can no longer distinguish "task present in another tenant" from "task absent" by comparing 403 vs 404.src/utils/task_authorization.py, reused from both the FastAPI dep factories and the RPC authorize hook.AgentRPCMethodvalues now raiseNotImplementedErrorrather than falling through authz-free — defense-in-depth so a new RPC must be explicitly wired before it can dispatch.What changed
src/utils/task_authorization.py(new):check_task_or_collapse_to_404(authorization, task_id, operation)— the shared wrap.src/utils/authorization_shortcuts.py:DAuthorizedId/DAuthorizedQuery/DAuthorizedBodyIdroute task checks through the wrap; their inner deps no longer take atask_repository(parameter was unused).DAuthorizedNamenow applies the wrap whenresource_type == AgentexResourceType.task— previously the name surface leaked 403 vs 404 becausetasks.nameis globally unique, so a probe checked the whole system rather than a single tenant.src/api/routes/agents.py_authorize_rpc_request: each task-resource branch routes through the wrap. TheMESSAGE_SENDblock withtask_nameis restructured to atry/elseshape so a denied-update on an existing task surfaces as 404 (it must NOT fall through to the create-fallbackexcept— that would silently promote a denied-update into a create check, which is a privilege escalation footgun). The unknown-method case (case _:) now raisesNotImplementedError.src/api/routes/messages.py/checkpoints.py/states.py:execute → updatefor routine mutations.TASK_CREATEand the wildcardtask("*")checks intentionally untouched.AGX1-275 list deliverable — already covered
The AGX1-275 list-filtering deliverable (only return tasks the caller can read) is already met by existing infrastructure that this PR does not change:
DAuthorizedResourceIds(AgentexResourceType.task)atsrc/utils/authorization_shortcuts.py:130resolves the per-principal accessible task id set.src/api/routes/tasks.py:82, which intersects user filters with the accessible set before hitting the repository.No additional code is needed here — the list surface is already authorized end-to-end.
Pre-merge wire dependency
The
execute → updateswap changes the operation literal that hits agentex-auth'scheckendpoint. The SGP gateway forwards the literal verbatim, so the SGP permissions backend must resolve bothupdateandcancelagainst the existing OWNER grant before this deploys:updateis already in SGP'sAgentexOperationenum onmaster— no work needed.cancelis added by scaleapi/scaleapi#145044. This PR is held until that PR ships everywhere.For Spark-routed accounts the dependency chain is scaleapi/agentex#353 + sgp-authz 0.7.1 (already merged) → ResourceRole.OWNER →
Action.CANCEL, all of which are in place upstream of this PR.Tests
tests/unit/api/test_tasks_authz.py— 17 tests pass:TestPerRpcOperationRoutingtests (incl.MESSAGE_SENDcreate-fallback preserved through the restructure).TestCheckTaskOrCollapseTo404tests (allow + denied-collapses-to-404).TestDAuthorizedBodyIdTaskWraptests.TestDAuthorizedNameTaskWraptests (denied-task → 404, allow returns name, agent path unaffected).cancel → "cancel"(mirrors agentex-auth's).Out of scope / follow-ups (tracked in AGX1-291)
/agents/name/{agent_name}has the same leak shape — agent FGAC is outside AGX1-264.Greptile Summary
This PR is the final route-side piece of the AGX1-264 per-task FGAC stack. It rewires
_authorize_rpc_requestto use operation-specific literals (updatefor MESSAGE_SEND/EVENT_SEND,cancelfor TASK_CANCEL), collapses every task-resource authorization denial into a 404 (via a sharedcheck_task_or_collapse_to_404helper), and guards the unknown-method case withNotImplementedErrorso new RPCs cannot slip through authorization-free.src/utils/task_authorization.py(new):check_task_or_collapse_to_404wraps anyAuthorizationErroron a task resource intoItemDoesNotExist, eliminating the cross-tenant existence leak thattasks.name(globally unique) would otherwise expose via 403 vs 404 divergence.src/utils/authorization_shortcuts.py: All four dep factories (DAuthorizedId,DAuthorizedQuery,DAuthorizedBodyId,DAuthorizedName) now route task-type resources through the collapse helper; theMESSAGE_SEND-with-task_namebranch uses atry/elseshape so a denied-update can never fall through to the create-fallback check (privilege escalation footgun addressed).src/api/routes/{messages,checkpoints}.py:execute → updateswap for routine mutations so theeditorrole can send messages and update checkpoints/state without needingowner.Confidence Score: 4/5
Safe to merge once the upstream wire dependency (scaleapi/scaleapi#145044 shipping cancel into SGP's AgentexOperation enum) is confirmed live; the execute→update swap for MESSAGE_SEND/EVENT_SEND is already safe per SGP master.
The routing logic, collapse helper, and try/else privilege-escalation guard are all correct and well-tested. The one remaining gate — cancel reaching every SGP region before this deploys — is explicitly documented in the PR description as a hold condition; no in-repo verification of that live state exists. The change is otherwise clean and the score reflects that single external dependency rather than any defect in the code itself.
No files contain defects. The pre-merge wire dependency on scaleapi/scaleapi#145044 is the only open gate, documented in the PR description and in the existing review comment on agents.py.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD RPC["Incoming AgentRPC Request"] --> MATCH{"match request.method"} MATCH -->|TASK_CREATE| CREATE_CHECK["authorization.check\ntask(*) · create"] MATCH -->|MESSAGE_SEND| MSG_ID{"task_id present?"} MATCH -->|EVENT_SEND| EVT_ID{"task_id present?"} MATCH -->|TASK_CANCEL| CXL_ID{"task_id present?"} MATCH -->|unknown| NIE["raise NotImplementedError\n→ 500"] MSG_ID -->|yes| MSG_WRAP_ID["check_task_or_collapse_to_404\ntask_id · update"] MSG_ID -->|no, task_name| MSG_LOOKUP["task_service.get_task(name)"] MSG_ID -->|no task_id/name| MSG_CREATE["authorization.check\ntask(*) · create"] MSG_LOOKUP -->|ItemDoesNotExist| MSG_CREATE MSG_LOOKUP -->|found| MSG_WRAP_NAME["check_task_or_collapse_to_404\nexisting_task.id · update"] EVT_ID -->|yes| EVT_WRAP_ID["check_task_or_collapse_to_404\ntask_id · update"] EVT_ID -->|no, task_name| EVT_LOOKUP["task_service.get_task(name)"] EVT_ID -->|no task_id/name| EVT_ERR["raise ValueError → 400"] EVT_LOOKUP --> EVT_WRAP_NAME["check_task_or_collapse_to_404\nexisting_task.id · update"] CXL_ID -->|yes| CXL_WRAP_ID["check_task_or_collapse_to_404\ntask_id · cancel"] CXL_ID -->|no, task_name| CXL_LOOKUP["task_service.get_task(name)"] CXL_ID -->|no task_id/name| CXL_ERR["raise ValueError → 400"] CXL_LOOKUP --> CXL_WRAP_NAME["check_task_or_collapse_to_404\nexisting_task.id · cancel"] subgraph WRAP["check_task_or_collapse_to_404"] direction TB AZ["authorization.check(task, op)"] -->|AuthorizationError| FOLD["raise ItemDoesNotExist → 404"] AZ -->|ok| PASS["return (allow)"] end MSG_WRAP_ID & MSG_WRAP_NAME & EVT_WRAP_ID & EVT_WRAP_NAME & CXL_WRAP_ID & CXL_WRAP_NAME --> WRAPReviews (4): Last reviewed commit: "feat(AGX1-275): per-RPC task permission ..." | Re-trigger Greptile