
feat(adagents): fetch_agent_authorizations_from_directory for AAO inverse lookup #769

Merged
bokelley merged 2 commits into main from bokelley/fetch-agent-authorizations-from-directory
May 21, 2026

Conversation

@bokelley
Contributor

Summary

  • Adds fetch_agent_authorizations_from_directory(agent_url, *, directory_url, since=None, timeout=10.0, client=None) next to the existing per-publisher fetch_agent_authorizations. Calls the AAO directory's GET /v1/agents/{agent_url}/publishers endpoint (adcp#4823 / #4828) and returns a typed AgentAuthorizationsDirectoryResult parsed against the real wire body via Pydantic .model_validate().
  • Wires the contract per the issue: 404 → empty publishers=[] (a directory that hasn't indexed the agent is a normal answer, not an exception); timeouts → AdagentsTimeoutError; malformed or schema-noncompliant → AdagentsValidationError.
  • Bumps the schema pin to 3.1.0-beta.2 so schemas/cache/ includes aao/agent-publishers.json (the new schema landed in adcp#4828).
  • Re-exports fetch_agent_authorizations_from_directory, AgentAuthorizationsDirectoryResult, and the supporting DirectoryPublisherEntry / DirectoryDiscoveryMethod / DirectoryEdgeStatus aliases from adcp/__init__.py.
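
The status-handling contract in the second bullet can be sketched without the SDK. This is a minimal, stdlib-only illustration: `DirectoryResult`, `interpret_response`, and the exception class below are stand-ins defined for the example, not the SDK's actual types, and the timeout branch is left out because it belongs to the transport layer.

```python
import json
from dataclasses import dataclass, field


class AdagentsValidationError(Exception):
    """Stand-in for the SDK exception of the same name."""


@dataclass
class DirectoryResult:
    """Hypothetical, simplified stand-in for AgentAuthorizationsDirectoryResult."""
    agent_url: str
    publishers: list = field(default_factory=list)


def interpret_response(agent_url: str, status: int, body: bytes) -> DirectoryResult:
    # 404 means the directory has not indexed this agent: a normal, empty answer.
    if status == 404:
        return DirectoryResult(agent_url=agent_url, publishers=[])
    # Any other non-200 status is surfaced loudly rather than as a silent empty result.
    if status != 200:
        raise AdagentsValidationError(f"unexpected status {status}")
    try:
        payload = json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise AdagentsValidationError(f"malformed JSON: {str(exc)[:200]}") from exc
    # Minimal shape check standing in for full schema validation.
    if not isinstance(payload, dict) or "publishers" not in payload:
        raise AdagentsValidationError("missing required field: publishers")
    return DirectoryResult(
        agent_url=payload.get("agent_url", agent_url),
        publishers=payload["publishers"],
    )
```

Keeping 404 separate from other non-200 statuses is what lets a caller tell "not indexed" apart from "directory is failing".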

Design notes

  • Discovery, not authorization. The directory's answer tells callers where to look — they should still verify each publisher_domain against its own adagents.json (via fetch_adagents) before treating the edge as trusted. This is the same trust-root contract the schema doc spells out.
  • Same SSRF posture as publisher-side fetches. directory_url is HTTPS-only, runs through _validate_redirect_url + _dns_validate_host, and the response body is streamed with the existing 5 MiB cap. Redirects are not followed.
  • since is wire-passthrough. The schema doesn't pin a semantic — it can be a next_cursor from the prior page, or an RFC 3339 timestamp tied to directory_indexed_at. We forward it verbatim as ?since=... and let the directory interpret.
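
As a rough illustration of the HTTPS-only gate and the verbatim `since` passthrough described above, the following sketch uses only `urllib.parse`. The helper names `require_https` and `with_since` are hypothetical, and the DNS and redirect checks the PR mentions are not reproduced here.

```python
from typing import Optional
from urllib.parse import quote


def require_https(directory_url: str) -> str:
    # Case-sensitive on purpose: "HTTPS://..." is rejected too (conservative).
    if not directory_url.startswith("https://"):
        raise ValueError("directory_url must start with https://")
    return directory_url.rstrip("/")


def with_since(endpoint: str, since: Optional[str] = None) -> str:
    if since is None:
        return endpoint
    # No parsing or normalization: the value is only percent-encoded,
    # so a timestamp and an opaque cursor are treated the same way.
    return f"{endpoint}?since={quote(since, safe='')}"
```

Because the value is never parsed, the client cannot reject a malformed `since`; any interpretation error comes back from the directory.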

Type-regen status

make regenerate-schemas is blocked at the v3.1.0-beta.2 bundle — datamodel-code-generator mis-resolves ../enums/channels.json when the resolution chain originates at a depth-0 schema. The root-level adagents.json now transitively references the new core/product-format-declaration.json, which itself uses ../enums/channels.json for applies_to_channels. The codegen treats the ../ as relative to the root rather than to the depth-1 file and reports not found: .schema_temp/../enums/channels.json. Pinned v3.0 schemas regen cleanly; v3.1.0-beta.2 does not. The hand-written Pydantic models in this PR are scoped to the new endpoint so we can ship the function without unblocking full regen first — that fix is tracked separately.
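
The mis-resolution can be reproduced with plain path arithmetic. The sketch below assumes a layout in which `adagents.json` sits directly under `.schema_temp/` with `core/` and `enums/` as sibling subdirectories; that layout is inferred from the error message, not confirmed by the codegen's internals.

```python
import posixpath


def resolve_ref(referrer: str, ref: str) -> str:
    """Resolve a relative $ref against the file that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(referrer), ref))


root = ".schema_temp/adagents.json"                         # depth-0 schema
nested = ".schema_temp/core/product-format-declaration.json"  # depth-1 schema
ref = "../enums/channels.json"

# Correct: "../" is relative to the depth-1 file that declares the $ref.
correct = resolve_ref(nested, ref)

# Buggy: "../" applied to the root schema's directory escapes the bundle.
unnormalized = posixpath.join(posixpath.dirname(root), ref)
wrong = resolve_ref(root, ref)

print(correct)       # .schema_temp/enums/channels.json
print(unnormalized)  # .schema_temp/../enums/channels.json
print(wrong)         # enums/channels.json
```

The unnormalized buggy path matches the `not found` message quoted above, which is consistent with the referrer being tracked as the root schema rather than the nested one.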

Tests

Eight new wire-level tests under TestFetchAgentAuthorizationsFromDirectory, all using httpx.MockTransport (no MagicMock of the client):

  • test_happy_path_parses_into_pydantic — full body round-trips through AgentAuthorizationsDirectoryResult.model_validate(), includes both direct and adagents_authoritative discovery rows.
  • test_404_returns_empty_publishers — explicit coverage of the 404 → empty contract.
  • test_since_cursor_passes_through_as_query_string — verifies ?since= reaches the wire.
  • test_timeout_raises_adagents_timeout_error: httpx.ReadTimeout → AdagentsTimeoutError.
  • test_malformed_json_raises_validation_error — non-JSON 200 body.
  • test_schema_mismatch_raises_validation_error — missing required fields on the envelope and a publisher entry.
  • test_non_https_directory_url_rejected — SSRF gate fires before any I/O.
  • test_non_200_non_404_raises_validation_error — 5xx surfaces as AdagentsValidationError, not silent empty.

Linked

Test plan

  • CI green on Python 3.10–3.13
  • ruff check, mypy src/adcp/adagents.py, and pytest tests/test_adagents.py pass locally (verified)
  • salesagent#511 picks up the new function in its discovery flow

🤖 Generated with Claude Code

…erse lookup (closes #746)

Adds a client function for the AAO directory's `GET /v1/agents/{agent_url}/publishers`
endpoint (adcp#4823 / #4828) — the inverse-lookup path that returns the set
of publishers whose adagents.json authorizes a given agent_url. Result is a
typed `AgentAuthorizationsDirectoryResult` (Pydantic, validated against the
real wire body). A 404 from the directory is the "not indexed" answer and
surfaces as a result with `publishers=[]`; timeouts raise
`AdagentsTimeoutError`; malformed or schema-noncompliant responses raise
`AdagentsValidationError`.

The directory's answer is *discovery*, not authorization — callers should
still verify each returned `publisher_domain` via `fetch_adagents` before
trusting the edge. Same SSRF gates apply (HTTPS only, DNS pre-check,
private/reserved address ban, 5 MiB body cap, no redirect follow).

Also bumps the schema pin to 3.1.0-beta.2 so `schemas/cache/` includes
`aao/agent-publishers.json`. Full Pydantic regen is deferred —
datamodel-code-generator mis-resolves `../enums/channels.json` when the
chain originates at a depth-0 schema (root-level `adagents.json` now
transitively references the new `core/product-format-declaration.json`,
which itself uses `../enums/...`). The hand-written models in this PR are
scoped to the new endpoint; unblocking full regen is tracked separately.

Tests use `httpx.MockTransport` to exercise the real wire shape end-to-end
and assert against `.model_validate()` on the Pydantic classes — covering
happy path, 404 → empty, `since` cursor passthrough, timeout, malformed
JSON, schema-mismatch, non-HTTPS guard, and 5xx surface.

Refs salesagent #511.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
aao-ipr-bot Bot previously approved these changes May 21, 2026

@aao-ipr-bot aao-ipr-bot Bot left a comment

LGTM. Clean additive ship-it. Right shape — reuses the existing SSRF stack, mirrors the fetch_adagents error contract, and 404-as-empty maps cleanly to the schema's permissive directory_indexed_at: null carve-out at agent-publishers.json:18-23.

Things I checked

  • SSRF posture inherited, not worsened. _validate_redirect_url + _dns_validate_host + _stream_capped(follow_redirects=False) all wired into the new path (adagents.py:1890, :1897-1899, :1905). quote(agent_url, safe='') at :1892 encodes /, :, @, \r, \n — no path-traversal or host-rewrite via a hostile agent_url. quote(since, safe='') at :1894 blocks request-splitting through the cursor. security-reviewer: no High or Medium findings.
  • Wire shape. AgentAuthorizationsDirectoryResult and DirectoryPublisherEntry mirror schemas/cache/3.1.0-beta.2/aao/agent-publishers.json. DirectoryDiscoveryMethod Literal at :1786-1791 matches the schema enum at :61-66. extra='ignore' via AdCPBaseModel against additionalProperties: false is the deliberate forward-compat posture, consistent with the rest of the SDK — ADCP_STRICT_VALIDATION=1 gives strict mode for CI.
  • Public-API additive. feat(adagents): is the correct conventional-commit prefix. New exports alphabetized in __init__.py. No collisions, no removals, no signature changes on existing surfaces.
  • Generated types untouched. src/adcp/types/generated_poc/** and src/adcp/types/_generated.py are not in the diff. The 666-file count is almost entirely cached schema JSON, not generated Python — the layering rule from CLAUDE.md holds.
  • 8 wire-level tests using httpx.MockTransport — no MagicMock(httpx.AsyncClient). Happy path round-trips through .model_validate(). 404 → empty path is explicitly covered. 5xx surfaces as AdagentsValidationError, not silent empty.
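
The path-segment encoding in the first bullet can be checked in isolation. This is a small sketch using only `urllib.parse`; the hostile input and the `directory.example` host are made up for the example, and `agent_path_segment` is a hypothetical helper name.

```python
from urllib.parse import quote, urlparse


def agent_path_segment(agent_url: str) -> str:
    # safe='' leaves no reserved character unescaped, so "/", ":", "@",
    # CR and LF are all percent-encoded into a single opaque path segment.
    return quote(agent_url, safe="")


hostile = "https://evil.example/@internal.host/../\r\nX-Injected: 1"
segment = agent_path_segment(hostile)
url = f"https://directory.example/v1/agents/{segment}/publishers"
parsed = urlparse(url)
```

With every reserved character escaped, the parsed hostname stays the directory's own host and the path keeps exactly the four separators the template put there.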

Follow-ups (non-blocking — file as issues)

  1. ADCP_VERSION bump to 3.1.0-beta.2 without full type regen. ad-tech-protocol-expert: sound-with-caveats. The SDK advertises 3.1.0-beta.2 but generated_poc/ is still 3.0-vintage. The hand-written models in this PR cover only the AAO endpoint — every other 3.1 task's types are stale relative to the version pin, and the failure mode is silent (extra='ignore' drops on renamed fields). The PR rationale is sound — datamodel-code-generator mis-resolves ../enums/channels.json when the chain originates at depth-0 — but ship a CHANGELOG / known-limitations note that names the deferred regen and the affected schema chains, or stage the version bump behind the codegen unblock. Notable that the schema pin advertises 3.1.0-beta.2 before the types do.
  2. Conditional-required manager_domain not enforced. Schema at agent-publishers.json:104-127 has an allOf/if/then that requires manager_domain when discovery_method ∈ {authoritative_location, adagents_authoritative, ads_txt_managerdomain}. The model at adagents.py:1801 declares manager_domain: str | None = None unconditionally — a non-conforming directory row silently lands as manager_domain=None. Add a @model_validator(mode='after') on DirectoryPublisherEntry to mirror the allOf, or document the deliberate loosening in the docstring. The ads_txt_managerdomain path is the weakest discovery method — manager_domain is the only positive cross-check, per the schema description at :60.
  3. publishers: list = Field(default_factory=list) at adagents.py:1821 silently fills a schema-required field on 200 responses. Drop the default; let Pydantic enforce. (The 404 path constructs publishers=[] directly, so the default isn't load-bearing there.)
  4. except Exception at adagents.py:1944 is too broad. Narrow to pydantic.ValidationError. Today's net effect is identical — callers still get AdagentsValidationError — but the broad catch will mask an AttributeError from a future model refactor as 'schema validation failed.'
  5. Test gaps that pair with #2 and #3 — no case for the conditional-manager_domain violation, no case for a 200 missing publishers. Worth adding once the model is tightened.
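
Follow-up 2 could look roughly like the sketch below. It uses a stdlib dataclass with `__post_init__` as a stand-in; in the SDK the same rule would presumably live in a Pydantic `@model_validator(mode='after')` on `DirectoryPublisherEntry`. The class name `PublisherEntry` and its reduced field set are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Discovery methods for which the schema's allOf/if/then requires manager_domain.
NEEDS_MANAGER_DOMAIN = frozenset(
    {"authoritative_location", "adagents_authoritative", "ads_txt_managerdomain"}
)


@dataclass
class PublisherEntry:
    """Hypothetical, trimmed-down stand-in for DirectoryPublisherEntry."""
    publisher_domain: str
    discovery_method: str
    manager_domain: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirror the conditional requirement instead of letting None through.
        if self.discovery_method in NEEDS_MANAGER_DOMAIN and not self.manager_domain:
            raise ValueError(
                f"manager_domain is required when discovery_method="
                f"{self.discovery_method!r}"
            )
```

A test for follow-up 5 would then assert that an `ads_txt_managerdomain` row without `manager_domain` is rejected while a `direct` row is accepted.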

Minor nits (non-blocking)

  1. MAX_DIRECTORY_PAGE_BYTES comment at adagents.py:1825-1826 references MAX_POINTER_BYTES. MAX_POINTER_BYTES does live at adagents.py:535, so the cross-reference is correct, but the constants aren't actually shared. Either alias one to the other or drop the prose claim and let 5 * 1024 * 1024 speak for itself.
  2. Per-test inline imports in TestFetchAgentAuthorizationsFromDirectory (every method does from adcp.adagents import ... inside the function body). Move to module top to match the rest of tests/test_adagents.py.
  3. _validate_redirect_url(f"{base}/v1/agents/_/publishers") at adagents.py:1890 validates a synthesized placeholder URL, not request_url. Functionally safe, since _validate_redirect_url only inspects parsed.hostname and the host is identical, but a one-line comment explaining the placeholder would keep a future refactor from desyncing the two.
  4. directory_url control-char reject as defense-in-depth. _validate_publisher_domain already rejects \r, \n, \t, \, @ at adagents.py:275-278; same trick on directory_url before urlparse would close any future transport-permissiveness regression. Today httpx + getaddrinfo block request-splitting on this path, but the SDK shouldn't lean on transport hygiene for its own contract.
  5. Function name reads as discovery-via-the-verb-authorizations (fetch_agent_authorizations_from_directory). The docstring is clear three times that the directory's answer is discovery, not authorization. Verb in the symbol name is the soft spot — adopters who skim types and see status: 'authorized' plus a function name that contains 'authorizations' can talk themselves into skipping the fetch_adagents verification step. Not a block; worth a thought before the symbol locks in via adopter usage.
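
Nit 4 amounts to a short pre-parse check. The sketch below assumes the same character set that the review attributes to `_validate_publisher_domain`; the function name `reject_unsafe_directory_url` is hypothetical.

```python
# Characters rejected before the URL is parsed (assumed to match the
# publisher-domain check: CR, LF, TAB, backslash, and "@").
_FORBIDDEN_CHARS = ("\r", "\n", "\t", "\\", "@")


def reject_unsafe_directory_url(directory_url: str) -> str:
    for ch in _FORBIDDEN_CHARS:
        if ch in directory_url:
            raise ValueError(
                f"directory_url contains forbidden character {ch!r}"
            )
    return directory_url
```

Rejecting `@` also rules out userinfo-style URLs such as `https://user@host/`, which is stricter than URL syntax requires but keeps the parsed host unambiguous.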

Test plan note

The PR body's test plan has three unchecked items. Two are CI/local-pass placeholders with a parenthetical "verified" — fine. The third — salesagent#511 picks up the new function in its discovery flow — is a downstream-adoption checkpoint, not a precondition for shipping the primitive. Track it on the salesagent side.

Safe to merge once you've rebased onto main (PR is BEHIND). The follow-ups above are real but none of them break adopters today.

The 3.1.0-beta.2 bundle introduces spec drift well beyond this PR's
scope — `cache_scope` becomes required on product responses, new
`sponsored-intelligence` specialism + `search_brands` webhook task type,
new `validate_input_brand_claims` endpoint. Each of those needs its own
focused change (constant updates, fixture refreshes, capability surface
work); bundling them with the AAO inverse-lookup function would block
landing both.

This commit:
- Reverts ADCP_VERSION to 3.0.7 (the prior pin).
- Drops `schemas/cache/3.1.0-beta.2/` from the tree; the new
  `fetch_agent_authorizations_from_directory` works with hand-written
  Pydantic models and does not need the v3.1 bundle on disk.
- Regenerates `tests/fixtures/public_api_snapshot.json` to record the
  intentional new public exports (function + result types).

The v3.1 schema-pin bump (and the codegen `../`-resolution fix noted in
the PR body) move to a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@aao-ipr-bot aao-ipr-bot Bot left a comment

LGTM. Clean additive surface, sound SSRF chain, real wire-level tests against .model_validate(). Conventional feat: (not feat!:) is correct — purely additive public exports.

Things I checked

  • SSRF chain holds. quote(agent_url, safe='') at src/adcp/adagents.py:1892 encodes /, ?, #, @, :, \, %, CR/LF — no path-segment escape can flip parsed.hostname between the validated base and the actual request. _validate_redirect_url + _dns_validate_host chain mirrors the existing fetch_adagents posture. The case-sensitive startswith("https://") at :1888 rejects HTTPS:// rather than letting it through — conservative, fine.
  • follow_redirects=False inherited from _stream_capped (:1142) — correct for an authoritative directory endpoint; a 30x to a third party would bypass the SSRF gate.
  • 404 → empty is spec-mandated. Per schemas/source/aao/agent-publishers.json (adcp#4828), "Directory has never indexed any publisher referencing this agent_url" is the documented 404 contract — distinct from 200 + empty. The PR's :1918-1928 branch matches.
  • extra='ignore' on AdCPBaseModel keeps unknown wire fields forward-compatible — the right default for protocol models.
  • Public-API snapshot (tests/fixtures/public_api_snapshot.json) records the five new exports. No removals, no signature changes on existing exports — additive only.
  • Tests exercise the real wire shape via httpx.MockTransport, parse against the actual Pydantic model, and explicitly cover the 404-as-empty contract. Eight tests, no MagicMock of the client. Right call.

Follow-ups (non-blocking — file as issues)

  1. since vs cursor divergence from the spec. Per docs/aao/directory-api.mdx, ?since= is ISO 8601 only and ?cursor= is the separate pagination param. The function exposes only since and the docstring at :1854-1856 (and the PR body) tells callers to pass next_cursor back as since. That's the wrong wire field — directories that validate strictly will reject opaque cursors in since. Add a separate cursor: str | None parameter, narrow since to RFC 3339, and update the docstring.
  2. Conditional manager_domain requirement not enforced. The schema's allOf/if-then requires manager_domain non-null whenever discovery_method != "direct". The Pydantic model at :1796 lets None through for all four methods, so a malformed ads_txt_managerdomain row with manager_domain: null parses cleanly and downstream code hits the None. Add a model_validator to mirror the schema's conditional requirement.
  3. Closed DirectoryDiscoveryMethod Literal on an enum that's already expanding. The new adagents_authoritative value is itself the precedent — the prior DiscoveryMethod (:28) had three values, the new one has four. v1.x will add more. A closed Literal raises ValidationError on adopter machines pinned to older SDKs. Consider an open string with a known-values constant, or document the closed-enum discipline as intentional.
  4. Missing status and limit query params. Spec defines status (repeated, default authorized) and limit (1–1000, default 200). Adopters who want revoked tombstones currently have to drop to raw httpx.
  5. README drift at README.md:1052-1088. The "Authorization Discovery" section documents Push and Pull but not the new Directory path. One short subsection alongside the existing two would close it. AGENTS.md / llms.txt don't mention any of the family — separate gap, not regression.
  6. 404 vs 200 agent_url echo asymmetry (src/adcp/adagents.py:1923-1928 vs :1941). On 404 we fabricate agent_url=<raw caller input>; on 200 we return whatever the directory echoes (possibly canonicalized). Adopters comparing result.agent_url across calls see drift. Either document or normalize.
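
Follow-ups 1 and 4 together suggest a query builder along these lines. This is a sketch under stated assumptions: `build_query` is a hypothetical helper, the defaults (`status=authorized`, `limit=200`, range 1 to 1000) are taken from the review text, and `datetime.fromisoformat` is only an approximate RFC 3339 check since it accepts some inputs RFC 3339 does not.

```python
from datetime import datetime
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_query(
    since: Optional[str] = None,
    cursor: Optional[str] = None,
    status: Sequence[str] = ("authorized",),
    limit: int = 200,
) -> str:
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    params = []
    if since is not None:
        # Approximate timestamp check; raises ValueError for opaque cursors.
        datetime.fromisoformat(since.replace("Z", "+00:00"))
        params.append(("since", since))
    if cursor is not None:
        # Pagination token travels in its own parameter, never in since.
        params.append(("cursor", cursor))
    # Repeated-key form: status=a&status=b
    params.extend(("status", s) for s in status)
    params.append(("limit", str(limit)))
    return urlencode(params)
```

With the two parameters split, passing `next_cursor` as `since` fails fast on the client instead of being rejected by a strict directory.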

Minor nits (non-blocking)

  1. except Exception at :1942 is broader than every other handler in this file (siblings use (AdagentsValidationError, AdagentsTimeoutError, ...) or httpx.*). Narrow to pydantic.ValidationError (or (pydantic.ValidationError, TypeError)) for consistency.
  2. Unbounded {e} interpolation at :1943-1945. The JSON-decode path at :1937 truncates to [:200]; the Pydantic path doesn't. Pydantic v2's ValidationError.__str__ includes offending field values from the response — apply matching str(e)[:500] for log-volume parity and to bound what a hostile directory can stamp into adopter logs.
  3. quote(since, safe='') at :1894 is correct for an opaque cursor but irrelevant once since is narrowed to RFC 3339 per follow-up #1.
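
Nits 1 and 2 combine into one small pattern: catch a narrow set of exceptions and bound the detail copied into the message. In this sketch `ValueError` and `TypeError` stand in for the validation errors the SDK would name (for example `pydantic.ValidationError`); `parse_or_raise` and `MAX_DETAIL_CHARS` are hypothetical names.

```python
from typing import Any, Callable


class AdagentsValidationError(Exception):
    """Stand-in for the SDK exception of the same name."""


MAX_DETAIL_CHARS = 500


def parse_or_raise(parse: Callable[[Any], Any], payload: Any) -> Any:
    try:
        return parse(payload)
    except (ValueError, TypeError) as exc:
        # Truncate so a hostile response cannot flood adopter logs.
        detail = str(exc)[:MAX_DETAIL_CHARS]
        raise AdagentsValidationError(
            f"schema validation failed: {detail}"
        ) from exc
```

Anything outside the named tuple, such as an `AttributeError` from a model refactor, propagates unchanged instead of being relabeled as a schema failure.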

The schema bundle revert in the second commit was the right call — bundling unrelated v3.1 drift (cache_scope-required, new specialism, new endpoint) with the inverse-lookup function would have blocked both. Interesting choice to ship hand-written Pydantic for a wire shape the codegen can't yet produce; works here because the surface is small, but the regen unblock tracked separately is now load-bearing for everything else in v3.1.

Approving on the strength of the SSRF chain plus the wire-level test coverage. Follow-ups noted above.

@bokelley bokelley merged commit 1c4e57d into main May 21, 2026
17 checks passed
bokelley added a commit that referenced this pull request May 21, 2026
…ry inverse-lookup (#749 Part 3, adcp#4894)

Builds on #769's directory wrapper. Adds:

- include=["properties"] parameter on fetch_agent_authorizations_from_directory
  (adcp#4894). Repeated-key form (?include=properties&include=...), not
  comma-joined.
- property_ids: list[str] | None field on DirectoryPublisherEntry. None
  signals the directory did not return per-publisher IDs (count-only
  mode); a list signals the directory supports ?include=properties.
- detect_publisher_properties_divergence: compares directory inline
  resolution against per-publisher federated fetches. Full
  (publisher_domain, property_id) set-diff when property_ids is
  available; graceful fallback to count-only against older directories.
  max_concurrency=20 default semaphore caps concurrent fetches at
  managed-network scale (cafemedia ~6,800 publishers).
  sample_size=200 default keeps unbounded sweeps opt-in.
- PublisherDivergence / DivergenceReport types (Pydantic, matching
  #769's style).

Closes #749 Part 3. Part 2 superseded by #769 (which closed #746).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
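
The semaphore-capped fan-out and the `(publisher_domain, property_id)` set-diff described in this commit can be sketched with `asyncio` alone. The names `fetch_all` and `diff_edges`, and the dictionary shapes they take, are assumptions for illustration rather than the SDK's actual signatures.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (publisher_domain, property_id)


async def fetch_all(
    domains: Iterable[str],
    fetch_one: Callable[[str], Awaitable[List[str]]],
    max_concurrency: int = 20,
) -> Dict[str, List[str]]:
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(domain: str) -> Tuple[str, List[str]]:
        # At most max_concurrency fetches hold the semaphore at once.
        async with sem:
            return domain, await fetch_one(domain)

    return dict(await asyncio.gather(*(guarded(d) for d in domains)))


def diff_edges(
    directory: Dict[str, List[str]], federated: Dict[str, List[str]]
) -> Dict[str, Set[Edge]]:
    d = {(dom, pid) for dom, ids in directory.items() for pid in ids}
    f = {(dom, pid) for dom, ids in federated.items() for pid in ids}
    return {"only_in_directory": d - f, "only_in_publishers": f - d}
```

The count-only fallback mentioned above would compare `len(ids)` per domain instead of building the edge sets.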

Development

Successfully merging this pull request may close these issues.

feat(adagents): fetch_agent_authorizations_from_directory — inverse lookup against AAO directory

1 participant