feat(adagents): fetch_agent_authorizations_from_directory for AAO inverse lookup #769
…erse lookup (closes #746)

Adds a client function for the AAO directory's `GET /v1/agents/{agent_url}/publishers` endpoint (adcp#4823 / #4828): the inverse-lookup path that returns the set of publishers whose adagents.json authorizes a given agent_url.

Result is a typed `AgentAuthorizationsDirectoryResult` (Pydantic, validated against the real wire body). A 404 from the directory is the "not indexed" answer and surfaces as a result with `publishers=[]`; timeouts raise `AdagentsTimeoutError`; malformed or schema-noncompliant responses raise `AdagentsValidationError`.

The directory's answer is *discovery*, not authorization. Callers should still verify each returned `publisher_domain` via `fetch_adagents` before trusting the edge. Same SSRF gates apply (HTTPS only, DNS pre-check, private/reserved address ban, 5 MiB body cap, no redirect follow).

Also bumps the schema pin to 3.1.0-beta.2 so `schemas/cache/` includes `aao/agent-publishers.json`. Full Pydantic regen is deferred: datamodel-code-generator mis-resolves `../enums/channels.json` when the chain originates at a depth-0 schema (root-level `adagents.json` now transitively references the new `core/product-format-declaration.json`, which itself uses `../enums/...`). The hand-written models in this PR are scoped to the new endpoint; unblocking full regen is tracked separately.

Tests use `httpx.MockTransport` to exercise the real wire shape end-to-end and assert against `.model_validate()` on the Pydantic classes, covering happy path, 404 → empty, `since` cursor passthrough, timeout, malformed JSON, schema-mismatch, non-HTTPS guard, and 5xx surface.

Refs salesagent #511.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
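The `../` mis-resolution described in the commit message can be illustrated with plain path arithmetic. This is only a sketch of the symptom, not the codegen's actual code: the helper names are hypothetical, and `.schema_temp` is taken from the error text quoted in the PR body.

```python
import posixpath

SCHEMA_ROOT = ".schema_temp"  # working directory named in the reported error


def resolve_relative_to_referrer(referrer: str, ref: str) -> str:
    """Resolve a relative $ref against the directory of the schema that contains it."""
    joined = posixpath.join(SCHEMA_ROOT, posixpath.dirname(referrer), ref)
    return posixpath.normpath(joined)


def resolve_relative_to_root(ref: str) -> str:
    """Hypothetical buggy variant: treat the $ref as relative to the schema root."""
    return posixpath.join(SCHEMA_ROOT, ref)


referrer = "core/product-format-declaration.json"  # depth-1 schema
ref = "../enums/channels.json"

expected_path = resolve_relative_to_referrer(referrer, ref)
reported_path = resolve_relative_to_root(ref)

print(expected_path)  # path inside the schema root
print(reported_path)  # path that climbs out of the schema root
```

Resolving against the referrer stays inside the schema tree, while resolving against the root climbs one level above it, which matches the `not found` path in the report.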
LGTM. Clean additive ship-it. Right shape: reuses the existing SSRF stack, mirrors the `fetch_adagents` error contract, and 404-as-empty maps cleanly to the schema's permissive `directory_indexed_at: null` carve-out at `agent-publishers.json:18-23`.
Things I checked
- SSRF posture inherited, not worsened. `_validate_redirect_url` + `_dns_validate_host` + `_stream_capped(follow_redirects=False)` are all wired into the new path (`adagents.py:1890`, `:1897-1899`, `:1905`). `quote(agent_url, safe='')` at `:1892` encodes `/`, `:`, `@`, `\r`, `\n`, so there is no path-traversal or host-rewrite via a hostile `agent_url`. `quote(since, safe='')` at `:1894` blocks request-splitting through the cursor. `security-reviewer`: no High or Medium findings.
- Wire shape. `AgentAuthorizationsDirectoryResult` and `DirectoryPublisherEntry` mirror `schemas/cache/3.1.0-beta.2/aao/agent-publishers.json`. The `DirectoryDiscoveryMethod` Literal at `:1786-1791` matches the schema enum at `:61-66`. `extra='ignore'` via `AdCPBaseModel` against `additionalProperties: false` is the deliberate forward-compat posture, consistent with the rest of the SDK; `ADCP_STRICT_VALIDATION=1` gives strict mode for CI.
- Public-API additive. `feat(adagents):` is the correct conventional-commit prefix. New exports alphabetized in `__init__.py`. No collisions, no removals, no signature changes on existing surfaces.
- Generated types untouched. `src/adcp/types/generated_poc/**` and `src/adcp/types/_generated.py` are not in the diff. The 666-file count is almost entirely cached schema JSON, not generated Python; the layering rule from CLAUDE.md holds.
- 8 wire-level tests using `httpx.MockTransport`, no `MagicMock(httpx.AsyncClient)`. Happy path round-trips through `.model_validate()`. The 404 → empty path is explicitly covered. 5xx surfaces as `AdagentsValidationError`, not silent empty.
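The encoding claim in the SSRF item is easy to check in isolation. A minimal sketch, assuming the request URL is built by percent-encoding `agent_url` into a single path segment; `build_publishers_url` and the example hosts are hypothetical stand-ins, not the code at `adagents.py:1892`.

```python
from urllib.parse import quote, urlsplit


def build_publishers_url(directory_url: str, agent_url: str) -> str:
    """Embed agent_url as exactly one percent-encoded path segment."""
    base = directory_url.rstrip("/")
    # safe='' means '/', ':', '@', '?', '#', CR and LF are all percent-encoded.
    return f"{base}/v1/agents/{quote(agent_url, safe='')}/publishers"


# A deliberately hostile value: traversal, query, fragment and header injection.
hostile = "https://agent.example/../../admin?x=1#frag\r\nHost: evil.example"
url = build_publishers_url("https://directory.example", hostile)
parts = urlsplit(url)

print(parts.hostname)
print(parts.path.split("/"))
```

The host stays the validated directory host, and the hostile value cannot add path segments, a query string, or a fragment of its own.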
Follow-ups (non-blocking — file as issues)
1. `ADCP_VERSION` bump to `3.1.0-beta.2` without full type regen. `ad-tech-protocol-expert`: sound-with-caveats. The SDK advertises 3.1.0-beta.2 but `generated_poc/` is still 3.0-vintage. The hand-written models in this PR cover only the AAO endpoint, so every other 3.1 task's types are stale relative to the version pin, and the failure mode is silent (`extra='ignore'` drops on renamed fields). The PR rationale is sound (`datamodel-code-generator` mis-resolves `../enums/channels.json` when the chain originates at depth-0), but ship a CHANGELOG / known-limitations note that names the deferred regen and the affected schema chains, or stage the version bump behind the codegen unblock. Notable that the schema pin advertises 3.1.0-beta.2 before the types do.
2. Conditional-required `manager_domain` not enforced. The schema at `agent-publishers.json:104-127` has an `allOf`/`if`/`then` that requires `manager_domain` when `discovery_method ∈ {authoritative_location, adagents_authoritative, ads_txt_managerdomain}`. The model at `adagents.py:1801` declares `manager_domain: str | None = None` unconditionally, so a non-conforming directory row silently lands as `manager_domain=None`. Add a `@model_validator(mode='after')` on `DirectoryPublisherEntry` to mirror the `allOf`, or document the deliberate loosening in the docstring. The `ads_txt_managerdomain` path is the weakest discovery method; `manager_domain` is the only positive cross-check, per the schema description at `:60`.
3. `publishers: list = Field(default_factory=list)` at `adagents.py:1821` silently fills a schema-`required` field on 200 responses. Drop the default; let Pydantic enforce. (The 404 path constructs `publishers=[]` directly, so the default isn't load-bearing there.)
4. `except Exception` at `adagents.py:1944` is too broad. Narrow to `pydantic.ValidationError`. Today's net effect is identical (callers still get `AdagentsValidationError`), but the broad catch will mask an `AttributeError` from a future model refactor as 'schema validation failed.'
5. Test gaps that pair with #2 and #3: no case for the conditional-`manager_domain` violation, no case for a 200 missing `publishers`. Worth adding once the model is tightened.
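The conditional `manager_domain` rule from the follow-ups can be written out as a post-construction check. This sketch uses a stdlib dataclass with `__post_init__` as a stand-in for a Pydantic `@model_validator(mode='after')`; `PublisherEntrySketch` is a hypothetical name, and only the three discovery methods named in the review are assumed to require the field.

```python
from dataclasses import dataclass
from typing import Optional

# Discovery methods for which the schema's allOf/if/then requires manager_domain.
REQUIRES_MANAGER_DOMAIN = frozenset(
    {"authoritative_location", "adagents_authoritative", "ads_txt_managerdomain"}
)


@dataclass(frozen=True)
class PublisherEntrySketch:
    """Stand-in for DirectoryPublisherEntry; not the SDK's model."""

    publisher_domain: str
    discovery_method: str
    manager_domain: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirror the conditional requirement instead of letting None through.
        if (
            self.discovery_method in REQUIRES_MANAGER_DOMAIN
            and self.manager_domain is None
        ):
            raise ValueError(
                f"manager_domain is required when discovery_method="
                f"{self.discovery_method!r}"
            )


direct_row = PublisherEntrySketch("pub.example", "direct")
managed_row = PublisherEntrySketch(
    "pub.example", "ads_txt_managerdomain", manager_domain="manager.example"
)
```

A row such as `PublisherEntrySketch("pub.example", "ads_txt_managerdomain")` raises `ValueError` at construction, which is the case the suggested test would pin down.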
Minor nits (non-blocking)
- `MAX_DIRECTORY_PAGE_BYTES` comment at `adagents.py:1825-1826` references `MAX_POINTER_BYTES`. `MAX_POINTER_BYTES` does live at `adagents.py:535`, so the cross-reference is correct, but the constants aren't actually shared. Either alias one to the other or drop the prose claim and let `5 * 1024 * 1024` speak for itself.
- Per-test inline imports in `TestFetchAgentAuthorizationsFromDirectory` (every method does `from adcp.adagents import ...` inside the function body). Move to module top to match the rest of `tests/test_adagents.py`.
- `_validate_redirect_url(f"{base}/v1/agents/_/publishers")` at `adagents.py:1890` validates a synthesized placeholder URL, not `request_url`. Functionally safe (`_validate_redirect_url` only inspects `parsed.hostname` and the host is identical), but a one-line comment explaining the placeholder would keep a future refactor from desyncing the two.
- `directory_url` control-char reject as defense-in-depth. `_validate_publisher_domain` already rejects `\r`, `\n`, `\t`, `\\`, `@` at `adagents.py:275-278`; the same trick on `directory_url` before `urlparse` would close any future transport-permissiveness regression. Today httpx + `getaddrinfo` block request-splitting on this path, but the SDK shouldn't lean on transport hygiene for its own contract.
- Function name reads as discovery-via-the-verb-`authorizations` (`fetch_agent_authorizations_from_directory`). The docstring is clear three times that the directory's answer is discovery, not authorization. The verb in the symbol name is the soft spot: adopters who skim types and see `status: 'authorized'` plus a function name that contains 'authorizations' can talk themselves into skipping the `fetch_adagents` verification step. Not a block; worth a thought before the symbol locks in via adopter usage.
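The `directory_url` control-character nit could look roughly like the following. This is a sketch under assumptions: `check_directory_url` is a hypothetical helper, and the rejected character set is copied from what the review says `_validate_publisher_domain` rejects, which may be stricter than the SDK wants for a URL.

```python
from urllib.parse import urlsplit

# Characters rejected before any parsing, mirroring the publisher-domain check.
_FORBIDDEN_CHARS = ("\r", "\n", "\t", "\\", "@")


def check_directory_url(directory_url: str) -> str:
    """Reject suspicious input early, then return the URL without a trailing slash."""
    for ch in _FORBIDDEN_CHARS:
        if ch in directory_url:
            raise ValueError(f"directory_url contains forbidden character {ch!r}")
    # Case-sensitive on purpose: 'HTTPS://' is rejected rather than normalized.
    if not directory_url.startswith("https://"):
        raise ValueError("directory_url must start with https://")
    if not urlsplit(directory_url).hostname:
        raise ValueError("directory_url has no host")
    return directory_url.rstrip("/")


base = check_directory_url("https://directory.example/")
print(base)
```

Running the check before `urlsplit` means the SDK's own contract does not depend on how a particular URL parser or transport treats embedded control characters.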
Test plan note
The PR body's test plan has three unchecked items. Two are CI/local-pass placeholders with a parenthetical "verified" — fine. The third — salesagent#511 picks up the new function in its discovery flow — is a downstream-adoption checkpoint, not a precondition for shipping the primitive. Track it on the salesagent side.
Safe to merge once you've rebased onto main (PR is BEHIND). The follow-ups above are real but none of them break adopters today.
The 3.1.0-beta.2 bundle introduces spec drift well beyond this PR's scope: `cache_scope` becomes required on product responses, new `sponsored-intelligence` specialism + `search_brands` webhook task type, new `validate_input_brand_claims` endpoint. Each of those needs its own focused change (constant updates, fixture refreshes, capability surface work); bundling them with the AAO inverse-lookup function would block landing both.

This commit:
- Reverts ADCP_VERSION to 3.0.7 (the prior pin).
- Drops `schemas/cache/3.1.0-beta.2/` from the tree; the new `fetch_agent_authorizations_from_directory` works with hand-written Pydantic models and does not need the v3.1 bundle on disk.
- Regenerates `tests/fixtures/public_api_snapshot.json` to record the intentional new public exports (function + result types).

The v3.1 schema-pin bump (and the codegen `../`-resolution fix noted in the PR body) move to a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LGTM. Clean additive surface, sound SSRF chain, real wire-level tests against `.model_validate()`. Conventional `feat:` (not `feat!:`) is correct: purely additive public exports.
Things I checked
- SSRF chain holds. `quote(agent_url, safe='')` at `src/adcp/adagents.py:1892` encodes `/`, `?`, `#`, `@`, `:`, `\`, `%`, and CR/LF, so no path-segment escape can flip `parsed.hostname` between the validated base and the actual request. The `_validate_redirect_url` + `_dns_validate_host` chain mirrors the existing `fetch_adagents` posture. The case-sensitive `startswith("https://")` at `:1888` rejects `HTTPS://` rather than letting it through; conservative, fine.
- `follow_redirects=False` inherited from `_stream_capped` (`:1142`) is correct for an authoritative directory endpoint; a 30x to a third party would bypass the SSRF gate.
- 404 → empty is spec-mandated. Per `schemas/source/aao/agent-publishers.json` (adcp#4828), "Directory has never indexed any publisher referencing this `agent_url`" is the documented 404 contract, distinct from `200` + empty. The PR's `:1918-1928` branch matches.
- `extra='ignore'` on `AdCPBaseModel` keeps unknown wire fields forward-compatible, the right default for protocol models.
- Public-API snapshot (`tests/fixtures/public_api_snapshot.json`) records the five new exports. No removals, no signature changes on existing exports; additive only.
- Tests exercise the real wire shape via `httpx.MockTransport`, parse against the actual Pydantic model, and explicitly cover the 404-as-empty contract. Eight tests, no `MagicMock` of the client. Right call.
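The status-code contract checked above (404 is a normal empty answer, 200 is parsed, anything else is an error) reduces to a small decision function. A sketch only: `interpret_response` and `DirectoryResponseError` are hypothetical stand-ins for the SDK's function body and `AdagentsValidationError`, and the shape of the empty result is an assumption.

```python
import json
from typing import Any


class DirectoryResponseError(Exception):
    """Stand-in for the SDK's validation error type."""


def interpret_response(status_code: int, body: bytes, agent_url: str) -> dict[str, Any]:
    if status_code == 404:
        # "Never indexed" is an answer, not a failure: return an empty result.
        return {"agent_url": agent_url, "publishers": [], "directory_indexed_at": None}
    if status_code != 200:
        # 5xx and other codes must not be mistaken for "no publishers".
        raise DirectoryResponseError(f"unexpected directory status {status_code}")
    try:
        payload = json.loads(body)
    except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError both subclass it
        raise DirectoryResponseError(f"malformed JSON: {str(exc)[:200]}") from exc
    if not isinstance(payload, dict) or "publishers" not in payload:
        raise DirectoryResponseError("response is missing required 'publishers'")
    return payload


not_indexed = interpret_response(404, b"", "https://agent.example")
indexed = interpret_response(
    200,
    json.dumps({"agent_url": "https://agent.example", "publishers": []}).encode(),
    "https://agent.example",
)
```

Keeping 404 and 200-with-empty-list as separate branches preserves the distinction the schema documents, even though both yield `publishers == []` here.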
Follow-ups (non-blocking — file as issues)
1. `since` vs `cursor` divergence from the spec. Per `docs/aao/directory-api.mdx`, `?since=` is ISO 8601 only and `?cursor=` is the separate pagination param. The function exposes only `since`, and the docstring at `:1854-1856` (and the PR body) tells callers to pass `next_cursor` back as `since`. That's the wrong wire field: directories that validate strictly will reject opaque cursors in `since`. Add a separate `cursor: str | None` parameter, narrow `since` to RFC 3339, and update the docstring.
2. Conditional `manager_domain` requirement not enforced. The schema's `allOf`/`if`-`then` requires `manager_domain` non-null whenever `discovery_method != "direct"`. The Pydantic model at `:1796` lets `None` through for all four methods, so a malformed `ads_txt_managerdomain` row with `manager_domain: null` parses cleanly and downstream code hits the None. Add a `model_validator` to mirror the schema's conditional requirement.
3. Closed `DirectoryDiscoveryMethod` Literal on an enum that's already expanding. The new `adagents_authoritative` value is itself the precedent: the prior `DiscoveryMethod` (`:28`) had three values, the new one has four. v1.x will add more. A closed `Literal` raises `ValidationError` on adopter machines pinned to older SDKs. Consider an open string with a known-values constant, or document the closed-enum discipline as intentional.
4. Missing `status` and `limit` query params. The spec defines `status` (repeated, default `authorized`) and `limit` (1–1000, default 200). Adopters who want revoked tombstones currently have to drop to raw httpx.
5. README drift at `README.md:1052-1088`. The "Authorization Discovery" section documents Push and Pull but not the new Directory path. One short subsection alongside the existing two would close it. AGENTS.md / llms.txt don't mention any of the family; separate gap, not regression.
6. 404 vs 200 `agent_url` echo asymmetry (`src/adcp/adagents.py:1923-1928` vs `:1941`). On 404 we fabricate `agent_url=<raw caller input>`; on 200 we return whatever the directory echoes (possibly canonicalized). Adopters comparing `result.agent_url` across calls see drift. Either document or normalize.
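Follow-ups 1 and 4 together describe a query string with four distinct parameters. A minimal sketch of how that could be assembled, assuming the parameter names and ranges quoted from the spec above; `build_query` is hypothetical, the `revoked` status value is an assumption, and `datetime.fromisoformat` is used only as a loose timestamp check rather than a full RFC 3339 validator.

```python
from datetime import datetime
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_query(
    *,
    since: Optional[str] = None,
    cursor: Optional[str] = None,
    status: Optional[Sequence[str]] = None,
    limit: Optional[int] = None,
) -> str:
    params: list[tuple[str, str]] = []
    if since is not None:
        # Raises ValueError for opaque cursors passed in the wrong field.
        datetime.fromisoformat(since.replace("Z", "+00:00"))
        params.append(("since", since))
    if cursor is not None:
        params.append(("cursor", cursor))  # opaque; forwarded as-is
    for value in status or ():
        params.append(("status", value))  # repeated-key form
    if limit is not None:
        if not 1 <= limit <= 1000:
            raise ValueError("limit must be between 1 and 1000")
        params.append(("limit", str(limit)))
    return urlencode(params)


query = build_query(
    since="2025-01-01T00:00:00Z", status=["authorized", "revoked"], limit=500
)
print(query)
```

Passing a list of pairs to `urlencode` keeps repeated keys in order and percent-encodes every value, so the separate `quote(since, safe='')` call would no longer be needed.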
Minor nits (non-blocking)
- `except Exception` at `:1942` is broader than every other handler in this file (siblings use `(AdagentsValidationError, AdagentsTimeoutError, ...)` or `httpx.*`). Narrow to `pydantic.ValidationError` (or `(pydantic.ValidationError, TypeError)`) for consistency.
- Unbounded `{e}` interpolation at `:1943-1945`. The JSON-decode path at `:1937` truncates to `[:200]`; the Pydantic path doesn't. Pydantic v2's `ValidationError.__str__` includes offending field values from the response, so apply a matching `str(e)[:500]` for log-volume parity and to bound what a hostile directory can stamp into adopter logs.
- `quote(since, safe='')` at `:1894` is correct for an opaque cursor but irrelevant once `since` is narrowed to RFC 3339 per follow-up #1.
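The first two nits combine naturally: catch only the expected exception types, and bound how much of the message is re-raised. A sketch with stand-ins: `WireValidationError` replaces `AdagentsValidationError`, `(ValueError, TypeError)` replaces `pydantic.ValidationError`, and `parse_or_raise` is a hypothetical wrapper.

```python
from typing import Any, Callable

MAX_ERROR_CHARS = 500  # matches the str(e)[:500] suggestion


class WireValidationError(Exception):
    """Stand-in for the SDK's validation error type."""


def parse_or_raise(parse: Callable[[Any], Any], payload: Any) -> Any:
    try:
        return parse(payload)
    except (ValueError, TypeError) as exc:
        # Only expected validation failures are translated, and the text is
        # truncated so a hostile response cannot flood adopter logs.
        detail = str(exc)[:MAX_ERROR_CHARS]
        raise WireValidationError(f"schema validation failed: {detail}") from exc


def noisy_parser(payload: Any) -> Any:
    raise ValueError("x" * 5000)


def buggy_parser(payload: Any) -> Any:
    raise AttributeError("refactor broke an attribute")


try:
    parse_or_raise(noisy_parser, {})
except WireValidationError as err:
    bounded_message = str(err)
```

An `AttributeError` from `buggy_parser` propagates unchanged instead of being relabeled as a schema failure, which is the masking risk the broad catch creates.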
The schema bundle revert in the second commit was the right call — bundling unrelated v3.1 drift (cache_scope-required, new specialism, new endpoint) with the inverse-lookup function would have blocked both. Interesting choice to ship hand-written Pydantic for a wire shape the codegen can't yet produce; works here because the surface is small, but the regen unblock tracked separately is now load-bearing for everything else in v3.1.
Approving on the strength of the SSRF chain plus the wire-level test coverage. Follow-ups noted above.
…ry inverse-lookup (#749 Part 3, adcp#4894)

Builds on #769's directory wrapper. Adds:
- include=["properties"] parameter on fetch_agent_authorizations_from_directory (adcp#4894). Repeated-key form (?include=properties&include=...), not comma-joined.
- property_ids: list[str] | None field on DirectoryPublisherEntry. None signals the directory did not return per-publisher IDs (count-only mode); a list signals the directory supports ?include=properties.
- detect_publisher_properties_divergence: compares directory inline resolution against per-publisher federated fetches. Full (publisher_domain, property_id) set-diff when property_ids is available; graceful fallback to count-only against older directories. max_concurrency=20 default semaphore caps concurrent fetches at managed-network scale (cafemedia ~6,800 publishers). sample_size=200 default keeps unbounded sweeps opt-in.
- PublisherDivergence / DivergenceReport types (Pydantic, matching #769's style).

Closes #749 Part 3. Part 2 superseded by #769 (which closed #746).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
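The set-diff with count-only fallback described in this commit message can be sketched per publisher as below. This is an illustration, not the shipped `detect_publisher_properties_divergence`: `compare_publisher`, the `directory_property_count` argument, and the returned dictionary keys are all hypothetical.

```python
from typing import Any, Iterable, Optional


def compare_publisher(
    publisher_domain: str,
    directory_property_ids: Optional[Iterable[str]],
    directory_property_count: int,
    federated_property_ids: Iterable[str],
) -> dict[str, Any]:
    federated = {(publisher_domain, pid) for pid in federated_property_ids}
    if directory_property_ids is None:
        # Older directory: no per-publisher IDs, so only the counts can be compared.
        return {
            "mode": "count_only",
            "diverged": directory_property_count != len(federated),
            "missing_in_directory": None,
            "extra_in_directory": None,
        }
    directory = {(publisher_domain, pid) for pid in directory_property_ids}
    missing = federated - directory  # publisher lists it, directory does not
    extra = directory - federated  # directory lists it, publisher does not
    return {
        "mode": "set_diff",
        "diverged": bool(missing or extra),
        "missing_in_directory": sorted(missing),
        "extra_in_directory": sorted(extra),
    }


full = compare_publisher("pub.example", ["p1", "p2"], 2, ["p2", "p3"])
fallback = compare_publisher("pub.example", None, 2, ["a", "b"])
```

The count-only branch can miss a divergence where the counts match but the IDs differ, which is why the full diff is preferred whenever `property_ids` is present.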
Summary
- Adds `fetch_agent_authorizations_from_directory(agent_url, *, directory_url, since=None, timeout=10.0, client=None)` next to the existing per-publisher `fetch_agent_authorizations`. Calls the AAO directory's `GET /v1/agents/{agent_url}/publishers` endpoint (adcp#4823 / #4828) and returns a typed `AgentAuthorizationsDirectoryResult` parsed against the real wire body via Pydantic `.model_validate()`.
- A 404 returns `publishers=[]` (a directory that hasn't indexed the agent is a normal answer, not an exception); timeouts → `AdagentsTimeoutError`; malformed or schema-noncompliant → `AdagentsValidationError`.
- Bumps the schema pin to `3.1.0-beta.2` so `schemas/cache/` includes `aao/agent-publishers.json` (the new schema landed in adcp#4828).
- Exports `fetch_agent_authorizations_from_directory`, `AgentAuthorizationsDirectoryResult`, and the supporting `DirectoryPublisherEntry` / `DirectoryDiscoveryMethod` / `DirectoryEdgeStatus` aliases from `adcp/__init__.py`.

Design notes
- The directory's answer is discovery, not authorization. Callers should verify each returned `publisher_domain` against its own `adagents.json` (via `fetch_adagents`) before treating the edge as trusted. This is the same trust-root contract the schema doc spells out.
- Same SSRF gates apply. `directory_url` is HTTPS-only, runs through `_validate_redirect_url` + `_dns_validate_host`, and the response body is streamed with the existing 5 MiB cap. Redirects are not followed.
- `since` is wire-passthrough. The schema doesn't pin a semantic: it can be a `next_cursor` from the prior page, or an RFC 3339 timestamp tied to `directory_indexed_at`. We forward it verbatim as `?since=...` and let the directory interpret.

Type-regen status
`make regenerate-schemas` is blocked at the v3.1.0-beta.2 bundle: `datamodel-code-generator` mis-resolves `../enums/channels.json` when the resolution chain originates at a depth-0 schema. The root-level `adagents.json` now transitively references the new `core/product-format-declaration.json`, which itself uses `../enums/channels.json` for `applies_to_channels`. The codegen treats the `../` as relative to the root rather than to the depth-1 file and reports `not found: .schema_temp/../enums/channels.json`. Pinned v3.0 schemas regen cleanly; v3.1.0-beta.2 does not. The hand-written Pydantic models in this PR are scoped to the new endpoint so we can ship the function without unblocking full regen first; that fix is tracked separately.

Tests
Eight new wire-level tests under `TestFetchAgentAuthorizationsFromDirectory`, all using `httpx.MockTransport` (no `MagicMock` of the client):
- `test_happy_path_parses_into_pydantic`: full body round-trips through `AgentAuthorizationsDirectoryResult.model_validate()`, includes both `direct` and `adagents_authoritative` discovery rows.
- `test_404_returns_empty_publishers`: explicit coverage of the 404 → empty contract.
- `test_since_cursor_passes_through_as_query_string`: verifies `?since=` reaches the wire.
- `test_timeout_raises_adagents_timeout_error`: `httpx.ReadTimeout` → `AdagentsTimeoutError`.
- `test_malformed_json_raises_validation_error`: non-JSON 200 body.
- `test_schema_mismatch_raises_validation_error`: missing required fields on the envelope and a publisher entry.
- `test_non_https_directory_url_rejected`: SSRF gate fires before any I/O.
- `test_non_200_non_404_raises_validation_error`: 5xx surfaces as `AdagentsValidationError`, not silent empty.

Linked
Test plan
- `ruff check`, `mypy src/adcp/adagents.py`, and `pytest tests/test_adagents.py` pass locally (verified)

🤖 Generated with Claude Code