feat(server): TenantRegistry parity with JS createTenantRegistry #619

@bokelley

Problem

The Python SDK ships CallableSubdomainTenantRouter (host-routing callback, PR #544) and LazyPlatformRouter (per-tenant platform factory, PR #547). The JS SDK ships createTenantRegistry, a higher-level primitive built on top of the same building blocks. Adopters running multi-tenant Python deployments (ours is the most advanced) end up reinventing the registry layer.

What JS has that Python doesn't

From @adcp/sdk/server's createTenantRegistry (see adcp-client/skills/build-decisioning-platform/advanced/MULTI-TENANT.md):

  1. Per-tenant health states: pending (registered, not yet validated, refused with 503), healthy (serving), unverified (was healthy, transient validation failure, graceful-degrade), disabled (permanent failure, refused until admin recheck()). State is tracked per tenant, so one bad tenant doesn't block others.
  2. Runtime register(tenantId, config) / unregister(tenantId) — add/remove tenants without restarting the process. Admin webhook saves a new tenant row, calls register, the next request to that host resolves it. We need this; today we plumb it ourselves through the admin flow.
  3. recheck(tenantId) — re-validate a tenant after key rotation or config change. Status transitions disabled → healthy without a traffic gap.
  4. awaitFirstValidation: true — boot-time semantic where register() doesn't return until the tenant has been validated, so the first request after register doesn't race the validation roundtrip.
  5. resolveByHost(host) — synchronous lookup returning the registered server (or null). Composes naturally with serve().
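
The four health states in item 1 can be read as a small state machine. The following is a minimal sketch of one possible reading, not the JS SDK's actual logic: the names TenantHealth, FailureKind, next_health and serves_traffic are hypothetical, and the rule that a transient failure leaves a pending or disabled tenant where it is is an assumption.

```python
from enum import Enum
from typing import Optional


class TenantHealth(str, Enum):
    PENDING = "pending"        # registered, not yet validated, refused with 503
    HEALTHY = "healthy"        # serving
    UNVERIFIED = "unverified"  # was healthy, transient failure, still serving
    DISABLED = "disabled"      # permanent failure, refused until recheck()


class FailureKind(str, Enum):
    # Hypothetical classification of a failed validation attempt.
    TRANSIENT = "transient"
    PERMANENT = "permanent"


def next_health(
    current: TenantHealth,
    validation_ok: bool,
    failure: Optional[FailureKind] = None,
) -> TenantHealth:
    """Return the state after one validation attempt (assumed rules)."""
    if validation_ok:
        # Covers first validation and recheck() from disabled.
        return TenantHealth.HEALTHY
    if failure is FailureKind.PERMANENT:
        return TenantHealth.DISABLED
    # Transient failure: only a healthy tenant degrades to unverified.
    if current is TenantHealth.HEALTHY:
        return TenantHealth.UNVERIFIED
    return current


def serves_traffic(health: TenantHealth) -> bool:
    """Healthy and unverified tenants serve; pending and disabled are refused."""
    return health in (TenantHealth.HEALTHY, TenantHealth.UNVERIFIED)
```

Keeping the transition rules in one pure function would make them easy to compare against the JS behavior, whatever the final Python API looks like.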

JS-specific bits we don't need: JWKS validation (we use principal-token bearer auth, not JWT). The registry should be JWKS-agnostic — adopters that want JWKS can pass a validator, adopters that don't pass nothing.

What we have today

core/main.py builds:

  • A CallableSubdomainTenantRouter with a 60-second TTL cache (host → Tenant lookup against our DB)
  • A LazyPlatformRouter with per-tenant DecisioningPlatform factory
  • An ad-hoc admin-flow invalidate(host) call when a tenant is created / deactivated / has its subdomain rotated
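
For readers unfamiliar with this arrangement, the sketch below shows the general shape of a host-keyed TTL cache with explicit invalidation. It is an illustration only, not the code in core/main.py or the SDK's router: TTLHostCache, its loader callback and the injectable clock are hypothetical names.

```python
import time
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class TTLHostCache(Generic[T]):
    """Cache host -> tenant lookups for a fixed time-to-live (hypothetical)."""

    def __init__(
        self,
        loader: Callable[[str], Optional[T]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # e.g. a database lookup by host
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Optional[T]]] = {}

    def get(self, host: str) -> Optional[T]:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh, skip the loader
        value = self._loader(host)
        # Misses (None) are cached too, so an unknown host stays unknown
        # until the TTL expires or invalidate() is called.
        self._entries[host] = (now + self._ttl, value)
        return value

    def invalidate(self, host: str) -> None:
        """Drop a host so the next get() reloads it."""
        self._entries.pop(host, None)
```

In a design like this, every admin operation that changes a tenant has to remember to call invalidate(host), which is the kind of plumbing a registry with register() and unregister() would absorb.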

What's missing relative to JS:

  • No health states. A misconfigured tenant (e.g. GAM credentials missing) only fails on first request, with a generic 500 — there's no "this tenant is disabled" classifier the LB or admin UI can observe.
  • No runtime register without invalidate-the-cache plumbing.
  • No recheck() — config changes require either a process restart or manual cache invalidation.
  • No first-validation boot semantic — the first request to a freshly-registered tenant pays the platform-build cost.

Proposed SDK shape

import os

from adcp.server import TenantRegistry, BearerTokenAuth, serve

registry = TenantRegistry(
    default_serve_options={
        "name": "my-multi-tenant-host",
        "validation": {"requests": "strict", "responses": "strict"},
    },
    # Optional validator — adopters using JWT can pass a JWKS validator;
    # principal-token adopters pass None.
    validator=None,
    auto_validate=True,
)

# Register at boot
for tenant in load_tenants_from_db():
    await registry.register(
        tenant.id,
        agent_url=tenant.agent_url,
        platform=build_platform_for(tenant),
        await_first_validation=True,
    )

# Resolve per request
def resolve(ctx) -> AdcpServer:
    resolved = registry.resolve_by_host(ctx.host)
    if resolved is None or resolved.health == "disabled":
        raise HTTPException(503)
    return resolved.server

serve(resolve, auth=BearerTokenAuth(validate_token=...), port=int(os.environ["PORT"]))

# Runtime admin operations
await registry.register(tenant_id, agent_url=..., platform=...)  # add
registry.unregister(tenant_id)                                    # remove
await registry.recheck(tenant_id)                                 # re-validate
status = registry.health(tenant_id)                               # observe

The internals reuse CallableSubdomainTenantRouter and LazyPlatformRouter. TenantRegistry is the higher-level primitive that composes them with health tracking and runtime mutation, matching the JS surface area.
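
To make the register / recheck / resolve semantics concrete, here is a rough in-memory sketch. It is not a proposed implementation and does not use the real routers: SketchTenantRegistry, TenantEntry and the validator signature are hypothetical, deriving the host from agent_url is an assumption, and any failed validation is simplified to "disabled" rather than distinguishing transient from permanent failures.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, Optional, Set
from urllib.parse import urlsplit


@dataclass
class TenantEntry:
    tenant_id: str
    host: str
    server: Any                 # stands in for the built per-tenant server
    health: str = "pending"


# Hypothetical validator: receives the entry, returns True when it is valid.
Validator = Callable[[TenantEntry], Awaitable[bool]]


class SketchTenantRegistry:
    def __init__(self, validator: Optional[Validator] = None) -> None:
        self._validator = validator
        self._by_id: Dict[str, TenantEntry] = {}
        self._by_host: Dict[str, TenantEntry] = {}
        self._tasks: Set["asyncio.Task[Optional[str]]"] = set()

    async def register(self, tenant_id: str, *, agent_url: str, platform: Any,
                       await_first_validation: bool = False) -> TenantEntry:
        host = urlsplit(agent_url).hostname  # assumption: host comes from agent_url
        if host is None:
            raise ValueError(f"agent_url has no host: {agent_url!r}")
        self.unregister(tenant_id)           # drop any stale host mapping
        entry = TenantEntry(tenant_id, host, platform)
        self._by_id[tenant_id] = entry
        self._by_host[host] = entry
        if await_first_validation:
            await self.recheck(tenant_id)    # caller sees a validated tenant
        else:
            task = asyncio.create_task(self.recheck(tenant_id))
            self._tasks.add(task)            # keep a reference until done
            task.add_done_callback(self._tasks.discard)
        return entry

    def unregister(self, tenant_id: str) -> None:
        entry = self._by_id.pop(tenant_id, None)
        if entry is not None:
            self._by_host.pop(entry.host, None)

    async def recheck(self, tenant_id: str) -> Optional[str]:
        entry = self._by_id.get(tenant_id)
        if entry is None:                    # unregistered in the meantime
            return None
        ok = True if self._validator is None else await self._validator(entry)
        entry.health = "healthy" if ok else "disabled"
        return entry.health

    def resolve_by_host(self, host: str) -> Optional[TenantEntry]:
        return self._by_host.get(host)       # synchronous, no I/O

    def health(self, tenant_id: str) -> Optional[str]:
        entry = self._by_id.get(tenant_id)
        return None if entry is None else entry.health
```

With await_first_validation=True the caller pays the validation cost inside register(); without it the tenant is visible immediately as "pending" and a resolver would refuse it until the background check finishes.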

Why this matters

  • Multi-tenant SaaS deployments are the common shape for AdCP sellers. Every Python adopter at our scale will land here.
  • Building it ourselves means each adopter has slightly-different health semantics, slightly-different admin webhook plumbing, slightly-different cache-invalidation rules. A canonical primitive collapses that variation.
  • Closes the JS↔Python parity gap on the most-touched server-side primitive.
