
Sketch: admin rate-limits dashboard architecture (CF Access + Tunnel)#308

Draft
ptrlrd wants to merge 5 commits into main from feat/admin-rate-limits-sketch

Conversation

ptrlrd (Owner) commented May 20, 2026

Summary

Scope-locking sketch with no live behavior change. Establishes the architecture for runtime-tunable rate limits so the next PR can be a focused "fill in TODOs" change.

What it looks like end-to-end

You (browser)
  │ OAuth (Google/GitHub via CF Access)
  ▼
Cloudflare edge — terminates TLS, checks Access policy
  │
  ▼
cloudflared (sidecar)   — outbound HTTPS to CF, NO published port
  │
  ▼
admin-dashboard         — tiny static UI
  │ X-Admin-Token header
  ▼
spire-codex-backend /api/admin/rate-limits/*
  │ (5s TTL cache)
  ▼
Mongo `rate_limits` collection — source of truth

Defense in depth

  • Layer 1 — CF Access OAuth at the edge. Only operators in the access policy can reach admin.spire-codex.com at all.
  • Layer 2 — X-Admin-Token in the backend. Survives single-layer misconfiguration of CF Access.
  • Layer 3 — No published port. admin-dashboard has zero port mappings; the only path in is cloudflared's outbound connection to CF, so there is nothing for the public internet to scan.
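
A minimal sketch of what the Layer 2 check could look like, independent of any web framework. It assumes the expected secret lives in an `ADMIN_TOKEN` environment variable (the runbook mentions generating one); the helper name `is_admin_token_valid` is hypothetical and not taken from the PR. The comparison is constant-time, and the check fails closed when no token is configured.

```python
import hmac
import os
from typing import Optional


def is_admin_token_valid(presented: Optional[str],
                         expected: Optional[str] = None) -> bool:
    """Return True only if the presented X-Admin-Token matches the secret.

    `expected` defaults to the ADMIN_TOKEN environment variable, which is an
    assumption about where the secret is stored.
    """
    if expected is None:
        expected = os.environ.get("ADMIN_TOKEN", "")
    if not expected or not presented:
        # Fail closed: an unset secret or a missing header rejects the request.
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected.encode("utf-8"))
```

Failing closed on an empty secret matters here: if the environment variable is ever missing in a deploy, the backend layer should reject everything rather than accept an empty header.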

What this PR contains

| File | Status |
| --- | --- |
| `app/services/rate_limits_store.py` | Skeleton with 5s TTL cache, TODOs for Mongo I/O |
| `app/routers/admin_rate_limits.py` | GET/PUT/DEL endpoints returning 501 + token check |
| `docker-compose.admin.yml` | Two-container deploy: admin UI + cloudflared sidecar |
| `playbooks/admin-install.yml` | Runbook header documenting one-time CF dashboard setup |

Per-request cost

get_limit(slug, default) is the hot path, called from every @limiter.limit(...) evaluation. With the TTL cache, each worker makes 1 Mongo read per 5s, i.e. ~12 reads/min per worker (multiply by the worker count for the fleet total). Per-request overhead is a dict lookup and a monotonic-time compare.
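
The hot path described above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of `rate_limits_store.py`: the class name `RateLimitCache`, the injected `loader` callable (standing in for the stubbed Mongo read), and the injectable `clock` are all hypothetical.

```python
import time
from typing import Callable, Dict

CACHE_TTL_SECONDS = 5.0


class RateLimitCache:
    """In-process cache of slug -> limit-string overrides with a TTL."""

    def __init__(self,
                 loader: Callable[[], Dict[str, str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader              # stand-in for the Mongo read
        self._clock = clock                # injectable so tests can fake time
        self._overrides: Dict[str, str] = {}
        self._expires_at = float("-inf")   # forces a load on first use

    def get_limit(self, slug: str, default: str) -> str:
        now = self._clock()
        if now >= self._expires_at:
            # At most one reload per TTL window per process.
            self._overrides = self._loader()
            self._expires_at = now + CACHE_TTL_SECONDS
        # Steady state: one monotonic-time compare plus one dict lookup.
        return self._overrides.get(slug, default)
```

Using a monotonic clock rather than wall time keeps the TTL immune to system clock adjustments. The sketch lets loader exceptions propagate; a real store would probably want to keep serving the last known values when Mongo is unreachable.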

Follow-up PR scope (separate)

  1. Implement _refresh_cache + Mongo collection
  2. Implement set_override / clear_override + validation via slowapi's limit-string parser
  3. Populate REGISTRY with the actual slugs in use today
  4. Flip one decorator (submit_run) from a hardcoded string to get_limit("submit_run", "3000/hour") as the smoke test
  5. Build the static admin UI (one HTML file is probably plenty for v1)
  6. Push CF Tunnel + Access setup steps as a one-time runbook task

Not deploying anything from this PR

main.py doesn't import the new router. The compose file isn't referenced by any playbook. The endpoints don't exist on prod after this merges. Pure architectural commit.

ptrlrd added 4 commits May 20, 2026 00:53
The 600/hour ceiling was sized for "a few hundred backlog runs on
first install" but silently caps users with larger histories — a
Discord report of a 1000+ run backlog would have dropped 400 at the
old cap. Backend work per submission is ~10ms (Mongo insert + JSON
file + metrics) and duplicate-hash dedup short-circuits in ~3ms, so
actual write load is bounded by *distinct* runs per uploader rather
than raw submission count. 3000/hr leaves headroom for legitimate
multi-thousand backlog uploads while still cutting off scraper
abuse.

Easy revert: bump back down if
`spire_codex_api_errors_total{path="/api/runs",status_code="429"}`
starts climbing against this endpoint specifically.

Scope-locking commit, not behavior-changing. Establishes the
architecture for a runtime-tunable rate-limit system without
implementing the bodies — so the next PR can be a focused
"fill in TODOs" rather than a 1000-line design+impl combo.

Pieces:
- `app/services/rate_limits_store.py`: Mongo-backed config with a 5s
  in-process TTL cache. Per-request lookup stays cheap; admin
  writes take effect within one cache tick. Mongo I/O stubbed out
  with explanatory TODOs.
- `app/routers/admin_rate_limits.py`: CRUD endpoints under
  `/api/admin/rate-limits`. Gated by `X-Admin-Token` header (defense
  in depth — CF Access is the outer layer). Endpoint bodies are
  501s with TODO markers.
- `docker-compose.admin.yml`: sketches a two-container deployment —
  a static admin UI behind cloudflared. NO published ports on the
  host; the only ingress is the Tunnel's outbound connection back
  to CF.
- `playbooks/admin-install.yml`: documents the one-time CF dashboard
  config (create Tunnel, add Access policy, generate ADMIN_TOKEN)
  as a runbook in the header. Tasks are placeholders until the
  image + tunnel actually exist.

Defense-in-depth: CF Access OAuth at the edge + X-Admin-Token at
the backend. Either alone is enough to block public traffic;
together they survive a single-layer misconfiguration.

No live endpoint behavior changes in this PR. Importing the new
router into main.py + flipping any decorator from a hardcoded
string to `get_limit(slug, default=...)` is the next PR.

Promotes the `X-Admin-Token` check from `admin_rate_limits.py` to
`dependencies.py::require_admin` since every admin router needs it.
Drops the inline `require_admin_token` calls from the rate-limits
router in favor of `dependencies=[Depends(require_admin)]` at
router-construction time — single declaration, gates every endpoint
on the router automatically.

New router skeletons (all 501 stubs, shape-locking only):

- `admin_moderation.py`  — soft-delete runs / guides / usernames.
  Hide pattern (set hidden_at field) instead of hard delete so undo
  is one update and audit references stay valid.
- `admin_ops.py`         — feature flags + CF cache purge + manual
  data refresh + maintenance banner. Wraps existing service-layer
  functions (`refresh_stats_summary`, news parser, etc.) plus the
  existing `playbooks/purge-cache.yml` CF API call.
- `admin_observability.py` — recent errors / rate-limit hits / search.
  Notes the writer-side work needed: a ring-buffer in
  RequestLoggingMiddleware for per-request error detail, and a small
  TTL'd Mongo collection for individual rate-limit events.
- `admin_bulk.py`        — long-running ops with a job-id pattern
  (rehash, dedupe, reattach files, import beta version, recompute
  scores). Each kicks off a background thread, status pollable via
  GET /jobs/{id}.
- `admin_audit.py`       — append-only read view over the audit_log
  collection. No delete endpoint, by design.
- `admin_api_keys.py`    — issue / list / rotate / revoke. Single-show
  semantics on plaintext (creation + rotation return it once,
  never persisted in plaintext). Prefix `sk_codex_` so leaks are
  scannable via GitHub secret-scanning + grep across logs.
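
The job-id pattern mentioned for `admin_bulk.py` could look roughly like the sketch below. It assumes an in-memory job registry and hypothetical names (`start_job`, `get_job`); the PR does not specify how job state is stored, and an in-process dict would not survive a restart or span multiple workers.

```python
import threading
import uuid
from typing import Any, Callable, Dict

_jobs: Dict[str, Dict[str, Any]] = {}
_jobs_lock = threading.Lock()


def start_job(work: Callable[[], Any]) -> str:
    """Run `work` in a background thread and return a job id to poll."""
    job_id = uuid.uuid4().hex
    with _jobs_lock:
        _jobs[job_id] = {"status": "running", "result": None, "error": None}

    def runner() -> None:
        try:
            update = {"status": "done", "result": work()}
        except Exception as exc:
            # Record the failure so the poller sees it instead of a hang.
            update = {"status": "failed", "error": str(exc)}
        with _jobs_lock:
            _jobs[job_id].update(update)

    threading.Thread(target=runner, daemon=True).start()
    return job_id


def get_job(job_id: str) -> Dict[str, Any]:
    """Return a snapshot of the job record, as GET /jobs/{id} might."""
    with _jobs_lock:
        return dict(_jobs[job_id])
```

A caller would start a job, return the id immediately, and let the dashboard poll `get_job` until the status leaves `running`.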

New services (skeletons):

- `services/audit_log.py` — `record()` and `list_recent()`. Best-effort
  writes (Mongo down ≠ block the admin action).
- `services/api_keys_store.py` — sha256-hashed at rest (high-entropy
  keys don't need bcrypt), 30s in-process cache for the hot lookup
  path, soft-revoke, rotate.
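
As an illustration of the key-handling rules above (prefixed keys, sha256 at rest, plaintext shown once), here is a small standard-library sketch. The function names and the 32-byte key length are assumptions, not the contents of `api_keys_store.py`.

```python
import hashlib
import secrets
from typing import Tuple

KEY_PREFIX = "sk_codex_"


def hash_key(plaintext: str) -> str:
    """sha256 hex digest; adequate for high-entropy random keys."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def issue_key() -> Tuple[str, str]:
    """Return (plaintext, stored_hash). Only the hash should be persisted."""
    plaintext = KEY_PREFIX + secrets.token_urlsafe(32)  # 32 random bytes
    return plaintext, hash_key(plaintext)


def verify_key(presented: str, stored_hash: str) -> bool:
    """Check a presented key against the hash kept at rest."""
    if not presented.startswith(KEY_PREFIX):
        return False
    return secrets.compare_digest(hash_key(presented), stored_hash)
```

Rotation under this scheme would be another `issue_key()` call plus a soft-revoke of the old hash, so the plaintext is still returned exactly once.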

Nothing wired into `main.py` yet — bodies all return 501, no router
is imported, no compose change. Same scope-lock pattern as the
rate-limits skeleton already in this PR: lock the shape, follow-up
PRs fill bodies one surface at a time.

Sibling to admin_moderation: that surface sets `hidden_at`, this
one exposes the population of hidden runs publicly. Mirrored Mongo
match clauses on the same field (`{hidden_at: None}` vs
`{hidden_at: {$ne: None}}`).
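
The mirrored match clauses could be built from one helper, sketched below. The function names are hypothetical and the extra filter fields are passed through as plain equality matches for illustration only. Note that in MongoDB a `{field: None}` filter matches documents where the field is null or absent, so runs that never had `hidden_at` set count as visible.

```python
from typing import Any, Dict, Optional


def visibility_match(hidden: bool) -> Dict[str, Any]:
    """Match clause for the regular leaderboard (False) or hall of shame (True)."""
    if hidden:
        return {"hidden_at": {"$ne": None}}
    return {"hidden_at": None}


def leaderboard_filter(hidden: bool,
                       category: Optional[str] = None,
                       character: Optional[str] = None) -> Dict[str, Any]:
    """Combine the visibility clause with optional equality filters."""
    query = visibility_match(hidden)
    if category is not None:
        query["category"] = category
    if character is not None:
        query["character"] = character
    return query
```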

- `backend/app/routers/hall_of_shame.py` — public GET endpoint
  returning the standard leaderboard response shape + `hidden_at`
  and `hidden_reason` per entry. Same filter params as the regular
  leaderboard (category, players, game_mode, character) so users
  can ask "fastest hidden run on Multi / Custom" etc.
- `frontend/app/leaderboards/hall-of-shame/page.tsx` — public route
  at /leaderboards/hall-of-shame. Sketch only — real table lands
  with the moderation pipeline. Includes the editorial policy
  inline so visitors see the curation rules without a separate FAQ.

Editorial guard rails baked into the docstrings:
- Strictly admin-curated; no auto-flagging onto this page
- `hidden_reason` is required + visible in every row, so vague
  judgment calls ("looks suspicious") are caught at moderation
  time, not after they're public
- `robots: noindex` on the page so flagged usernames don't get
  indexed by search engines

Pipe is end-to-end stubbed: moderation route sets the field,
hall-of-shame route reads it, frontend renders it. Bodies land in
the moderation follow-up PR.

…, query console

Five more skeletons rounding out the "every routine operator question
has a one-click answer" surface:

- `services/umami_client.py` + `routers/admin_umami.py` — pulls
  Umami's REST API into the admin dashboard. Backend holds the
  Umami admin creds so the dashboard never sees them; we proxy
  active/summary/top-pages/referrers/countries/browsers with a 5-60s
  TTL cache per endpoint. Single pane of glass instead of bouncing
  between two UIs for the 80%-case glance.

- `routers/admin_integrations.py` — outbound integration health +
  one-click test fire for Discord webhooks, Resend, Sentry, GitHub
  App, Cloudflare API, IndexNow. Catches credential rotation breaks
  before users do — currently the half-life on "token expired, no
  one noticed" is weeks.

- `routers/admin_schedules.py` — unified view of GH Actions cron
  workflows (news-refresh, runs-db-backup) + backend in-process
  daemons (stats_summary refresher, run-entity-stats warmer).
  Shows last-run-at + result for each. Makes silent cron failure
  audible.

- `routers/admin_query.py` — locked-down Mongo read-only query
  console. Whitelists collections + ops (find / find_one /
  count_documents / aggregate / distinct), forbids dangerous
  pipeline stages ($out, $merge, $where, $function), 100-doc
  result cap, 5s maxTimeMS, every query logged to admin_audit.
  Lets operators answer one-offs without shelling into mongosh.
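
For the query console, the allow-list and forbidden-stage checks might be sketched like this. The names `find_forbidden` and `check_query` are hypothetical, and the collection allow-list is passed in because the PR does not enumerate it. The walk is recursive so that an operator such as `$where` or `$function` is caught even when nested inside a filter or a later pipeline stage.

```python
from typing import Any, Set

ALLOWED_OPS = {"find", "find_one", "count_documents", "aggregate", "distinct"}
FORBIDDEN_OPERATORS = {"$out", "$merge", "$where", "$function"}
MAX_DOCS = 100        # result cap, to be applied when the query is executed
MAX_TIME_MS = 5000    # server-side time limit, likewise applied at execution


def find_forbidden(node: Any) -> Set[str]:
    """Collect forbidden operator keys anywhere in a filter or pipeline."""
    found: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key in FORBIDDEN_OPERATORS:
                found.add(key)
            found |= find_forbidden(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            found |= find_forbidden(item)
    return found


def check_query(op: str, collection: str,
                allowed_collections: Set[str], payload: Any) -> None:
    """Raise ValueError if the request falls outside the allow-lists."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"operation not allowed: {op}")
    if collection not in allowed_collections:
        raise ValueError(f"collection not allowed: {collection}")
    bad = find_forbidden(payload)
    if bad:
        raise ValueError(f"forbidden operators: {sorted(bad)}")
```

A key-based check like this is a guard rail rather than a full sandbox; running the console against a read-only database user would be a sensible second layer.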

Refined the `admin_ops.py` banner docstring to explicitly cover
announcements (patch news with level=info), maintenance warnings
(level=warn), AND incidents (level=error) on the same endpoint /
data model. `expires_at` enforces a self-vacating window so an
announcement doesn't outlive its relevance if the operator forgets
to clear it.
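
The self-vacating behavior of `expires_at` can be illustrated with a short sketch. The `Banner` dataclass, the `message` field, and `active_banner` are hypothetical; only `level` (info / warn / error) and `expires_at` come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

LEVELS = ("info", "warn", "error")


@dataclass
class Banner:
    level: str
    message: str
    expires_at: datetime

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")
        if self.expires_at.tzinfo is None:
            raise ValueError("expires_at must be timezone-aware")


def active_banner(banner: Optional[Banner],
                  now: Optional[datetime] = None) -> Optional[Banner]:
    """Return the banner only while it is inside its window."""
    now = now or datetime.now(timezone.utc)
    if banner is None or now >= banner.expires_at:
        return None  # expired banners vanish without operator action
    return banner
```

Because expiry is evaluated at read time, a forgotten announcement disappears on its own and no cleanup job is needed.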

Same scope-lock pattern: every body is 501, no router imported
into main.py, no live behavior change. Follow-up PRs fill in one
surface at a time.