Capture search query text in searches.db for top + zero-result analytics#272

Open
ptrlrd wants to merge 1 commit into main from feat/search-analytics

Owner

@ptrlrd ptrlrd commented May 15, 2026

Summary

Adds persistent search-query logging so we can see what people search for (not just the count by entity type that Prometheus tracks today). New searches.db lives next to runs.db inside DATA_DIR, so the existing Litestream pipeline covers it for free.

What ships

  • services/searches_db.py — schema (searches table + 3 indexes), background writer thread with a 10k-row bounded queue (drops on overflow, never blocks the request), helpers for top_searches / recent_searches / search_volume. IP is hashed with a daily UTC salt so the same client de-dupes within a day but isn't trackable across days.
  • Middleware hook in main.py — the existing ?search= detection branch now also enqueues (query, entity_type, lang, ip_hash) alongside the existing Prometheus counter increment.
  • routers/admin_searches.py — gated by X-Admin-Token header (sourced from ADMIN_TOKEN env). Three endpoints:
    • GET /api/admin/searches/top?days=7&limit=100 — most-searched queries
    • GET /api/admin/searches/recent?limit=200 — newest first, spot-check live traffic
    • GET /api/admin/searches/volume?days=30 — per-day rollup for trends
  • Compose files: ADMIN_TOKEN plumbed through both prod + beta.
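
The drop-on-overflow queue and the daily-salted IP hash described above could look roughly like the sketch below. This is not the code in services/searches_db.py: every name here (`SEARCH_QUEUE`, `hash_ip`, `enqueue_search`, `writer_loop`), the table columns, and the way the salt is derived from a secret plus the UTC date are assumptions made for illustration.

```python
import hashlib
import queue
import sqlite3
import threading
from datetime import datetime, timezone
from typing import Optional

# All names below are hypothetical; the PR's real module may differ.
SEARCH_QUEUE: "queue.Queue[tuple]" = queue.Queue(maxsize=10_000)


def hash_ip(ip: str, secret: str, now: Optional[datetime] = None) -> str:
    """Hash an IP with a salt that rotates every UTC day (assumed scheme)."""
    now = now or datetime.now(timezone.utc)
    day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return hashlib.sha256(f"{secret}:{day}:{ip}".encode()).hexdigest()


def enqueue_search(row: tuple, q: queue.Queue = SEARCH_QUEUE) -> bool:
    """Queue a row without blocking; return False if it was dropped."""
    try:
        q.put_nowait(row)
        return True
    except queue.Full:
        # Overflow: lose the analytics row rather than stall the request.
        return False


def writer_loop(db_path: str, q: queue.Queue, stop: threading.Event) -> None:
    """Drain the queue into SQLite; intended to run on a background thread."""
    conn = sqlite3.connect(db_path)  # opened in the thread that uses it
    conn.execute(
        "CREATE TABLE IF NOT EXISTS searches ("
        "ts TEXT DEFAULT CURRENT_TIMESTAMP, query TEXT, "
        "entity_type TEXT, lang TEXT, ip_hash TEXT)"
    )
    # Keep going until a stop is requested and the queue has been drained.
    while not (stop.is_set() and q.empty()):
        try:
            row = q.get(timeout=0.1)
        except queue.Empty:
            continue
        conn.execute(
            "INSERT INTO searches (query, entity_type, lang, ip_hash) "
            "VALUES (?, ?, ?, ?)",
            row,
        )
        conn.commit()
    conn.close()
```

The middleware would call `enqueue_search((query, entity_type, lang, ip_hash))` and ignore the return value, so a full queue costs a lost row and never request latency. Because the salt changes with the UTC date, two hashes of the same IP match only when taken on the same day.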

Why a separate DB

Writes are ~2/sec sustained, well above the runs.db write rate. A separate file isolates the workloads so search write bursts don't compete with run submission/leaderboard reads, and the table can be wiped or retention-trimmed independently without touching run data.
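
As a rough illustration of the independent retention trim, the sketch below opens a standalone SQLite file and deletes rows older than a cutoff. The helper names (`searches_db_path`, `open_searches_db`, `trim_searches`), the two-column table, and the epoch-seconds `ts` column are all assumptions; the PR's actual schema has more columns and three indexes.

```python
import os
import sqlite3
import time
from typing import Optional


def searches_db_path(data_dir: str) -> str:
    """searches.db sits next to runs.db inside DATA_DIR."""
    return os.path.join(data_dir, "searches.db")


def open_searches_db(db_path: str) -> sqlite3.Connection:
    """Open the search log and create a simplified, assumed schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS searches ("
        "ts INTEGER NOT NULL, query TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_searches_ts ON searches (ts)")
    conn.commit()
    return conn


def trim_searches(
    conn: sqlite3.Connection, keep_days: int, now: Optional[float] = None
) -> int:
    """Delete rows older than keep_days; return how many were removed."""
    current = time.time() if now is None else now
    cutoff = int(current - keep_days * 86400)
    cur = conn.execute("DELETE FROM searches WHERE ts < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Since the trim only ever touches the file behind `searches_db_path(...)`, it can run on its own schedule without taking any lock on runs.db.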

After merge

  1. Set ADMIN_TOKEN=<random> in .env on the server. If it is missing or empty, the admin endpoints return 503: an explicit "not configured" instead of a silent failure.
  2. Once deployed, query:
    curl -H "X-Admin-Token: $ADMIN_TOKEN" "https://spire-codex.com/api/admin/searches/top?days=7"
    
  3. The zero-result view requires the search endpoints to know whether they returned 0 hits — not in this PR. Top + recent + volume work immediately on any traffic that hits a list endpoint with ?search=.
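
The token gate from step 1 can be sketched framework-free as below. Only the 503 for a missing or empty ADMIN_TOKEN and the X-Admin-Token header name come from this PR; the function name `admin_status`, the 401 for a wrong token, and the constant-time comparison are assumptions about how routers/admin_searches.py might behave.

```python
import hmac
from typing import Mapping, Optional


def admin_status(headers: Mapping[str, str], configured: Optional[str]) -> int:
    """Return the HTTP status an admin request would get (hypothetical helper)."""
    if not configured:
        # ADMIN_TOKEN unset or empty: explicit "not configured".
        return 503
    # A plain dict lookup is case-sensitive; real frameworks normalize headers.
    supplied = headers.get("X-Admin-Token", "")
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(supplied.encode(), configured.encode()):
        return 401  # assumed status for a bad token; the PR does not say
    return 200
```

In the real router, `configured` would come from `os.environ.get("ADMIN_TOKEN")`, and the check would run before any of the three endpoints touches searches.db.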

@ptrlrd ptrlrd force-pushed the feat/search-analytics branch from 7718a60 to ca1cbd6 on May 18, 2026 04:12