Capture search query text in searches.db for top + zero-result analytics #272
Open
ptrlrd wants to merge 1 commit into
Summary
Adds persistent search-query logging so we can see what people search for (not just the count by entity type that Prometheus tracks today). The new `searches.db` lives next to `runs.db` inside `DATA_DIR`, so the existing Litestream pipeline covers it for free.

What ships
- `services/searches_db.py`: schema (`searches` table + 3 indexes), background writer thread with a 10k-row bounded queue (drops on overflow, never blocks the request), and helpers for `top_searches` / `recent_searches` / `search_volume`. The IP is hashed with a daily UTC salt, so the same client de-dupes within a day but isn't trackable across days.
- `main.py`: the existing `?search=` detection branch now also enqueues `(query, entity_type, lang, ip_hash)` alongside the existing Prometheus counter increment.
- `routers/admin_searches.py`: gated by the `X-Admin-Token` header (sourced from the `ADMIN_TOKEN` env var). Three endpoints:
  - `GET /api/admin/searches/top?days=7&limit=100`: most-searched queries
  - `GET /api/admin/searches/recent?limit=200`: newest first, for spot-checking live traffic
  - `GET /api/admin/searches/volume?days=30`: per-day rollup for trends
- `ADMIN_TOKEN` plumbed through both prod + beta.

Why a separate DB
Writes are ~2/sec sustained, well above the `runs.db` write rate. A separate file isolates the workloads so search write bursts don't compete with run submission/leaderboard reads, and the table can be wiped/retention-trimmed independently without touching run data.

After merge
- `ADMIN_TOKEN=<random>` in `.env` on the server (anything missing/empty returns 503 from the admin endpoints: an explicit "not configured" instead of silent failures).
- `?search=`.
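
For reviewers who want a picture of the writer described under "What ships", here is a minimal sketch of a bounded-queue background writer that drops on overflow instead of blocking the request. It is not the code in `services/searches_db.py`: the class name `SearchWriter`, the column layout, and the one-commit-per-row strategy are all assumptions made for illustration.

```python
import queue
import sqlite3
import threading

# Hypothetical schema; the PR's real table and indexes may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS searches (
    id INTEGER PRIMARY KEY,
    ts TEXT NOT NULL DEFAULT (datetime('now')),
    query TEXT NOT NULL,
    entity_type TEXT,
    lang TEXT,
    ip_hash TEXT
)
"""

_STOP = object()  # sentinel that tells the writer thread to exit


class SearchWriter:
    """Hypothetical writer: request threads enqueue, one thread owns SQLite."""

    def __init__(self, db_path, maxsize=10_000):
        self.db_path = db_path
        self.q = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # approximate counter, not synchronized
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def enqueue(self, query, entity_type, lang, ip_hash):
        """Called on the request path: never blocks, drops the row when full."""
        try:
            self.q.put_nowait((query, entity_type, lang, ip_hash))
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def close(self):
        self.q.put(_STOP)  # a blocking put is acceptable at shutdown
        self._thread.join()

    def _run(self):
        # The connection is created and used only inside the writer thread.
        conn = sqlite3.connect(self.db_path)
        conn.execute(SCHEMA)
        conn.commit()
        while True:
            item = self.q.get()
            if item is _STOP:
                break
            conn.execute(
                "INSERT INTO searches (query, entity_type, lang, ip_hash) "
                "VALUES (?, ?, ?, ?)",
                item,
            )
            conn.commit()
        conn.close()
```

The key design point is `put_nowait`: when the queue is full the row is counted as dropped and the request carries on, which trades completeness of the analytics for request latency.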
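
The daily-salted IP hash can be sketched as follows. The description only says the IP is hashed with a daily UTC salt; how the salt is derived is not stated, so deriving it from a server-side secret plus the UTC date is an assumption here, as are the function name `daily_ip_hash` and the 16-character truncation.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def daily_ip_hash(ip, secret, now=None):
    """Return a short hash that is stable within one UTC day only.

    `secret` is a hypothetical server-side key (bytes); `now` must be a
    timezone-aware datetime and defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    # Assumed salt derivation: HMAC of the UTC date under the server secret.
    salt = hmac.new(secret, day.encode(), hashlib.sha256).digest()
    return hmac.new(salt, ip.encode(), hashlib.sha256).hexdigest()[:16]
```

Because the salt changes at UTC midnight, two searches from the same address on the same day share a hash, while the same address on the next day produces an unrelated value.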
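
The admin gate's "503 when not configured" behavior can be illustrated with a framework-agnostic sketch. The function name `check_admin_token` is hypothetical, and returning 401 for a wrong or missing header is an assumption; the description only specifies the 503 case.

```python
import hmac
import os


def check_admin_token(header_value, env=os.environ):
    """Return the HTTP status the admin endpoints would answer with.

    503: ADMIN_TOKEN is missing or empty (explicit "not configured").
    401: header absent or wrong (assumed; not stated in the description).
    200: header matches the configured token.
    """
    configured = env.get("ADMIN_TOKEN", "")
    if not configured:
        return 503
    if header_value is None:
        return 401
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(header_value.encode(), configured.encode()):
        return 401
    return 200
```

Checking "configured" before comparing anything means an unset token can never be matched by an empty `X-Admin-Token` header.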
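
As a rough idea of what the `top_searches` and `search_volume` helpers might run, here is a sketch of the two aggregate queries against SQLite. The column names (`ts`, `query`), the text timestamp format, and the function signatures are assumptions; the real helpers in the PR may filter or normalize differently.

```python
import sqlite3

# Hypothetical minimal schema; ts is stored as 'YYYY-MM-DD HH:MM:SS' in UTC.
SCHEMA = """
CREATE TABLE IF NOT EXISTS searches (
    id INTEGER PRIMARY KEY,
    ts TEXT NOT NULL DEFAULT (datetime('now')),
    query TEXT NOT NULL
)
"""


def top_searches(conn, days=7, limit=100):
    """Most-searched queries in the last `days` days, most frequent first."""
    return conn.execute(
        "SELECT query, COUNT(*) AS n FROM searches "
        "WHERE ts >= datetime('now', ?) "
        "GROUP BY query ORDER BY n DESC, query LIMIT ?",
        (f"-{int(days)} days", int(limit)),
    ).fetchall()


def search_volume(conn, days=30):
    """Per-day search counts over the last `days` days, oldest day first."""
    return conn.execute(
        "SELECT date(ts) AS day, COUNT(*) AS n FROM searches "
        "WHERE ts >= datetime('now', ?) "
        "GROUP BY day ORDER BY day",
        (f"-{int(days)} days",),
    ).fetchall()
```

Both queries filter on `ts`, so an index on that column is the one these sketches would rely on.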