feat(site_analytics): app for collecting anonymized visit data for community pages #149
Open
bryangingechen wants to merge 17 commits into master
- Resolve site config open question: SITE_ANALYTICS_ALLOWED_SITES env var
- Resolve auth open question: no per-site tokens in v1
- Document IP extraction strategy (X-Forwarded-For → REMOTE_ADDR fallback)
- Clarify CSRF handling via DRF authentication_classes pattern
- Specify backup_policy.py table placement for all three new tables
- Split chunk plan: A1 includes backup_policy + AGENTS.md, A3/A4 split by daily/monthly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tings
- New `site_analytics` Django app with models/, services/, tasks/, tests/ layout
- AnalyticsPageView raw event model: site, path, referrer, user_agent, occurred_at, visitor_month_hash; indexes on (site, occurred_at) and (occurred_at); initial migration generated
- Settings: SITE_ANALYTICS_HASH_SALT, SITE_ANALYTICS_ALLOWED_SITES, SITE_ANALYTICS_RETENTION_DAYS, and task period vars in base.py
- backup_policy.py: site_analytics_analyticspageview → TRUNCATE_TABLES (raw rows contain visitor hashes; excluded from public backup)
- repo_check_compose.sh: add step 12/12 for site_analytics test suite
- AGENTS.md files created/updated for new app and root/qb_site indexes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ests
- POST /api/v1/analytics/collect: validate site/path, check allowlist, drop bots silently (204), insert AnalyticsPageView row, return 204
- services/hashing.py: get_client_ip (XFF → REMOTE_ADDR fallback) and compute_visitor_month_hash (pipe-separated fields, UA lowercased)
- services/bot_filter.py: substring denylist for known bots/crawlers
- .env.example: SITE_ANALYTICS_HASH_SALT, ALLOWED_SITES, RETENTION_DAYS
- tests/test_services.py: IP extraction, hash determinism/isolation, bot detection
- tests/test_collect_view.py: 204 success, 400 validation, bot drop, no raw IP in row, XFF hash isolation, field truncation, empty allowlist
- design doc: A1+A2 progress notes, three implementation subtleties recorded
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
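For reviewers, a minimal sketch of the two helpers in services/hashing.py as the commit message describes them. This is not the code in the PR: the SHA-256 digest, the argument names and order, and the exact set of hashed fields are assumptions. Only the XFF → REMOTE_ADDR fallback, the pipe separator, and the lowercased UA come from the message above.

```python
import hashlib


def get_client_ip(meta: dict) -> str:
    """Return the first X-Forwarded-For entry, falling back to REMOTE_ADDR.

    `meta` stands in for Django's request.META (a plain dict here).
    """
    forwarded = meta.get("HTTP_X_FORWARDED_FOR", "")
    if forwarded:
        first = forwarded.split(",")[0].strip()
        if first:
            return first
    return meta.get("REMOTE_ADDR", "")


def compute_visitor_month_hash(
    salt: str, site: str, ip: str, user_agent: str, month: str
) -> str:
    """Hash pipe-separated fields; the raw IP never leaves this function.

    Assumed field set and order: salt|site|ip|lowercased UA|YYYY-MM.
    """
    payload = "|".join([salt, site, ip, user_agent.lower(), month])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the month string is part of the hashed payload in this sketch, the same visitor produces unrelated hashes in different months.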
- AnalyticsDailyMetric model: site, date (UTC), pageviews, unique_visitors; unique constraint on (site, date); migration 0002
- aggregate_daily_metrics service: idempotent upsert over a rolling days_back window; preserves existing aggregates when raw rows have been pruned
- site_analytics.aggregate_daily_metrics Celery task + beat schedule entry
- backup_policy.py: site_analytics_analyticsdailymetric → RETAIN_TABLES
- tests: basic count, idempotency, multi-site, UTC date boundary, prune-safe
- design doc: A3 progress notes and three implementation subtleties recorded
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
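The "idempotent upsert" and "prune-safe" properties can be illustrated without a database. The sketch below is an in-memory stand-in, not the ORM-based service in the PR; the function name, the tuple layout of `events`, and the dict used for stored aggregates are all hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timezone


def aggregate_daily(events, existing):
    """Return updated daily aggregates keyed by (site, UTC date).

    events:   iterable of (site, aware occurred_at, visitor_hash) tuples
    existing: dict of previously stored aggregates, same key shape
    """
    pageviews: dict[tuple[str, date], int] = defaultdict(int)
    visitors: dict[tuple[str, date], set] = defaultdict(set)
    for site, occurred_at, visitor_hash in events:
        day = occurred_at.astimezone(timezone.utc).date()  # bucket by UTC date
        pageviews[(site, day)] += 1
        visitors[(site, day)].add(visitor_hash)

    result = dict(existing)
    # Only keys that still have raw rows are overwritten, so aggregates
    # whose raw rows were already pruned are preserved as-is.
    for key, count in pageviews.items():
        result[key] = {"pageviews": count, "unique_visitors": len(visitors[key])}
    return result
```

Running the function twice over the same events yields the same result, which is the property the idempotency test in the commit presumably checks.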
…schedules
- AnalyticsMonthlyMetric model: site, month (UTC first-of-month DateField), pageviews, unique_visitors; unique constraint on (site, month); migration 0003
- aggregate_monthly_metrics service: idempotent upsert over rolling months_back window; preserves existing aggregates when raw rows have been pruned
- prune_old_pageviews service: deletes AnalyticsPageView rows older than SITE_ANALYTICS_RETENTION_DAYS; aggregate tables are never pruned
- site_analytics.aggregate_monthly_metrics and site_analytics.prune_old_pageviews Celery tasks + beat schedule entries for monthly aggregate and prune
- backup_policy.py: site_analytics_analyticsmonthlymetric → RETAIN_TABLES
- tests: monthly count, idempotency, month boundary, prune boundaries, settings default retention, prune-safe aggregate preservation
- design doc: A4 progress notes; index name length limit subtlety noted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
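A small sketch of the two date computations this commit relies on: the UTC first-of-month bucket and the prune cutoff. Helper names are hypothetical, and the strict `<` comparison at the retention boundary is an assumption; the PR's tests ("prune boundaries") define the real behaviour.

```python
from datetime import date, datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 540  # default quoted in the PR description


def month_bucket(occurred_at: datetime) -> date:
    """Map an aware timestamp to the first day of its UTC month."""
    return occurred_at.astimezone(timezone.utc).date().replace(day=1)


def prune_cutoff(now: datetime, retention_days: int = DEFAULT_RETENTION_DAYS) -> datetime:
    """Timestamp before which raw page-view rows are eligible for deletion."""
    return now - timedelta(days=retention_days)


def is_prunable(
    occurred_at: datetime, now: datetime, retention_days: int = DEFAULT_RETENTION_DAYS
) -> bool:
    # Assumption: a row exactly at the cutoff is kept (strict comparison).
    return occurred_at < prune_cutoff(now, retention_days)
```

Passing `now` explicitly keeps the helpers deterministic in tests.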
All three admin classes (AnalyticsPageView, AnalyticsDailyMetric, AnalyticsMonthlyMetric) are read-only: add/change/delete permissions disabled to enforce immutability of analytics data through the admin UI. PageView admin shows hash prefix and truncated referrer for readability.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add SITE_ANALYTICS_REJECT_EMPTY_UA setting (default off) to optionally drop requests with no User-Agent header before the existing bot-filter check. Returns 204 (same as bot drop) to avoid leaking detection logic. Tests cover default-allow, flag-enabled drop, and flag-enabled accept.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
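The ordering described above (empty-UA check first, then the substring denylist) could look roughly like this. The denylist entries and function names are hypothetical placeholders, not the contents of services/bot_filter.py.

```python
# Hypothetical denylist; the real list lives in services/bot_filter.py.
BOT_SUBSTRINGS = ("bot", "crawler", "spider")


def is_bot(user_agent: str) -> bool:
    """Case-insensitive substring match against the denylist."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_SUBSTRINGS)


def should_drop(user_agent: str, reject_empty_ua: bool = False) -> bool:
    """Decide whether the collect endpoint should silently drop the request.

    `reject_empty_ua` mirrors SITE_ANALYTICS_REJECT_EMPTY_UA (default off).
    Either way the caller answers 204, so clients cannot tell the cases apart.
    """
    if reject_empty_ua and not user_agent.strip():
        return True
    return is_bot(user_agent)
```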
…nal ADR
CORS:
- AnalyticsCollectView now returns Access-Control-Allow-Origin: * on all responses and handles OPTIONS preflight so browsers on third-party static sites can call the endpoint directly without a server-side proxy
- Two new tests: POST includes CORS header, OPTIONS preflight returns 204
ADR:
- Convert 031 from living implementation plan to concise final decision record
- Covers architecture, models, privacy invariants, operational notes, consequences, and deferred v1.1 items
- Includes static-site tracking snippet (sendBeacon + fetch fallback) with onboarding instructions for new sites
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
write_dashboard crashed with KeyError when rendering fixtures that predate newer Dashboard enum members (e.g. NotFromFork). Use .get() with an empty-list default so old snapshots produce an empty table rather than aborting page generation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
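A toy reproduction of the failure mode and the fix. Only the `Dashboard` enum name and its `NotFromFork` member come from the commit message; the other member, the snapshot shape, and `rows_for` are hypothetical.

```python
from enum import Enum


class Dashboard(Enum):
    Queue = "queue"                # hypothetical older member
    NotFromFork = "not_from_fork"  # newer member, absent from old fixtures


def rows_for(snapshot: dict, kind: Dashboard) -> list:
    """Look up a dashboard's rows, tolerating snapshots without that key.

    snapshot[kind] would raise KeyError for fixtures written before `kind`
    existed; .get() with a default yields an empty table instead.
    """
    return snapshot.get(kind, [])


old_snapshot = {Dashboard.Queue: [101, 102]}
print(rows_for(old_snapshot, Dashboard.NotFromFork))
```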
Adds --analytics-site / QUEUEBOARD_ANALYTICS_SITE support to dashboard.py:
- Widens CSP connect-src when an analytics host is configured.
- Injects a sendBeacon/fetch snippet before </body> on every generated page, including the static area_stats.html and dependency_dashboard.html.
- Snippet is omitted entirely when the flag is absent, so existing deployments are unaffected.
Updates docs/queueboard_main_workflow.md to pass QUEUEBOARD_ANALYTICS_SITE (new repo secret) alongside the existing QUEUEBOARD_API_BASE_URL in all three dashboard-generation steps.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a prose overview of how the workflow operates, a table of the two required repo secrets (QUEUEBOARD_API_BASE_URL and the new QUEUEBOARD_ANALYTICS_SITE), and a note that omitting the analytics secret silently disables snippet injection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Injects a visible one-line notice ("no cookies, no IP addresses stored")
before </body> on every analytics-enabled page, styled via a new
.analytics-notice CSS rule. The notice is part of the same injection
block as the tracking script, so it appears whenever analytics is
active and is absent otherwise.
Updates the ADR (031) to:
- Add disclosure as an explicit onboarding step.
- Document the required notice wording and note that site adopters are
responsible for adding equivalent disclosure to their own pages.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This reverts commit 8de812a.
Injects a visible one-line notice ("no cookies, no IP addresses stored")
before </body> on every analytics-enabled page, styled via a new
.analytics-notice CSS rule. The notice appears whenever analytics is
active and is absent otherwise.
Adds a "Disclosure and privacy regulations" section to the ADR (031)
covering ePrivacy Art. 5(3), GDPR Recital 26 / Art. 13, CJEU Breyer
(C-582/14), EDPB Guidelines 2/2023, and CNIL consent-exempt analytics
guidance — all with verified source URLs. Documents the practical
position: no consent banner required, but a brief privacy notice is
recommended as good practice and to satisfy GDPR Art. 13 under the
cautious reading.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mock django.utils.timezone.now in tests that compare hardcoded fixture dates against the real clock. Without this, task tests flake near midnight boundaries and prune service tests rot as the hardcoded dates age past their retention windows.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the fixed SITE_ANALYTICS_HASH_SALT env var with a randomly-generated per-month salt stored in a new SiteAnalyticsSalt model. A new Celery beat task (site_analytics.rotate_salt) runs at midnight UTC on the 1st of each month, creates a fresh salt, and deletes the previous row atomically, providing forward secrecy: past hashes cannot be re-derived even if the current salt leaks.
Changes:
- New SiteAnalyticsSalt model + migration 0004.
- compute_visitor_hash() replaces compute_visitor_month_hash(): drops the explicit YYYY-MM argument; cross-month isolation is now provided by the rotating salt instead.
- 60-second in-process salt cache with _reset_salt_cache() helper for test isolation; falls back to SITE_ANALYTICS_HASH_SALT until the first rotation task runs.
- rotate_salt_task registered in beat schedule (monthly crontab).
- SiteAnalyticsSalt added to TRUNCATE_TABLES in backup_policy.py (contains the live secret; excluded from sanitized backups).
- Updated tests: ComputeVisitorHashTests, cache reset in collect-view setUp, new test_salt.py covering rotation task behaviour.
- ADR and AGENTS.md updated to reflect new privacy model.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
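To make the caching and fallback behaviour concrete, here is a database-free sketch. `SaltCache`, `load_salt`, and `new_salt` are hypothetical names; in the PR the stored salt lives in the SiteAnalyticsSalt model and the cache is module-level with a `_reset_salt_cache()` helper. The 60-second TTL and the env-var fallback are taken from the commit message; the 32-byte salt length is an assumption.

```python
import secrets
import time


def new_salt() -> str:
    """Generate a fresh random salt (length is an assumption)."""
    return secrets.token_hex(32)


class SaltCache:
    """In-process cache that re-reads the stored salt at most once per TTL."""

    def __init__(self, load_salt, fallback: str, ttl: float = 60.0, clock=time.monotonic):
        self._load_salt = load_salt  # callable returning the stored salt or None
        self._fallback = fallback    # stands in for SITE_ANALYTICS_HASH_SALT
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._loaded_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            stored = self._load_salt()
            # Until the first rotation has run there is no stored salt yet.
            self._value = stored if stored else self._fallback
            self._loaded_at = now
        return self._value

    def reset(self) -> None:
        """Drop the cached value (useful for test isolation)."""
        self._value = None
```

One consequence of a cache like this is that a worker may keep hashing with the previous salt for up to one TTL after rotation, which briefly splits a visitor into two hashes at the month boundary.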
This consists of a single API endpoint at `QUEUEBOARD_SITE/api/v1/analytics/collect`, which will be called by a small tracking script embedded on each page. No cookies will be stored on visitors' devices.
We begin by embedding the tracking script in the queueboard frontend pages.
The data collected / aggregated:
- `AnalyticsPageView`: raw event rows; immutable after insert; pruned after `SITE_ANALYTICS_RETENTION_DAYS` (default 540).
  Fields: `site`, `path`, `referrer`, `user_agent`, `occurred_at`, `visitor_month_hash`.
- `AnalyticsDailyMetric`: daily aggregate per site; unique on `(site, date)`.
  Fields: `site`, `date` (UTC), `pageviews`, `unique_visitors`.
- `AnalyticsMonthlyMetric`: monthly aggregate per site; unique on `(site, month)`.
  Fields: `site`, `month` (UTC first-of-month `DateField`, e.g. `2026-03-01`), `pageviews`, `unique_visitors`.
Visitor IPs are not stored. Only a hash of the IP and the month is kept, so by design visitors cannot be tracked across months.
More analytics built on top of `AnalyticsPageView` events may be added later.
Prepared with Claude.