feat: expose Prometheus /metrics endpoint for usage dashboards #102
Adds a Prometheus text-format `GET /metrics` endpoint covering the metrics requested in #101:

* cryptify_uploads_total{channel}
* cryptify_upload_bytes_total{channel}
* cryptify_storage_bytes (gauge, sampled periodically from data_dir)
* cryptify_active_files (gauge, same source)
* cryptify_expired_files_total (counter, purged-before-finalized)

The channel label is derived from request headers:

1. X-Cryptify-Source explicit header
2. Authorization: Bearer / X-Api-Key -> "api"
3. Origin -> "staging-website" / "website"
4. User-Agent substring -> "outlook" / "thunderbird"
5. fallback "unknown"

Values are sanitized (lower-case [a-z0-9_-], max 32 chars) to prevent label-injection and cardinality blowup.

Storage gauges are sampled by a background task that walks data_dir every `metrics_scan_interval_secs` (default 60, configurable).

Dashboard JSON ready for import into the Scaleway Grafana instance is shipped under `docs/grafana/`, alongside a Prometheus scrape-config example.

No authentication on /metrics; restrict via firewall / proxy allow-list (documented in README and docs/grafana/README.md).

Refs #101
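For readers following the header priority above, here is a minimal sketch of how the detection and sanitization could fit together. It is not the code from this PR: the `RequestHeaders` struct is hypothetical, dropping (rather than replacing) disallowed characters is an assumption, and so is telling staging apart by a `staging` substring in the Origin. The `contains("postguard.")` Origin rule is the one quoted in the linked add-in commits.

```rust
/// Reduce an untrusted value to a safe label value:
/// lower-case, only [a-z0-9_-], at most 32 characters.
/// Assumption: disallowed characters are dropped, not replaced.
fn sanitize_label(raw: &str) -> String {
    raw.chars()
        .map(|c| c.to_ascii_lowercase())
        .filter(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || *c == '_' || *c == '-')
        .take(32)
        .collect()
}

/// Hypothetical view of the request headers relevant to channel detection.
#[derive(Default)]
struct RequestHeaders<'a> {
    cryptify_source: Option<&'a str>,
    authorization: Option<&'a str>,
    api_key: Option<&'a str>,
    origin: Option<&'a str>,
    user_agent: Option<&'a str>,
}

/// Walk the priority list from the PR description, first match wins.
fn detect_channel(h: &RequestHeaders<'_>) -> String {
    // 1. Explicit X-Cryptify-Source header (sanitized, since it is client-controlled).
    if let Some(src) = h.cryptify_source {
        let label = sanitize_label(src);
        if !label.is_empty() {
            return label;
        }
    }
    // 2. Bearer token or X-Api-Key means an API client.
    let has_bearer = h.authorization.map_or(false, |a| a.starts_with("Bearer "));
    if has_bearer || h.api_key.is_some() {
        return "api".to_string();
    }
    // 3. Origin host. Assumption: staging origins contain "staging".
    if let Some(origin) = h.origin {
        if origin.contains("postguard.") {
            let label = if origin.contains("staging") { "staging-website" } else { "website" };
            return label.to_string();
        }
    }
    // 4. User-Agent substring match.
    if let Some(ua) = h.user_agent {
        let ua = ua.to_ascii_lowercase();
        if ua.contains("outlook") {
            return "outlook".to_string();
        }
        if ua.contains("thunderbird") {
            return "thunderbird".to_string();
        }
    }
    // 5. Fallback.
    "unknown".to_string()
}
```

Under these assumptions, a request with Origin `https://addin.postguard.eu` and an Outlook User-Agent is labelled `website`, because the Origin step runs before the User-Agent step. Sending an explicit `X-Cryptify-Source` header avoids that, since it is checked first.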
/dobby fix conflicts
Dobby has seen the request! Dobby will fix the conflicts right away, sir! Dobby is very good at resolving merge conflicts; Dobby will have this PR clean and ready in no time! 🧹
Integrate main-branch additions (chunk_size config, per-upload/rolling rate limits, API key detection, /usage endpoint) alongside the metrics feature from this branch. Both feature sets are preserved and work together. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Conflicts resolved and pushed. Here's what happened: Conflicting files: Root cause: Main merged chunk-size limits, per-upload/rolling rate limiting, API key detection, and a Resolution: Kept both feature sets — all metrics fields/logic alongside the new rate-limiting, chunk-size, and API key features. Specifically:
Verification:
Rule compliance pass: - cargo fmt --all (rust-run-cargo-fmt-before-push) - README "Monitoring" section removed; docs already live in docs/grafana/README.md (standardized-readmes) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rule compliance sweep applied three fixes in ea1357d / PR body edit:
No other rule violations found across the 60+ rule files and the cryptify-specific notes.
Looked good overall — code is clean, the 13 metrics tests cover sanitization / header priority / rendering / sampling, store integration handles the unfinalized-expired case sensibly via try_lock, and cargo test is 43/43 green with no new clippy warnings.
One blocker before merge:
docs/grafana/README.md — broken YAML in the scrape config example (around line 34). The second target's labels: block is indented two extra spaces, so it nests under targets: instead of being a sibling. Anyone copy-pasting this into Prometheus will get a parse error.
Current (broken):

    - targets: ['cryptify-staging.postguard.eu:8000']
      labels:
        environment: staging
    - targets: ['cryptify.postguard.eu:8000']
        labels:
          environment: production

Should be (matching the first entry):

    - targets: ['cryptify.postguard.eu:8000']
      labels:
        environment: production

Nits (non-blocking, ignore if you disagree):

- `metrics_scan_interval_secs = 0` in config would busy-loop the sampler. Cheap to clamp to a minimum (e.g. 1s) in `From<RawCryptifyConfig>`, but very low risk.
- `upload_bytes_total` is recorded at finalize, so abandoned uploads aren't counted in bytes. Already noted as a follow-up in the PR body, fine to leave.
Fix the YAML and this is good to go.
The production target's labels block was indented under targets instead of being a sibling, which would fail YAML parsing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixed the YAML indentation in |
Verdict: looks ready to merge (submitting as comment since the bot can't formally approve its own PR).
Re-reviewed after the YAML fix in eca3012.
Verified:
- `docs/grafana/README.md` scrape-config YAML now has the production `labels:` block as a sibling of `targets:`, matching the staging entry. Previous blocker resolved.
- `cargo test`: 43/43 green.
- `cargo fmt --all -- --check`: clean.
- `cargo clippy --all-targets`: only the pre-existing `useless_format` warning at `src/email.rs:253`; no new warnings.
- Metrics module: label sanitization caps at 32 chars and restricts to `[a-z0-9_-]` (prevents Prometheus label injection / cardinality blow-up), channel-detection priority matches the PR description, and the store's `try_lock`-then-record-expired path attributes purged-unfinalized uploads correctly.
- Conflict resolution from the previous round (merging metrics with PR #100's rate-limit / api-key / `/usage` work) is consistent: `FileState` carries both `source_channel` and `is_api_key`, `upload_finalize` records both metrics and usage, and both routes are mounted.
Non-blocking nit (carried forward): metrics_scan_interval_secs = 0 would busy-loop the storage sampler. Trivial clamp in From<RawCryptifyConfig>, very low risk — fine to leave.
Heads-up: GitHub reports mergeable_state: dirty again — main has advanced since the prior conflict resolution (lettre bump, pg-core 0.6, upload_chunk fixes, integration test harness). One more rebase needed before merge, but no code changes required.
Re-review cycle 0/3 — verdict: ready to merge (submitting as comment; the bot can't formally approve its own PR).
Verified at HEAD eca3012:
- `cargo test`: 43/43 green (13 metrics + 5 store/usage + 25 existing).
- `cargo clippy --all-targets`: no new warnings; only the pre-existing `useless_format` at `src/email.rs:253`.
- `cargo fmt --check`: clean.
- `docs/grafana/README.md` scrape-config YAML is well-formed (prior blocker stayed fixed).
- `src/metrics.rs` label sanitization (lowercase, `[a-z0-9_-]`, cap 32) blocks Prometheus label injection and cardinality blow-up.
- `detect_channel` priority order matches the PR description and the OpenAPI/README docs.
- Store integration: the `try_lock` + `sender.is_none()` check in `SharedState::cleanup` correctly attributes purged-unfinalized uploads to `cryptify_expired_files_total` without false positives on finalized rows.
- `/metrics` endpoint is wired into `routes![]` and the storage sampler is spawned with the new `metrics_scan_interval_secs` config.
Non-blocking nit (carried over, optional): metrics_scan_interval_secs = 0 would busy-loop the sampler. A .max(1) clamp in From<RawCryptifyConfig> would close it, but the risk is very low (operator misconfig only).
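The clamp suggested in this nit could look roughly like the sketch below. Only the names `RawCryptifyConfig` and `metrics_scan_interval_secs` come from this PR; the struct layouts, the `Option<u64>` field type, and the `CryptifyConfig` target type are assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical raw config as deserialized from the config file.
/// The real struct has more fields; this layout is an assumption.
struct RawCryptifyConfig {
    metrics_scan_interval_secs: Option<u64>,
}

/// Hypothetical validated config consumed by the storage sampler.
struct CryptifyConfig {
    metrics_scan_interval: Duration,
}

impl From<RawCryptifyConfig> for CryptifyConfig {
    fn from(raw: RawCryptifyConfig) -> Self {
        // Default of 60s as stated in the PR description; the clamp to a
        // minimum of 1s means a configured 0 cannot turn the sampler into a busy loop.
        let secs = raw.metrics_scan_interval_secs.unwrap_or(60).max(1);
        CryptifyConfig {
            metrics_scan_interval: Duration::from_secs(secs),
        }
    }
}
```

Clamping at conversion time keeps the sampler loop itself free of special cases: it can sleep for `metrics_scan_interval` unconditionally.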
Heads-up — not a code issue: mergeable_state: dirty against main. A rebase is needed before merge but no code changes required.
/dobby fix the conflicts.
On it, @rubenhensen: this agent is checking the rebase state and will replay the conflict resolution against current main.
Resolves conflicts between the metrics work and main's recent additions (api_key_tenant tenant tracking, recovery_token / upload_status, idle TTL, download handler). Keeps both feature sets: - FileState carries both source_channel (metrics) and the new api_key_tenant/api_key_validation_failed/last_chunk/recovery_token fields from main; is_api_key is replaced by api_key_tenant.is_some(). - Store::with_idle_ttl now takes (idle_ttl, metrics); Store::new(metrics) is the test-only shortcut. - Purge counts unfinalized expirations via metrics.record_expired(). - build_rocket spawns storage_sampler, mounts metrics_endpoint, and manages the Metrics arc alongside pkg_client and the per-tenant store. - upload_finalize records both per-channel metrics and per-tenant usage. cargo check --all-targets, cargo clippy --all-targets, cargo test (100/100), cargo fmt --check all clean.
Conflicts resolved in 86e3cf1 (merge of main into the branch). Conflicting files: Root cause: main grew the API-key tenant model ( Resolution: kept both feature sets, taking main's structure as the base and re-layering metrics on top:
Verification:
I don't like the added docs and the added grafana board. I just need the metrics exposed.
Dobby hears master Ruben! Dobby will remove the docs and the Grafana board, keeping only the /metrics endpoint exposed. Dobby is on it!
Remove the Grafana dashboard JSON and scrape-config README per maintainer feedback — keep only the /metrics endpoint exposure.
Dropped |
…rd (#121) Adds X-Cryptify-Source: thunderbird to PG_CLIENT_HEADER so cryptify's per-channel upload metrics (encryption4all/cryptify#102) classify this add-in deterministically rather than via the User-Agent substring fallback. Thunderbird WebExtensions don't always present a stable Origin (can be `moz-extension://<uuid>` or null), and the User-Agent fallback is the last layer of cryptify's detect_channel. Setting the explicit header at the source removes any environment-dependent ambiguity and is symmetric with parallel PRs in postguard-website and postguard-outlook-addon.
) Adds the X-Cryptify-Source: outlook header to clientHeaders() so cryptify's per-channel upload metrics (encryption4all/cryptify#102) classify this add-in deterministically. cryptify's detect_channel checks the Origin header before falling back to User-Agent substring matching. The add-in is served from addin.*.postguard.eu, which matches cryptify's `contains("postguard.")` rule and would otherwise shadow the User-Agent "outlook" check — labeling Outlook uploads as `website` / `staging-website` instead of `outlook`. Setting the explicit header here removes the ambiguity.
…228) Sets X-Cryptify-Source on the PostGuard SDK so cryptify (≥0.1.27+, encryption4all/cryptify#102) classifies uploads from this site deterministically rather than via the Origin-host fallback. Origin-based detection collided with the Outlook add-in, which is served from addin.*.postguard.eu — that host matches cryptify's `contains("postguard.")` rule and shadows the User-Agent "outlook" check. With the explicit header, the website always reports `channel="website"` regardless of the deploy host.
Summary
Server-side half of #101: expose Prometheus text-format `GET /metrics` for Grafana scraping.

Metrics:

- `cryptify_uploads_total{channel}`: counter, finalized uploads
- `cryptify_upload_bytes_total{channel}`: counter, bytes uploaded
- `cryptify_storage_bytes`: gauge, `data_dir` disk usage
- `cryptify_active_files`: gauge, `data_dir` file count
- `cryptify_expired_files_total`: counter, uploads purged before finalize

`channel` is derived from `X-Cryptify-Source` → bearer/api-key → `Origin` → `User-Agent` → `unknown`. Sanitized to `[a-z0-9_-]`, max 32 chars.

Storage gauges are sampled from `data_dir` on a background task (`metrics_scan_interval_secs`, default 60s), off the upload hot path.

Why draft

`/metrics` is unauthenticated; restrict at the firewall / reverse proxy. Confirm this fits the Scaleway / Procolix network policy before merging.

Test plan

- `cargo check`, `cargo test`, `cargo clippy --all-targets`, `cargo fmt --check`
- `/metrics` from staging, confirm counters move

Refs #101
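To illustrate what a scrape of the metrics listed in this summary would return, here is a minimal sketch of rendering them in the Prometheus text exposition format. The `Snapshot` struct and the `render`, `write_labeled`, and `write_scalar` helpers are hypothetical and not taken from `src/metrics.rs`; only the metric names and the `channel` label come from this PR. Label values are assumed to be already sanitized, so no escaping is done here.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical point-in-time copy of the metric values.
struct Snapshot {
    uploads_total: BTreeMap<String, u64>,      // keyed by channel
    upload_bytes_total: BTreeMap<String, u64>, // keyed by channel
    storage_bytes: u64,
    active_files: u64,
    expired_files_total: u64,
}

/// Emit one `# TYPE` line followed by one sample per channel label.
fn write_labeled(out: &mut String, name: &str, kind: &str, series: &BTreeMap<String, u64>) {
    // Writing into a String cannot fail, so the fmt::Result is ignored.
    let _ = writeln!(out, "# TYPE {name} {kind}");
    for (channel, value) in series {
        let _ = writeln!(out, "{name}{{channel=\"{channel}\"}} {value}");
    }
}

/// Emit one `# TYPE` line followed by a single unlabeled sample.
fn write_scalar(out: &mut String, name: &str, kind: &str, value: u64) {
    let _ = writeln!(out, "# TYPE {name} {kind}");
    let _ = writeln!(out, "{name} {value}");
}

/// Render the whole snapshot as the body of a `GET /metrics` response.
fn render(s: &Snapshot) -> String {
    let mut out = String::new();
    write_labeled(&mut out, "cryptify_uploads_total", "counter", &s.uploads_total);
    write_labeled(&mut out, "cryptify_upload_bytes_total", "counter", &s.upload_bytes_total);
    write_scalar(&mut out, "cryptify_storage_bytes", "gauge", s.storage_bytes);
    write_scalar(&mut out, "cryptify_active_files", "gauge", s.active_files);
    write_scalar(&mut out, "cryptify_expired_files_total", "counter", s.expired_files_total);
    out
}
```

Using a `BTreeMap` keeps the per-channel samples in a stable order between scrapes, which makes the output easier to diff and to assert on in tests.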
git fetch origin && git checkout feat/prometheus-metrics-endpoint && cargo test