feat: expose Prometheus /metrics endpoint for usage dashboards #102

Merged
rubenhensen merged 6 commits into main from feat/prometheus-metrics-endpoint on May 18, 2026

@dobby-coder
Contributor

@dobby-coder dobby-coder Bot commented Apr 21, 2026

Summary

Server-side half of #101: expose Prometheus text-format GET /metrics for Grafana scraping.

Metrics:

  • cryptify_uploads_total{channel} — counter, finalized uploads
  • cryptify_upload_bytes_total{channel} — counter, bytes uploaded
  • cryptify_storage_bytes — gauge, data_dir disk usage
  • cryptify_active_files — gauge, data_dir file count
  • cryptify_expired_files_total — counter, uploads purged before finalize
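
As a rough illustration of what the scrape output for these five metrics could look like, here is a minimal sketch of a Prometheus text-format renderer. The `Snapshot` struct, its field names, and the `render` function are hypothetical and are not taken from this PR's `metrics.rs`; only the metric names and types come from the list above.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical point-in-time copy of the metric values (not the PR's real type).
#[derive(Default)]
pub struct Snapshot {
    pub uploads_total: BTreeMap<String, u64>,      // keyed by channel label
    pub upload_bytes_total: BTreeMap<String, u64>, // keyed by channel label
    pub storage_bytes: u64,
    pub active_files: u64,
    pub expired_files_total: u64,
}

/// Write one labeled metric family: a TYPE line, then one sample per channel.
fn write_series(out: &mut String, name: &str, kind: &str, series: &BTreeMap<String, u64>) {
    let _ = writeln!(out, "# TYPE {name} {kind}");
    for (channel, value) in series {
        // Label values are assumed to be pre-sanitized to [a-z0-9_-],
        // so no escaping of quotes or backslashes is done here.
        let _ = writeln!(out, "{name}{{channel=\"{channel}\"}} {value}");
    }
}

/// Write one unlabeled metric: a TYPE line, then a single sample.
fn write_single(out: &mut String, name: &str, kind: &str, value: u64) {
    let _ = writeln!(out, "# TYPE {name} {kind}");
    let _ = writeln!(out, "{name} {value}");
}

/// Render the snapshot in the Prometheus text exposition format.
pub fn render(s: &Snapshot) -> String {
    let mut out = String::new();
    write_series(&mut out, "cryptify_uploads_total", "counter", &s.uploads_total);
    write_series(&mut out, "cryptify_upload_bytes_total", "counter", &s.upload_bytes_total);
    write_single(&mut out, "cryptify_storage_bytes", "gauge", s.storage_bytes);
    write_single(&mut out, "cryptify_active_files", "gauge", s.active_files);
    write_single(&mut out, "cryptify_expired_files_total", "counter", s.expired_files_total);
    out
}
```

A `BTreeMap` is used in the sketch so that channels render in a stable order between scrapes; the real module may store its counters differently.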

channel is derived, in priority order, from X-Cryptify-Source → bearer/api-key → Origin → User-Agent → unknown. Values are sanitized to [a-z0-9_-], max 32 chars.

Storage gauges are sampled from data_dir on a background task (metrics_scan_interval_secs, default 60s) — off the upload hot path.
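
The sampling idea can be sketched as follows, under stated assumptions: a plain `std::thread` stands in for whatever async task the server actually spawns, and `StorageGauges`, `scan_dir`, and `spawn_sampler` are hypothetical names rather than the PR's own API.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical shared gauges, read by the /metrics handler.
#[derive(Default)]
pub struct StorageGauges {
    pub bytes: AtomicU64,
    pub files: AtomicU64,
}

/// Walk `dir` recursively and return (total bytes, regular file count).
pub fn scan_dir(dir: &Path) -> io::Result<(u64, u64)> {
    let mut bytes = 0u64;
    let mut files = 0u64;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?; // does not follow symlinks
        if meta.is_dir() {
            let (b, f) = scan_dir(&entry.path())?;
            bytes += b;
            files += f;
        } else if meta.is_file() {
            bytes += meta.len();
            files += 1;
        }
    }
    Ok((bytes, files))
}

/// Refresh the gauges every `interval`, away from the upload request path.
/// Callers are assumed to pass a non-zero interval; zero would spin.
pub fn spawn_sampler(
    dir: PathBuf,
    interval: Duration,
    gauges: Arc<StorageGauges>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        // On a failed scan the previous values are kept.
        if let Ok((bytes, files)) = scan_dir(&dir) {
            gauges.bytes.store(bytes, Ordering::Relaxed);
            gauges.files.store(files, Ordering::Relaxed);
        }
        thread::sleep(interval);
    })
}
```

Because the handler only loads two atomics, a scrape never touches the filesystem; the cost is that the gauges can lag the real state by up to one interval.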

Why draft

/metrics is unauthenticated; restrict at the firewall / reverse proxy. Confirm this fits the Scaleway / Procolix network policy before merging.

Test plan

  • cargo check, cargo test, cargo clippy --all-targets, cargo fmt --check
  • Scrape /metrics from staging, confirm counters move

Refs #101

git fetch origin && git checkout feat/prometheus-metrics-endpoint && cargo test

Adds a Prometheus text-format `GET /metrics` endpoint covering the metrics
requested in #101:

  * cryptify_uploads_total{channel}
  * cryptify_upload_bytes_total{channel}
  * cryptify_storage_bytes (gauge, sampled periodically from data_dir)
  * cryptify_active_files (gauge, same source)
  * cryptify_expired_files_total (counter, purged-before-finalized)

The channel label is derived from request headers:
  1. X-Cryptify-Source explicit header
  2. Authorization: Bearer / X-Api-Key -> "api"
  3. Origin -> "staging-website" / "website"
  4. User-Agent substring -> "outlook" / "thunderbird"
  5. fallback "unknown"

Values are sanitized (lower-case [a-z0-9_-], max 32 chars) to prevent
label-injection and cardinality blowup.
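
The priority order and sanitization described above can be sketched roughly as below. This is an illustration only: the `Headers` struct is hypothetical, dropping disallowed characters (rather than replacing them) is an assumption, and the `contains("staging")` test for the staging site is a guess. Only the `contains("postguard.")` Origin rule is quoted elsewhere in this thread.

```rust
/// Lower-case the input, keep only [a-z0-9_-], and cap at 32 characters.
/// Assumption: disallowed characters are dropped rather than replaced.
pub fn sanitize_label(raw: &str) -> String {
    raw.chars()
        .map(|c| c.to_ascii_lowercase())
        .filter(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || *c == '_' || *c == '-')
        .take(32)
        .collect()
}

/// Hypothetical view of the request headers used for channel detection.
#[derive(Default)]
pub struct Headers<'a> {
    pub source: Option<&'a str>,        // X-Cryptify-Source
    pub authorization: Option<&'a str>, // Authorization
    pub api_key: Option<&'a str>,       // X-Api-Key
    pub origin: Option<&'a str>,        // Origin
    pub user_agent: Option<&'a str>,    // User-Agent
}

pub fn detect_channel(h: &Headers) -> String {
    // 1. An explicit header wins, once sanitized.
    if let Some(src) = h.source {
        let label = sanitize_label(src);
        if !label.is_empty() {
            return label;
        }
    }
    // 2. Any bearer token or API key marks the upload as "api".
    let bearer = h.authorization.map_or(false, |a| a.starts_with("Bearer "));
    if bearer || h.api_key.is_some() {
        return "api".to_string();
    }
    // 3. Origin host; the staging test here is an assumption.
    if let Some(origin) = h.origin {
        if origin.contains("postguard.") {
            let label = if origin.contains("staging") { "staging-website" } else { "website" };
            return label.to_string();
        }
    }
    // 4. User-Agent substring match.
    if let Some(ua) = h.user_agent {
        let ua = ua.to_ascii_lowercase();
        if ua.contains("outlook") {
            return "outlook".to_string();
        }
        if ua.contains("thunderbird") {
            return "thunderbird".to_string();
        }
    }
    // 5. Fallback.
    "unknown".to_string()
}
```

In this sketch, a request with an Origin on a `postguard.` host and an Outlook User-Agent is labeled by step 3 and never reaches step 4, which is why an explicit X-Cryptify-Source header is the only deterministic signal.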

Storage gauges are sampled by a background task that walks data_dir every
`metrics_scan_interval_secs` (default 60, configurable). Dashboard JSON
ready for import into the Scaleway Grafana instance is shipped under
`docs/grafana/`, alongside a Prometheus scrape-config example.

No authentication on /metrics; restrict via firewall / proxy allow-list
(documented in README and docs/grafana/README.md).

Refs #101
@rubenhensen
Contributor

/dobby fix conflicts

@dobby-coder
Contributor Author

dobby-coder Bot commented Apr 25, 2026

Dobby has seen the request! Dobby will fix the conflicts right away, sir! Dobby is very good at resolving merge conflicts — Dobby will have this PR clean and ready in no time! 🧹

Integrate main-branch additions (chunk_size config, per-upload/rolling
rate limits, API key detection, /usage endpoint) alongside the metrics
feature from this branch. Both feature sets are preserved and work
together.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dobby-coder
Contributor Author

dobby-coder Bot commented Apr 25, 2026

Conflicts resolved and pushed. Here's what happened:

Conflicting files: src/config.rs, src/main.rs, src/store.rs

Root cause: Main merged chunk-size limits, per-upload/rolling rate limiting, API key detection, and a /usage endpoint (from PR #100) while this branch added Prometheus metrics. Both sides touched the same structs and functions.

Resolution: Kept both feature sets — all metrics fields/logic alongside the new rate-limiting, chunk-size, and API key features. Specifically:

  • config.rs: Added both metrics_scan_interval_secs and chunk_size fields
  • store.rs: FileState now has both source_channel (metrics) and is_api_key (rate limits); kept UploadRecord and usage tracking from main
  • main.rs: Merged imports, kept both /metrics endpoint and ApiKeyPresent guard, upload_init stores both source_channel and is_api_key, upload_finalize records both metrics and usage, routes include both metrics_endpoint and usage

Verification:

  • cargo check — clean
  • cargo test — all 43 tests pass (13 metrics + 5 store/usage + 25 existing)
  • cargo clippy — no new warnings (one pre-existing in email.rs:253)

Rule compliance pass:
- cargo fmt --all (rust-run-cargo-fmt-before-push)
- README "Monitoring" section removed; docs already live in
  docs/grafana/README.md (standardized-readmes)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dobby-coder
Contributor Author

dobby-coder Bot commented May 17, 2026

Rule compliance sweep applied three fixes in ea1357d / PR body edit:

  • rust-run-cargo-fmt-before-push — ran cargo fmt --all; pre-existing drift in src/email.rs, src/error.rs, src/main.rs plus formatting in the new metrics.rs is now clean. cargo test still 43/43 green.
  • standardized-readmes — removed the ## Monitoring section from README.md. The standard layout is logo / docs link / explanation / Development / Releasing / License only; the content already lives in docs/grafana/README.md.
  • pr-close-issue-keywords — PR body switched from Refs #101 to Closes #101 so merging auto-closes the tracking issue.

No other rule violations found across the 60+ rule files and the cryptify-specific notes.

Contributor Author

@dobby-coder dobby-coder Bot left a comment

Looked good overall — code is clean, the 13 metrics tests cover sanitization / header priority / rendering / sampling, store integration handles the unfinalized-expired case sensibly via try_lock, and cargo test is 43/43 green with no new clippy warnings.

One blocker before merge:

docs/grafana/README.md — broken YAML in the scrape config example (around line 34). The second target's labels: block is indented two extra spaces, so it nests under targets: instead of being a sibling. Anyone copy-pasting this into Prometheus will get a parse error.

Current (broken):

      - targets: ['cryptify-staging.postguard.eu:8000']
        labels:
          environment: staging
      - targets: ['cryptify.postguard.eu:8000']
          labels:
            environment: production

Should be (matching the first entry):

      - targets: ['cryptify.postguard.eu:8000']
        labels:
          environment: production

Nits (non-blocking, ignore if you disagree):

  • metrics_scan_interval_secs = 0 in config would busy-loop the sampler. Cheap to clamp to a minimum (e.g. 1s) in From<RawCryptifyConfig>, but very low risk.
  • upload_bytes_total is recorded at finalize, so abandoned uploads aren't counted in bytes — already noted as a follow-up in the PR body, fine to leave.

Fix the YAML and this is good to go.

The production target's labels block was indented under targets instead
of being a sibling, which would fail YAML parsing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dobby-coder
Contributor Author

dobby-coder Bot commented May 17, 2026

Fixed the YAML indentation in docs/grafana/README.md in eca3012 — the production target's labels: block is now a sibling of targets: matching the staging entry. cargo test 43/43 green, cargo fmt --check clean.

Contributor Author

@dobby-coder dobby-coder Bot left a comment

Verdict: looks ready to merge (submitting as comment since the bot can't formally approve its own PR).

Re-reviewed after the YAML fix in eca3012.

Verified:

  • docs/grafana/README.md scrape-config YAML now has the production labels: block as a sibling of targets: — matches the staging entry. Previous blocker resolved.
  • cargo test — 43/43 green.
  • cargo fmt --all -- --check — clean.
  • cargo clippy --all-targets — only the pre-existing useless_format warning at src/email.rs:253; no new warnings.
  • Metrics module: label sanitization caps at 32 chars and restricts to [a-z0-9_-] (prevents Prometheus label injection / cardinality blow-up), channel-detection priority matches the PR description, and the store's try_lock-then-record-expired path attributes purged-unfinalized uploads correctly.
  • Conflict resolution from the previous round (merging metrics with PR #100's rate-limit / api-key / /usage work) is consistent — FileState carries both source_channel and is_api_key, upload_finalize records both metrics and usage, both routes are mounted.

Non-blocking nit (carried forward): metrics_scan_interval_secs = 0 would busy-loop the storage sampler. Trivial clamp in From<RawCryptifyConfig>, very low risk — fine to leave.

Heads-up: GitHub reports mergeable_state: dirty again — main has advanced since the prior conflict resolution (lettre bump, pg-core 0.6, upload_chunk fixes, integration test harness). One more rebase needed before merge, but no code changes required.

Contributor Author

@dobby-coder dobby-coder Bot left a comment

Re-review cycle 0/3 — verdict: ready to merge (submitting as comment; the bot can't formally approve its own PR).

Verified at HEAD eca3012:

  • cargo test — 43/43 green (13 metrics + 5 store/usage + 25 existing).
  • cargo clippy --all-targets — no new warnings; only the pre-existing useless_format at src/email.rs:253.
  • cargo fmt --check — clean.
  • docs/grafana/README.md scrape-config YAML is well-formed (prior blocker stayed fixed).
  • src/metrics.rs label sanitization (lowercase, [a-z0-9_-], cap 32) blocks Prometheus label injection and cardinality blow-up.
  • detect_channel priority order matches the PR description and the OpenAPI/README docs.
  • Store integration: the try_lock + sender.is_none() check in SharedState::cleanup correctly attributes purged-unfinalized uploads to cryptify_expired_files_total without false positives on finalized rows.
  • /metrics endpoint is wired into routes![] and the storage sampler is spawned with the new metrics_scan_interval_secs config.
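
The try_lock-based attribution mentioned in the store-integration bullet can be sketched as below. Everything here is a simplified stand-in: `std::sync::Mutex` replaces whatever lock the store really uses, `purge` is a hypothetical free function rather than `SharedState::cleanup`, and the sketch assumes that `sender` is populated only once an upload is finalized, so `sender.is_none()` means "never finalized".

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the store's per-upload state.
pub struct FileState {
    /// Assumption: set only when the upload is finalized.
    pub sender: Option<String>,
}

/// Minimal stand-in for the metrics handle.
#[derive(Default)]
pub struct Metrics {
    expired: AtomicU64,
}

impl Metrics {
    pub fn record_expired(&self) {
        self.expired.fetch_add(1, Ordering::Relaxed);
    }
    pub fn expired_total(&self) -> u64 {
        self.expired.load(Ordering::Relaxed)
    }
}

/// Remove expired entries, counting only those that never finalized.
pub fn purge(
    files: &mut HashMap<String, Arc<Mutex<FileState>>>,
    expired_ids: &[&str],
    metrics: &Metrics,
) {
    for id in expired_ids {
        let Some(state) = files.remove(*id) else { continue };
        // try_lock never blocks the cleanup pass. If the entry is locked by
        // someone else, it is still removed but not counted as expired.
        let unfinalized = match state.try_lock() {
            Ok(guard) => guard.sender.is_none(),
            Err(_) => false,
        };
        if unfinalized {
            metrics.record_expired();
        }
    }
}
```

The trade-off in this shape is that a contended entry may be under-counted, which errs on the side of avoiding false positives for finalized uploads.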

Non-blocking nit (carried over, optional): metrics_scan_interval_secs = 0 would busy-loop the sampler. A .max(1) clamp in From<RawCryptifyConfig> would close it, but the risk is very low (operator misconfig only).
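
For reference, the suggested clamp could look roughly like this. `RawCryptifyConfig` and the field name appear in this thread; the `CryptifyConfig` target type, the `Option<u64>` representation, and the way the default of 60 is applied are assumptions made for the sketch.

```rust
/// Raw config as deserialized; only the relevant field is shown.
/// The Option type is an assumption.
pub struct RawCryptifyConfig {
    pub metrics_scan_interval_secs: Option<u64>,
}

/// Hypothetical validated config type.
pub struct CryptifyConfig {
    pub metrics_scan_interval_secs: u64,
}

impl From<RawCryptifyConfig> for CryptifyConfig {
    fn from(raw: RawCryptifyConfig) -> Self {
        CryptifyConfig {
            // Default to 60s, and never allow 0, which would make the
            // sampler loop without sleeping.
            metrics_scan_interval_secs: raw.metrics_scan_interval_secs.unwrap_or(60).max(1),
        }
    }
}
```

Clamping at conversion time keeps the sampler itself free of validation logic, since every consumer then sees a value of at least one second.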

Heads-up — not a code issue: mergeable_state: dirty against main. A rebase is needed before merge but no code changes required.

@rubenhensen
Contributor

/dobby fix the conflicts.

@dobby-coder
Contributor Author

dobby-coder Bot commented May 18, 2026

On it, @rubenhensen — this agent is checking the rebase state and will replay the conflict resolution against current main.

Resolves conflicts between the metrics work and main's recent additions
(api_key_tenant tenant tracking, recovery_token / upload_status, idle
TTL, download handler). Keeps both feature sets:

- FileState carries both source_channel (metrics) and the new
  api_key_tenant/api_key_validation_failed/last_chunk/recovery_token
  fields from main; is_api_key is replaced by api_key_tenant.is_some().
- Store::with_idle_ttl now takes (idle_ttl, metrics); Store::new(metrics)
  is the test-only shortcut.
- Purge counts unfinalized expirations via metrics.record_expired().
- build_rocket spawns storage_sampler, mounts metrics_endpoint, and
  manages the Metrics arc alongside pkg_client and the per-tenant store.
- upload_finalize records both per-channel metrics and per-tenant usage.

cargo check --all-targets, cargo clippy --all-targets, cargo test
(100/100), cargo fmt --check all clean.
@dobby-coder
Contributor Author

dobby-coder Bot commented May 18, 2026

Conflicts resolved in 86e3cf1 (merge of main into the branch).

Conflicting files: src/error.rs, src/main.rs, src/store.rs

Root cause: main grew the API-key tenant model (api_key_tenant: Option<String>, api_key_validation_failed), a per-session recovery_token / upload_status flow, configurable idle TTL, and a download handler — all touching the same structs and build_rocket plumbing as the metrics work on this branch.

Resolution: kept both feature sets, taking main's structure as the base and re-layering metrics on top:

  • FileState carries both source_channel (metrics) and main's new api_key_tenant / api_key_validation_failed / last_chunk / recovery_token. The old boolean is_api_key is gone — state.api_key_tenant.is_some() is the new test (richer, validated against pg-pkg).
  • Store::with_idle_ttl(idle_ttl, metrics) is the production constructor; the test-only Store::new(metrics) uses the default idle window.
  • SharedState holds both idle_ttl and metrics. Purge counts unfinalized evictions via metrics.record_expired() before clearing the expiration_keys entry.
  • build_rocket spawns storage_sampler and mounts metrics_endpoint alongside upload_status and download; manages the Metrics arc next to Store, vk, pkg_client.
  • upload_finalize records both per-channel metrics (metrics.record_upload) and per-tenant usage (store.record_upload(accounting_key, …)).
  • upload_init populates source_channel from the existing ClientHeaders extractor.

Verification:

  • cargo check --all-targets — clean
  • cargo clippy --all-targets — no warnings
  • cargo test — 100 passed / 0 failed
  • cargo fmt --all -- --check — clean

@rubenhensen
Contributor

I don't like the added docs and the added grafana board. I just need the metrics exposed.

@dobby-coder
Contributor Author

dobby-coder Bot commented May 18, 2026

Dobby hears master Ruben! Dobby will remove the docs and the Grafana board, keeping only the /metrics endpoint exposed. Dobby is on it!

Remove the Grafana dashboard JSON and scrape-config README per
maintainer feedback — keep only the /metrics endpoint exposure.
@dobby-coder
Contributor Author

dobby-coder Bot commented May 18, 2026

Dropped docs/grafana/ (README + dashboard JSON) in 627033c — PR body trimmed to match. The PR now only ships the /metrics endpoint and the supporting code. cargo test still green (100 passed).

@rubenhensen rubenhensen marked this pull request as ready for review May 18, 2026 09:56
@rubenhensen rubenhensen merged commit ffeedb6 into main May 18, 2026
7 checks passed
@rubenhensen rubenhensen deleted the feat/prometheus-metrics-endpoint branch May 18, 2026 09:56
rubenhensen added a commit to encryption4all/postguard-tb-addon that referenced this pull request May 18, 2026
…rd (#121)

Adds X-Cryptify-Source: thunderbird to PG_CLIENT_HEADER so cryptify's
per-channel upload metrics (encryption4all/cryptify#102) classify this
add-in deterministically rather than via the User-Agent substring
fallback.

Thunderbird WebExtensions don't always present a stable Origin
(can be `moz-extension://<uuid>` or null), and the User-Agent fallback
is the last layer of cryptify's detect_channel. Setting the explicit
header at the source removes any environment-dependent ambiguity and is
symmetric with parallel PRs in postguard-website and
postguard-outlook-addon.
rubenhensen added a commit to encryption4all/postguard-outlook-addon that referenced this pull request May 18, 2026
)

Adds the X-Cryptify-Source: outlook header to clientHeaders() so
cryptify's per-channel upload metrics
(encryption4all/cryptify#102) classify this add-in deterministically.

cryptify's detect_channel checks the Origin header before falling back
to User-Agent substring matching. The add-in is served from
addin.*.postguard.eu, which matches cryptify's `contains("postguard.")`
rule and would otherwise shadow the User-Agent "outlook" check —
labeling Outlook uploads as `website` / `staging-website` instead of
`outlook`. Setting the explicit header here removes the ambiguity.
rubenhensen added a commit to encryption4all/postguard-website that referenced this pull request May 18, 2026
…228)

Sets X-Cryptify-Source on the PostGuard SDK so cryptify (≥0.1.27+,
encryption4all/cryptify#102) classifies uploads from this site
deterministically rather than via the Origin-host fallback.

Origin-based detection collided with the Outlook add-in, which is
served from addin.*.postguard.eu — that host matches cryptify's
`contains("postguard.")` rule and shadows the User-Agent "outlook"
check. With the explicit header, the website always reports
`channel="website"` regardless of the deploy host.