feat(segment_membership): Daily Snowflake-backed per-env segment counts #7464
Backfills identities from Dynamo to Snowflake daily, then refreshes
per-(segment, environment) match counts in the new `SegmentMembership`
cache. The translator from `flagsmith-sql-flag-engine` turns each
canonical segment into a SQL `WHERE` predicate; counts are
materialised as `COUNT(*) ... GROUP BY environment_id` per segment.
The serializer surfaces them as a list of `{environment, count,
last_synced_at}`, ready to back per-env count badges in the
Identities-tab environment dropdown.
Pipeline shape:
- `backfill_identities_to_snowflake` is the daily recurring task
(`timeout=4h` to fit large environments). After backfilling each
project's environments it dispatches one
`refresh_project_segment_counts(project_id)` per project so the
count refresh always sees the freshly backfilled snapshot rather
than racing a separate schedule.
- `refresh_project_segment_counts` opens its own Snowpark session,
re-checks the FoF flag at execution time so a stale fan-out skips
orgs that have since been disabled, and bulk-upserts via Postgres
`ON CONFLICT` (single statement per project).
- `compute_segment_counts_for_project` returns a list of unsaved
`SegmentMembership` instances; the task stamps `last_synced_at`
consistently across the batch. Untranslatable segments emit a
structlog `compute.segment.skipped` error event so we hear about
predicate gaps rather than silently dropping rows.
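The single-statement Postgres upsert described above might be assembled like this. A minimal sketch: the table and column names (`segment_membership`, `segment_id`, `environment_id`, `count`, `last_synced_at`) are assumptions inferred from the description, not the PR's actual schema.

```python
from datetime import datetime, timezone


def build_upsert_sql(rows):
    """Build one Postgres ON CONFLICT upsert for a whole project's counts.

    rows: list of (segment_id, environment_id, count) tuples. The whole
    batch gets the same last_synced_at stamp, matching the task's
    "stamps last_synced_at consistently across the batch" behaviour.
    """
    stamp = datetime.now(timezone.utc)
    values = ", ".join(["(%s, %s, %s, %s)"] * len(rows))
    params = [v for seg, env, count in rows for v in (seg, env, count, stamp)]
    sql = (
        "INSERT INTO segment_membership"
        " (segment_id, environment_id, count, last_synced_at)"
        f" VALUES {values}"
        " ON CONFLICT (segment_id, environment_id) DO UPDATE SET"
        " count = EXCLUDED.count, last_synced_at = EXCLUDED.last_synced_at"
    )
    return sql, params
```

One statement per project keeps the write atomic on the Postgres side regardless of how many (segment, environment) pairs the refresh produced.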
Both tasks short-circuit when SNOWFLAKE_* env vars are unset and
skip per-organisation when the `segment_membership_inspection`
Flagsmith-on-Flagsmith flag is False, so SaaS rolls out gradually
and self-hosted is unaffected.
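The two-level gate could look like the following sketch; the exact `SNOWFLAKE_*` variable names and flag-check call are illustrative, not the PR's actual settings.

```python
import os

# Illustrative names; the real code checks its own SNOWFLAKE_* settings.
REQUIRED_VARS = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_DATABASE")


def is_snowflake_configured(environ=os.environ):
    """True only when every required SNOWFLAKE_* variable is set."""
    return all(environ.get(name) for name in REQUIRED_VARS)


def organisations_to_refresh(organisations, flag_enabled_for):
    """Re-check the FoF flag at execution time, so a stale fan-out skips
    organisations that have been disabled since the task was queued."""
    return [org for org in organisations if flag_enabled_for(org)]
```

Checking the flag inside the task, rather than at dispatch time, is what makes a delayed or retried fan-out safe.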
DELETE-then-INSERT runs without an explicit transaction. Snowflake
holds micropartition locks for the lifetime of an open transaction,
and at 10M+ identities a BEGIN/COMMIT around the whole env partition
would keep that lock open for minutes. Per-statement implicit
commits leave a brief mid-refresh window where readers see an empty
partition; acceptable under the FoF flag's gradual rollout.
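The lock trade-off above can be sketched as below, assuming a Snowpark-like session object whose `sql(...)` returns something with `.collect()`; table and column names are illustrative.

```python
def refresh_env_partition(session, env_id, insert_statements):
    """Rewrite one environment's partition with per-statement commits.

    Deliberately no BEGIN/COMMIT wrapper: Snowflake commits each
    statement implicitly, so the micropartition lock is held per
    statement rather than for the whole partition rewrite. Between the
    DELETE and the first INSERT, readers can observe an empty partition.
    """
    session.sql(
        "DELETE FROM IDENTITIES WHERE ENVIRONMENT_ID = ?", params=[env_id]
    ).collect()
    for sql, params in insert_statements:
        session.sql(sql, params=params).collect()
```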
Backfill writes via Snowpark DataFrames against the canonical
IDENTITIES schema, with `DynamoIdentity` documents projected through
`segment_membership.mappers.map_identity_document_to_snowflake_row`.
Refresh issues a single batched UNION ALL using parameterised SQL —
env keys are bound, predicates from the engine are already escape-
safe. Schema setup is a `RunPython` migration gated on
`is_snowflake_configured()`, so it no-ops on self-hosted and in the
test suite.
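The batched UNION ALL might be assembled along these lines. A sketch only: it assumes the engine hands back escape-safe predicates and that segment ids are trusted integers; the `IDENTITIES` column names are guesses.

```python
def build_union_counts_sql(segment_predicates, env_ids):
    """segment_predicates: (segment_id, sql_predicate) pairs.

    Environment ids are bound as parameters; predicates are interpolated
    as-is because the engine already emits escape-safe SQL.
    """
    placeholders = ", ".join(["?"] * len(env_ids))
    selects = [
        f"SELECT {segment_id} AS SEGMENT_ID, ENVIRONMENT_ID,"
        f" COUNT(*) AS MATCH_COUNT FROM IDENTITIES"
        f" WHERE ENVIRONMENT_ID IN ({placeholders}) AND ({predicate})"
        f" GROUP BY ENVIRONMENT_ID"
        for segment_id, predicate in segment_predicates
    ]
    # One parameter list entry per placeholder, in SELECT order.
    params = [env_id for _ in selects for env_id in env_ids]
    return " UNION ALL ".join(selects), params
```

One round trip per project, rather than one query per segment, is what keeps the refresh cheap against Snowflake.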
The segment serializer surfaces cached counts via a new `memberships`
list field; absence of an entry is the read-side signal, no flag
check on the read path. `SegmentMembershipSerializer` gives
drf-spectacular a typed schema. Adds a generic `batched` helper to
`api/util/util.py` for the per-INSERT batching.
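A generic `batched` helper typically looks like the following; this is a sketch, and the PR's actual helper in `api/util/util.py` may differ in name or signature.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items.

    Mirrors itertools.batched (Python 3.12+) but yields lists and works
    on older interpreters.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk
```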
…ps prefetch
The new `prefetch_related("memberships")` adds one IN-clause query per
list response, even when no rows exist. Update the regression
expectations so the existing test suite reflects the new baseline.
… pre-release

Switches the api dep from a private-repo git URL — which the Docker build can't clone in CI — to a versioned pin against Flagsmith's staging CodeArtifact PyPI (`flagsmith-pypi-staging`, account 302456015006, eu-west-2). Initial published release: 0.1.0a1.

The reusable docker-build workflow now unconditionally assumes the OIDC role `arn:aws:iam::302456015006:role/codeartifact-github-actions-staging` (trust policy allows any `repo:Flagsmith/*`), fetches an authorisation token, and exposes it to every build as the `codeartifact_token` BuildKit secret. Builds that don't mount the secret simply ignore it; the OIDC + token cost is a couple of seconds per build.

`Dockerfile`'s four `make install*` lines mount the `codeartifact_token` secret and export `POETRY_HTTP_BASIC_FLAGSMITH_PYPI_STAGING_*` so poetry resolves the dep from CodeArtifact. The header documents the `--secret="id=codeartifact_token,env=..."` incantation for local builds.
…fact

The unit-test, MCP-schema-push, makefile-target, and update-flagsmith workflows all run `make install-packages`, which now needs CodeArtifact credentials to resolve the `flagsmith-sql-flag-engine` pre-release. Encapsulate the OIDC role assumption and token fetch in a composite action, reuse it from the Docker build workflow, and wire it into every workflow that runs poetry install.
CodeQL flagged the MD5 truncation as a sensitive-data hashing risk. UUIDv4 already gives us the random bits we need for a dedup key, so take the high 64 bits directly via `int.from_bytes` and drop the hash.
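The replacement is a one-liner; sketched below. Note the slice keeps UUIDv4's version and variant bits in the result, which is fine for a dedup key.

```python
import uuid


def dedup_key() -> int:
    """High 64 bits of a random UUIDv4 as an unsigned int.

    Replaces the truncated-MD5 scheme CodeQL flagged: the UUID is already
    random, so hashing it added nothing.
    """
    return int.from_bytes(uuid.uuid4().bytes[:8], "big")
```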
Adds four global Prometheus metrics covering the daily Dynamo→Snowflake backfill and the per-project count refresh: identities mirrored, per-environment backfill duration, refresh duration, and refresh failures. Metrics are global — env/project labels would blow up Prometheus cardinality at SaaS scale.

Snowpark sessions now carry a QUERY_TAG for spend attribution, set via Snowpark's `session.query_tag` setter. Backfill tags by org+project per env iteration; refresh tags by org+project. Spend grouped by tag is queryable from Snowflake's QUERY_HISTORY for 365 days.
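The timing and tagging pieces might be wired up as below. A sketch: it assumes any histogram-like object with `.observe(seconds)` and a Snowpark-like session exposing the `query_tag` setter; the tag format is illustrative, not the PR's actual format.

```python
import time
from contextlib import contextmanager


@contextmanager
def observe_duration(histogram):
    """Record wall-clock seconds into a Prometheus-style histogram."""
    start = time.monotonic()
    try:
        yield
    finally:
        histogram.observe(time.monotonic() - start)


def tag_session(session, organisation_id, project_id):
    """Set QUERY_TAG for spend attribution.

    Tagged queries are grouped in Snowflake's QUERY_HISTORY, which retains
    365 days of history.
    """
    session.query_tag = f"org:{organisation_id},project:{project_id}"
```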
Thanks for submitting a PR! Please check the boxes below:

- [ ] docs/ if required so people know about the feature.

Changes

Contributes to Segment Membership Inspection.

Adds a daily pipeline that backfills Dynamo identities into Snowflake, materialises per-(segment, environment) match counts via flagsmith-sql-flag-engine, and exposes them on the segment endpoint as `memberships: [{environment, count, last_synced_at}]` for env-dropdown badges. Gated behind the org-scoped `segment_membership_inspection` FoF flag; no-ops when `SNOWFLAKE_*` env vars are unset.

Review complexity: 4/5 — three datastores, two new runtime deps, new Django app with recurring + handler tasks. Pulled down by an FoF flag, additive on the read path.
Review order: `models.py` (cache table) → `services.py` (compile + count, parameterised SQL) → `tasks.py` (daily recurring backfill fans out per-project refresh) → `mappers.py` (Dynamo doc → IDENTITIES row) → `migrations/0002_*` (Snowflake DDL `RunPython`, no-op when unconfigured) → `segments/serializers.py` + `views.py` (read-side `memberships` field, prefetched).

How did you test this code?
36 unit tests + 2 integration tests; 100% coverage on `segment_membership/`. `make lint` and `make typecheck` (mypy strict) green.