task(content analytics) #34395 : Implement the Engagement SQL structures #34512
Open
jcastro-dotcms wants to merge 1 commit into main
freddyDOTCMS approved these changes on Feb 5, 2026
    /* Partitioning note:
       We partition by a hash of (customer, cluster) to spread writes and merges.
       This avoids a single giant partition for big tenants and keeps merges parallelizable. */
    PARTITION BY sipHash64(customer_id, cluster_id) % 64
I think we should add the date to this partition key, maybe "min_ts_state".
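One possible reading of this suggestion, as a sketch only: combine a monthly date component with the existing hash bucket. The table name `session_states_partition_sketch` and the plain `session_day` column are hypothetical and not taken from the PR. A plain `Date` column is assumed because, if `min_ts_state` is stored as an aggregate state (as its name suggests), it could not be passed to a date function in the partition key.

```sql
-- Hypothetical table for illustration; not the PR's actual DDL.
CREATE TABLE IF NOT EXISTS session_states_partition_sketch
(
    customer_id  String,
    cluster_id   String,
    session_id   String,
    session_day  Date,                              -- assumed plain column, set at insert time
    min_ts_state AggregateFunction(min, DateTime)   -- assumed type of the state column
)
ENGINE = AggregatingMergeTree
-- Month first, then the same 64-way hash bucket used in the reviewed line.
PARTITION BY (toYYYYMM(session_day), sipHash64(customer_id, cluster_id) % 64)
ORDER BY (customer_id, cluster_id, session_day, session_id);
```

Adding a date component multiplies the partition count (up to 64 per month in this sketch), so the bucket count may need revisiting if this direction is taken.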
Proposed Changes
Implements the complete engagement analytics SQL structures and data pipeline for dotCMS content analytics, enabling GA4-style engagement metrics tracking at scale.
Changes Overview
This PR introduces a multi-layered analytics pipeline in ClickHouse that computes session engagement metrics efficiently through incremental aggregation and daily rollups.
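As a rough illustration of the incremental aggregation layer, here is a minimal sketch of a mergeable session-state table fed by a materialized view. All object names carry a `_sketch` suffix and are hypothetical; the source events table, its columns, and the event type values are assumptions, since the PR description does not show the actual DDL.

```sql
-- Hypothetical raw events table (assumed shape).
CREATE TABLE IF NOT EXISTS events_sketch
(
    customer_id String,
    cluster_id  String,
    session_id  String,
    event_type  String,     -- assumed values: 'pageview', 'conversion', ...
    event_ts    DateTime
)
ENGINE = MergeTree
ORDER BY (customer_id, cluster_id, session_id, event_ts);

-- Mergeable per-session aggregate states.
CREATE TABLE IF NOT EXISTS session_states_sketch
(
    customer_id       String,
    cluster_id        String,
    session_id        String,
    min_ts_state      AggregateFunction(min, DateTime),
    max_ts_state      AggregateFunction(max, DateTime),
    pageviews_state   AggregateFunction(sum, UInt64),
    conversions_state AggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY (customer_id, cluster_id, session_id);

-- Each inserted block of events is aggregated into partial states.
CREATE MATERIALIZED VIEW IF NOT EXISTS session_states_sketch_mv
TO session_states_sketch AS
SELECT
    customer_id,
    cluster_id,
    session_id,
    minState(event_ts)                                AS min_ts_state,
    maxState(event_ts)                                AS max_ts_state,
    sumState(toUInt64(event_type = 'pageview'))       AS pageviews_state,
    sumState(toUInt64(event_type = 'conversion'))     AS conversions_state
FROM events_sketch
GROUP BY customer_id, cluster_id, session_id;
```

In this sketch a late-arriving event simply produces another partial-state row for the same session key. The states are combined during background merges, or at query time with `minMerge`, `maxMerge`, and `sumMerge`.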
Database Layer (ClickHouse)
New Pipeline Architecture:
Core Tables & Views Added:
- session_states (AggregatingMergeTree) - Incremental, mergeable session states
- session_states_mv - Real-time MV aggregating events into session states
  - Supports late-arriving events through merge semantics

- session_facts (ReplacingMergeTree) - Finalized session snapshots with engagement flags
- session_facts_rmv - Refreshable MV (every 15min) finalizing sessions from last 72 hours
  - Computes engagement flag based on: duration > 10s OR pageviews >= 2 OR conversions >= 1

- engagement_daily - Daily engagement KPIs (total/engaged sessions, durations, event counts)
- sessions_by_device_daily - Sessions by device category (Desktop/Mobile/Tablet/Other)
- sessions_by_browser_daily - Sessions by browser family (Chrome/Safari/Firefox/Edge/Other)
- sessions_by_language_daily - Sessions by language ID
- All with corresponding refreshable MVs recomputing last 90 days

- device_category_map - User-agent to device category mapping
- device_category_fallback_rules - Priority-ordered fallback heuristics for device detection
- browser_family_map - User-agent to browser family mapping
- browser_family_fallback_rules - Priority-ordered fallback heuristics for browser detection

CubeJS Schema Layer
New Cubes:
EngagementDaily (docker/docker-compose-examples/analytics/setup/config/dev/cube/schema/EngagementDaily.js)
- Measures: engagement rate, conversion rate, avg interactions, avg session time
- Dimensions: customer_id, cluster_id, context_site_id, day
- Enables KPI cards and trend charts

SessionsByDeviceDaily (docker/docker-compose-examples/analytics/setup/config/dev/cube/schema/SessionsByDeviceDaily.js)
- Measures: total/engaged sessions, engagement rate within device, avg engaged session time
- Dimensions: device_category (Desktop/Mobile/Tablet/Other)

SessionsByBrowserDaily (docker/docker-compose-examples/analytics/setup/config/dev/cube/schema/SessionsByBrowserDaily.js)
- Measures: total/engaged sessions, engagement rate within browser, avg engaged session time
- Dimensions: browser_family (Chrome/Safari/Firefox/Edge/Other)

SessionsByLanguageDaily (docker/docker-compose-examples/analytics/setup/config/dev/cube/schema/SessionsByLanguageDaily.js)
- Measures: total/engaged sessions, engagement rate within language, avg engaged session time
- Dimensions: language_id (dotCMS language ID as String)
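To illustrate what an "engagement rate within device" measure could compute over daily rollup rows, here is a small self-contained query. It assumes the rate is a ratio of sums (engaged sessions over total sessions) rather than an average of daily ratios; the actual Cube measure definitions are not shown in this description, and the column names and sample rows are made up for the example.

```sql
-- Sample rows stand in for a daily rollup table; names and numbers are illustrative.
SELECT
    device_category,
    sum(total_sessions)         AS total,
    sum(engaged_sessions)       AS engaged,
    engaged / nullIf(total, 0)  AS engagement_rate   -- NULL when there are no sessions
FROM values(
    'day String, device_category String, total_sessions UInt64, engaged_sessions UInt64',
    ('2026-02-01', 'Desktop', 100, 60),
    ('2026-02-02', 'Desktop', 100, 40),
    ('2026-02-01', 'Mobile',   50, 10)
)
GROUP BY device_category
ORDER BY device_category;
-- Desktop: 100 engaged of 200 total, rate 0.5
-- Mobile:   10 engaged of  50 total, rate 0.2
```

Summing before dividing keeps the rate correct when the query spans several days with different session volumes.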
Updated:
- cube.js - Added new cubes to security whitelist
- EventSummary.js - Fixed filter params to use correct cube name (was referencing ContentAttribution incorrectly)

Key Features
Engagement Rules (GA4-aligned)
A session is marked as "engaged" if ANY of:
- duration > 10s
- pageviews >= 2
- conversions >= 1
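The rule stated for session_facts_rmv (duration > 10s OR pageviews >= 2 OR conversions >= 1) can be expressed as a single boolean expression. The query below is a minimal sketch over inline sample rows; the column names are assumptions and the real view presumably derives these values from merged session states.

```sql
-- Inline sample sessions; column names are illustrative, not the PR's schema.
SELECT
    session_id,
    duration_seconds,
    pageviews,
    conversions,
    (duration_seconds > 10 OR pageviews >= 2 OR conversions >= 1) AS engaged
FROM values(
    'session_id String, duration_seconds UInt32, pageviews UInt32, conversions UInt32',
    ('s1',  4, 1, 0),   -- short, single pageview, no conversion: engaged = 0
    ('s2', 45, 1, 0),   -- longer than 10 seconds:               engaged = 1
    ('s3',  3, 2, 0),   -- two pageviews:                        engaged = 1
    ('s4',  2, 1, 1)    -- one conversion:                       engaged = 1
);
```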
Performance Optimizations
Multi-tenant & Multi-cluster Support
All tables scoped by:
Files Changed
Test Plan
This PR fixes: #34395