C1: Create tfact_grade — course run grade fact table#2259
Conversation
Implements C1 from epic #2073. Consolidates course run grade records from MITx Online, MITx Pro, and edX.org into a new incremental fact table in the dimensional layer. - Filters int__mitxonline__courserun_grades to courserun_platform = 'MITx Online' to avoid double-counting edX.org-hosted MITx Online course grades - edX.org uses courserungrade_user_grade (different column name) and a surrogate grade_id from (user_id, courserun_readable_id) - Incremental watermark via pre-computed CTE to avoid Trino fan-out - Defensive ROW_NUMBER dedup guard (QUALIFY not supported in Trino) - Full YAML column docs and FK relationship tests in _fact_tables.yml Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds a new incremental transaction fact table tfact_grade to the dimensional star schema, consolidating course run grades from MITx Online, MITx Pro, and edX.org. The model follows the patterns established by tfact_certificate and tfact_enrollment: per-platform CTEs are unioned, joined to dim_user/dim_course_run/dim_platform, and incrementally loaded via delete+insert keyed on a surrogate grade_key. A pre-computed per-platform watermark CTE avoids correlated-subquery fan-out, and a ROW_NUMBER dedup guard substitutes for Trino's missing QUALIFY.
Changes:
- New model
tfact_grade.sqlwith platform-aware FK joins and an incremental watermark per platform. - New YAML entry in
_fact_tables.ymldescribing the table, grain, FK relationships, and column-level tests. - MITx Online source filtered to
courserun_platform = 'MITx Online'to prevent double-counting edX.org-hosted MITx Online grades.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/ol_dbt/models/dimensional/tfact_grade.sql | New incremental fact model unioning mitxonline/mitxpro/edxorg grades with FK lookups, watermarked incremental filter, and ROW_NUMBER dedup. |
| src/ol_dbt/models/dimensional/_fact_tables.yml | Documents tfact_grade columns, relationships, and tests, including unique combination of (grade_id, platform). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
rachellougee
left a comment
There was a problem hiding this comment.
dbt build fine but a bit slow. I left a comment but not a blocker.
| on ( | ||
| -- mitxonline/mitxpro: join on integer source_id + platform | ||
| (combined_grades.platform in ('mitxonline', 'mitxpro') | ||
| and combined_grades.courserun_id = dim_course_run.source_id | ||
| and combined_grades.platform = dim_course_run.platform) | ||
| -- edxorg: join on readable_id + platform to prevent fan-out across platforms | ||
| or (combined_grades.platform = 'edxorg' | ||
| and combined_grades.courserun_readable_id = dim_course_run.courserun_readable_id | ||
| and dim_course_run.platform = 'edxorg') | ||
| ) |
There was a problem hiding this comment.
No need to change it now, but the OR clause in the dim_course_run join here has a performance cost . This was likely copied from tfact_enrollment and tfact_certificate which have the same pattern.
All three platforms could join dim_course_run on (courserun_readable_id, platform) if we populate courserun_readable_id in above CTEs, the same natural key used to build courserun_pk, eliminating the OR entirely.
What are the relevant tickets?
Closes #2125
Epic: #2073
Description (What does it do?)
Adds
tfact_grade— a new incremental transaction fact table in the dimensional star schema.Grain: One row per (user × course run) grade record.
Platforms covered: MITx Online, MITx Pro, edX.org
Source models:
int__mitxonline__courserun_grades— filtered tocourserun_platform = 'MITx Online'to exclude edX.org-hosted MITx Online course grades (which are captured via the edxorg source and would otherwise be double-counted)int__mitxpro__courserun_gradesint__edxorg__mitx_courserun_grades— usescourserungrade_user_grade(column name differs from other platforms); timestamps unavailable sograde_date_keyandgrade_created_onare null for edX.org rows;grade_idis a surrogate from(user_id, courserun_readable_id)Key implementation details:
delete+insertongrade_key; watermark computed via a pre-computed CTE per platform (avoids Trino correlated-subquery fan-out anti-pattern, consistent withtfact_certificateandtfact_enrollment)delete+insertis idempotent)ROW_NUMBERdedup guard (Trino does not supportQUALIFY)dim_user,dim_course_run(is_current = true),dim_platform,dim_datemicromasters_program_id IS NOT NULL); a separate MicroMasters platform source can be added in a future phaseHow can this be tested?
cd src/ol_dbt && ol-dbt run --select tfact_grade(ordbt compile --select tfact_grade)ol-dbt run --select tfact_grade --target dev_qadbt test --select tfact_gradeAdditional Context
The platform-filtering fix for MITx Online (filtering to
courserun_platform = 'MITx Online') was identified during a pre-implementation rubber duck review.int__mitxonline__courserun_gradescontains grades for both MITx Online-hosted and edX.org-hosted course runs; without this filter, edX.org course grades would appear twice in the fact table (once from mitxonline, once from edxorg).This model is a new leaf node — it consumes existing intermediates and has no impact on any existing downstream models (confirmed via OpenMetadata lineage analysis).