B3: Fix tfact_enrollment incremental watermark to capture status updates #2258
Merged
quazi-h merged 3 commits on Jun 1, 2026
Pull request overview
Adjusts tfact_enrollment’s incremental filter so platforms that lack an updated_on source field (edxorg, residential) still reprocess a small recent window and don’t miss enrollment status/mode changes.
Changes:
- Adds a 7-day lookback condition for rows where `enrollment_updated_on is null` in the incremental WHERE clause.
- Keeps existing `coalesce(enrollment_updated_on, enrollment_created_on)` watermark behavior for platforms that do have `updated_on`.
quazi-h added a commit that referenced this pull request on May 28, 2026
… filter

The 7-day lookback clause was comparing enrollment_created_on (ISO8601 varchar) directly to current_timestamp (timestamp), which errors in Trino with a type mismatch. Use the cross-db from_iso8601_timestamp() macro, which renders as from_iso8601_timestamp(...||'Z') on Trino and cast(... as timestamp) on DuckDB. Also adds an explicit IS NOT NULL guard so the cast is never attempted on null values.

Addresses Copilot review feedback on PR #2258.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
KatelynGit approved these changes on May 29, 2026
build ran clean and it does appear to be picking up additional records now
…watermark

For platforms without an updated_on column (edxorg, residential), the incremental filter used only enrollment_created_on as the watermark, silently missing status changes such as deactivations (is_active: true → false) and mode upgrades (audit → verified). Platforms with updated_on (MITxOnline, MITxPro, program enrollments) already use the per-platform coalesce(updated_on, created_on) watermark and are unaffected.

The new clause reprocesses the trailing 7 days of rows where enrollment_updated_on is null, which is the standard lookback pattern for sources that carry no mutable timestamp.

Closes #2123

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
e8f7913 to 3a49ddb
What are the relevant tickets?
Closes #2123
Epic: #2073
Description (What does it do?)
The incremental filter for `tfact_enrollment` used only `enrollment_created_on` as the watermark for platforms that have no `updated_on` column (edxorg, residential). This silently missed status changes such as enrollment deactivations (is_active: true → false) and mode upgrades (audit → verified).

Platforms that already expose an `updated_on` timestamp (MITxOnline, MITxPro, program enrollments) use the existing `coalesce(enrollment_updated_on, enrollment_created_on)` watermark and are unaffected.

The fix adds a targeted 7-day lookback clause scoped to rows where `enrollment_updated_on is null`.

Source audit (confirmed via OpenMetadata):
- `raw__mitx__openedx__mysql__student_courseenrollment` (residential): columns `id, mode, created, user_id, course_id, is_active`; no `updated` column in source
- `stg__edxorg__bigquery__mitx_person_course` (edxorg): BigQuery snapshot, no `updated_on`

How can this be tested?
Run the model incrementally against production and verify that deactivated enrollments for edxorg/residential are correctly captured:

    uv run dbt run --select tfact_enrollment --vars 'schema_suffix: qhoque' --target dev_production

Then spot-check that a known deactivated residential or edxorg enrollment (where `enrollment_is_active = false` and `enrollment_created_on` predates the previous watermark) now appears in the output.

Additional Context
`tfact_enrollment` is a leaf node with no downstream dbt models (confirmed via OpenMetadata lineage). The `delete+insert` incremental strategy with `unique_key='enrollment_key'` safely handles re-processing of rows within the lookback window.
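
As a rough illustration of the behavior this PR describes, the Python sketch below models the incremental filter and the delete+insert re-processing on in-memory rows. It is not the model's SQL. The helper names (`parse_iso`, `needs_reprocessing`, `delete_insert`), the sample rows, the `>` and `>=` comparisons, and the way the watermark is supplied are all assumptions made for the example; only the column names, the 7-day window, the null guard, and `enrollment_key` come from the PR text.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(days=7)  # lookback window stated in the PR


def parse_iso(value):
    """Parse an ISO 8601 string as UTC; None stays None (mirrors the IS NOT NULL guard)."""
    if value is None:
        return None
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)


def needs_reprocessing(row, watermark, now):
    """Hypothetical stand-in for the incremental WHERE clause (operators are assumed)."""
    created = parse_iso(row["enrollment_created_on"])
    updated = parse_iso(row["enrollment_updated_on"])
    # Existing behavior: coalesce(updated_on, created_on) compared to the watermark.
    effective = updated if updated is not None else created
    if effective is not None and effective > watermark:
        return True
    # New clause: rows with no updated_on that were created in the trailing 7 days.
    return updated is None and created is not None and created >= now - LOOKBACK


def delete_insert(target, batch, key="enrollment_key"):
    """Drop target rows whose key appears in the batch, then append the batch."""
    incoming = {r[key] for r in batch}
    return [r for r in target if r[key] not in incoming] + batch


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
watermark = now - timedelta(days=1)  # assumed: latest timestamp already loaded

source = [
    # No updated_on, created 3 days ago, deactivated since the last run.
    {"enrollment_key": "a", "enrollment_created_on": "2026-05-29T00:00:00",
     "enrollment_updated_on": None, "enrollment_is_active": False},
    # No updated_on, created 30 days ago: outside the lookback window.
    {"enrollment_key": "b", "enrollment_created_on": "2026-05-02T00:00:00",
     "enrollment_updated_on": None, "enrollment_is_active": True},
    # Has updated_on newer than the watermark: picked up by the coalesce path.
    {"enrollment_key": "c", "enrollment_created_on": "2026-01-01T00:00:00",
     "enrollment_updated_on": "2026-05-31T12:00:00", "enrollment_is_active": True},
]

# The target still holds the stale, active version of enrollment "a".
target = [dict(source[0], enrollment_is_active=True)]

batch = [r for r in source if needs_reprocessing(r, watermark, now)]
result = delete_insert(target, batch)
print([(r["enrollment_key"], r["enrollment_is_active"]) for r in result])
```

In this sketch, row "a" is selected only because of the lookback clause, and the keyed delete-then-insert replaces its stale copy instead of duplicating it. Row "b" is not selected: a row with no `updated_on` that was created more than 7 days ago falls outside the window.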