Skip to content

B3: Fix tfact_enrollment incremental watermark to capture status updates#2258

Merged
quazi-h merged 3 commits into
mainfrom
copilot/2123-fix-tfact-enrollment-incremental-watermark
Jun 1, 2026
Merged

B3: Fix tfact_enrollment incremental watermark to capture status updates#2258
quazi-h merged 3 commits into
mainfrom
copilot/2123-fix-tfact-enrollment-incremental-watermark

Conversation

@quazi-h
Copy link
Copy Markdown
Contributor

@quazi-h quazi-h commented May 28, 2026

What are the relevant tickets?

Closes #2123
Epic: #2073

Description (What does it do?)

The incremental filter for tfact_enrollment used only enrollment_created_on as the watermark for platforms that have no updated_on column (edxorg, residential). This silently missed status changes such as enrollment deactivations (is_active: true → false) and mode upgrades (audit → verified).

Platforms that already expose an updated_on timestamp (MITxOnline, MITxPro, program enrollments) use the existing coalesce(enrollment_updated_on, enrollment_created_on) watermark and are unaffected.

The fix adds a targeted 7-day lookback clause scoped to rows where enrollment_updated_on is null:

-- For platforms without updated_on (edxorg, residential), apply a 7-day lookback to catch
-- status changes (deactivations, mode upgrades) that don't bump enrollment_created_on
or (ewf.enrollment_updated_on is null and ewf.enrollment_created_on >= current_timestamp - interval '7' day)

Source audit (confirmed via OpenMetadata):

  • raw__mitx__openedx__mysql__student_courseenrollment (residential) — columns: id, mode, created, user_id, course_id, is_active — no updated column in source
  • stg__edxorg__bigquery__mitx_person_course (edxorg) — BigQuery snapshot, no updated_on

How can this be tested?

Run the model incrementally against production and verify that deactivated enrollments for edxorg/residential are correctly captured:

uv run dbt run --select tfact_enrollment --vars 'schema_suffix: qhoque' --target dev_production

Then spot-check that a known deactivated residential or edxorg enrollment (where enrollment_is_active = false and enrollment_created_on predates the previous watermark) now appears in the output:

select platform, enrollment_is_active, enrollment_mode, enrollment_created_on, enrollment_updated_on
from ol_warehouse_production_qhoque_dimensional.tfact_enrollment
where platform in ('edxorg', 'residential')
  and enrollment_is_active = false
order by enrollment_created_on desc
limit 20;

Additional Context

tfact_enrollment is a leaf node with no downstream dbt models (confirmed via OpenMetadata lineage). The delete+insert incremental strategy with unique_key='enrollment_key' safely handles re-processing of rows within the lookback window.

Copilot AI review requested due to automatic review settings May 28, 2026 17:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts tfact_enrollment’s incremental filter so platforms that lack an updated_on source field (edxorg, residential) still reprocess a small recent window and don’t miss enrollment status/mode changes.

Changes:

  • Adds a 7-day lookback condition for rows where enrollment_updated_on is null in the incremental WHERE clause.
  • Keeps existing coalesce(enrollment_updated_on, enrollment_created_on) watermark behavior for platforms that do have updated_on.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ol_dbt/models/dimensional/tfact_enrollment.sql Outdated
quazi-h added a commit that referenced this pull request May 28, 2026
… filter

The 7-day lookback clause was comparing enrollment_created_on (ISO8601
varchar) directly to current_timestamp (timestamp), which errors in
Trino with a type mismatch.

Use the cross-db from_iso8601_timestamp() macro — renders as
from_iso8601_timestamp(...||'Z') on Trino and cast(... as timestamp)
on DuckDB. Also adds an explicit IS NOT NULL guard so the cast is never
attempted on null values.

Addresses Copilot review feedback on PR #2258.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@quazi-h quazi-h requested a review from Copilot May 28, 2026 17:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread src/ol_dbt/models/dimensional/tfact_enrollment.sql
@KatelynGit KatelynGit self-assigned this May 29, 2026
Copy link
Copy Markdown
Contributor

@KatelynGit KatelynGit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build ran clean and it does appear to be picking up additional records now

quazi-h and others added 3 commits June 1, 2026 13:03
…watermark

For platforms without an updated_on column (edxorg, residential), the
incremental filter used only enrollment_created_on as the watermark,
silently missing status changes such as deactivations (is_active: true
→ false) and mode upgrades (audit → verified).

Platforms with updated_on (MITxOnline, MITxPro, program enrollments)
already use the per-platform coalesce(updated_on, created_on) watermark
and are unaffected.

The new clause reprocesses the trailing 7 days of rows where
enrollment_updated_on is null, which is the standard lookback pattern
for sources that carry no mutable timestamp.

Closes #2123

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… filter

The 7-day lookback clause was comparing enrollment_created_on (ISO8601
varchar) directly to current_timestamp (timestamp), which errors in
Trino with a type mismatch.

Use the cross-db from_iso8601_timestamp() macro — renders as
from_iso8601_timestamp(...||'Z') on Trino and cast(... as timestamp)
on DuckDB. Also adds an explicit IS NOT NULL guard so the cast is never
attempted on null values.

Addresses Copilot review feedback on PR #2258.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@quazi-h quazi-h force-pushed the copilot/2123-fix-tfact-enrollment-incremental-watermark branch from e8f7913 to 3a49ddb Compare June 1, 2026 17:03
@quazi-h quazi-h merged commit 6d20ede into main Jun 1, 2026
5 checks passed
@quazi-h quazi-h deleted the copilot/2123-fix-tfact-enrollment-incremental-watermark branch June 1, 2026 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

B3: Fix tfact_enrollment incremental watermark to capture status updates

3 participants