Merged
2 changes: 1 addition & 1 deletion .github/workflows/test-warehouse.yml

```diff
@@ -112,7 +112,7 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v6
         with:
-          python-version: "3.9"
+          python-version: "3.10"
           cache: "pip"
```
12 changes: 12 additions & 0 deletions macros/utils/cross_db_utils/incremental_strategy.sql

```diff
@@ -17,3 +17,15 @@
 {% macro default__get_default_incremental_strategy() %}
   {% do return(none) %}
 {% endmacro %}
+
+{% macro get_append_only_incremental_strategy() %}
+  {% do return(adapter.dispatch("get_append_only_incremental_strategy", "elementary")()) %}
+{% endmacro %}
+
+{%- macro bigquery__get_append_only_incremental_strategy() %}
+  {% do return("merge") %}
+{% endmacro %}
+
+{% macro default__get_append_only_incremental_strategy() %}
+  {% do return("append") %}
+{% endmacro %}
```
Comment on lines +21 to +31
@coderabbitai Bot Jan 27, 2026
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
# Find all usages of get_append_only_incremental_strategy in the codebase
rg "get_append_only_incremental_strategy" -A 3 -B 3
```

Repository: elementary-data/dbt-data-reliability

Length of output: 1724


🏁 Script executed:

```shell
# Also check the context of incremental_strategy.sql to understand the broader pattern
cat -n macros/utils/cross_db_utils/incremental_strategy.sql
```

Repository: elementary-data/dbt-data-reliability

Length of output: 1203


🏁 Script executed:

```shell
# Search for where incremental_strategy is configured or set in the dbt project
fd -t f -e sql -e yml | xargs rg "incremental_strategy" | head -30
```

Repository: elementary-data/dbt-data-reliability

Length of output: 3335


🏁 Script executed:

```shell
# Check test_result_rows.sql for unique_key configuration
cat -n models/edr/run_results/test_result_rows.sql
```

Repository: elementary-data/dbt-data-reliability

Length of output: 1177


🏁 Script executed:

```shell
# Search for all config blocks in that file to see if unique_key is defined
rg -A 10 "config(" models/edr/run_results/test_result_rows.sql
```

Repository: elementary-data/dbt-data-reliability

Length of output: 144


🌐 Web query:

dbt incremental_strategy none default behavior merge fallback

💡 Result:

  • In dbt, if you don’t set incremental_strategy (i.e., it’s effectively None), dbt calls the adapter’s default incremental SQL macro (get_incremental_default_sql). The actual behavior is adapter-specific (it is not universally merge). [1], [2]
  • Many warehouse adapters set their default to merge (for example: Snowflake and BigQuery). [3], [4]
  • “Merge fallback” commonly happens via unique_key rules:
    • If you configure a strategy like merge / delete+insert / microbatch but omit unique_key, dbt generally falls back to append. [5]
    • BigQuery is an exception: merge on BigQuery requires unique_key; leaving it out can be ambiguous or fail rather than cleanly falling back. [5]

So: incremental_strategy: None ⇒ “use the adapter default,” which may be merge on some platforms, but not all; and separately, missing unique_key can force/trigger an append fallback (with BigQuery caveats). [3], [4], [5]

Sources:
[1] dbt-labs/docs.getdbt.com#1761
[2] https://docs.getdbt.com/docs/build/incremental-strategy
[3] https://docs.getdbt.com/reference/resource-configs/snowflake-configs
[4] https://docs.getdbt.com/reference/resource-configs/bigquery-configs
[5] https://docs.getdbt.com/reference/resource-configs/unique_key
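
To make the intended behavior explicit rather than inherited from the adapter default, a model can pin the strategy in its config block. A minimal sketch (the model body, `my_events`, and `loaded_at` are hypothetical names, not from this PR):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

-- Hypothetical append-only model: new rows are only ever inserted.
select id, payload, loaded_at
from {{ ref('my_events') }}
{% if is_incremental() %}
  -- only pick up rows newer than what the target already holds
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```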


🏁 Script executed:

# Verify if there are any comments or documentation in the file about why append-only strategy is used
cat -n models/edr/run_results/test_result_rows.sql | head -30

Repository: elementary-data/dbt-data-reliability

Length of output: 1177


🌐 Web query:

dbt incremental_strategy append vs merge difference unique_key requirement

💡 Result:

append vs merge in dbt incremental models

append

  • Behavior: Insert-only — dbt runs an INSERT of the incremental query results into the target table; it does not update or delete existing rows.
  • Duplicates: dbt doesn’t check for existing rows, so duplicates can accumulate if the same records appear again. [2]
  • unique_key: Not required (and not used for upserts). If you don’t set unique_key, dbt/your adapter effectively uses append behavior by default. [1]

merge

  • Behavior: Upsert — dbt updates existing rows and inserts new rows by matching on unique_key (SCD Type 1–like overwrite semantics). [2]
  • Duplicates: With a valid unique_key, merge helps prevent duplicates by updating matched keys instead of inserting again. [2]
  • unique_key: Required for true merge/upsert behavior (it’s the match key). If you configure merge without unique_key, it behaves like append (adapter fallback). [1], [2]
    • BigQuery note: for incremental_strategy='merge', you must provide unique_key (otherwise behavior is ambiguous or can fail). [1]

Rule of thumb

  • Use append when rows are immutable (event/log-style) and duplicates are acceptable/handled downstream. [2]
  • Use merge when you need the target table to reflect the latest version per key (upserts). [2]

Sources: dbt docs on unique_key and incremental strategies. [1], [2]
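
The difference between the two strategies can be sketched in plain warehouse SQL (table and column names here are hypothetical, and the actual generated SQL varies by adapter):

```sql
-- append: insert-only; existing rows are never touched, duplicates can accumulate
insert into analytics.target (id, value)
select id, value from analytics.staging_batch;

-- merge: upsert keyed on unique_key (id here); matched rows are updated in place
merge into analytics.target as t
using analytics.staging_batch as s
  on t.id = s.id
when matched then update set value = s.value
when not matched then insert (id, value) values (s.id, s.value);
```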


Default macro returns none, causing non-Athena adapters to rely on adapter-specific defaults without explicit intent.

test_result_rows.sql (the only consumer of this macro) lacks a unique_key and is designed for append-only behavior. Returning none defers to adapter defaults—which vary (Snowflake and BigQuery typically default to merge)—rather than explicitly stating the append-only strategy. This creates implicit coupling to adapter behavior and ambiguity about incremental semantics.

Return "append" explicitly to match the macro's name and intent across all adapters:

🔧 Suggested fix

```diff
 {% macro default__get_append_only_incremental_strategy() %}
-  {% do return(none) %}
+  {% do return("append") %}
 {% endmacro %}
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 {% macro get_append_only_incremental_strategy() %}
   {% do return(adapter.dispatch("get_append_only_incremental_strategy", "elementary")()) %}
 {% endmacro %}

 {%- macro athena__get_append_only_incremental_strategy() %}
   {% do return("append") %}
 {% endmacro %}

 {% macro default__get_append_only_incremental_strategy() %}
-  {% do return(none) %}
+  {% do return("append") %}
 {% endmacro %}
```
🤖 Prompt for AI Agents
In `@macros/utils/cross_db_utils/incremental_strategy.sql` around lines 21-31,
the default implementation of get_append_only_incremental_strategy currently
returns none (default__get_append_only_incremental_strategy), which lets
non-Athena adapters fall back to their own defaults (e.g., merge) and causes
test_result_rows.sql to behave inconsistently. Change
default__get_append_only_incremental_strategy to explicitly return the string
"append" so get_append_only_incremental_strategy (and
athena__get_append_only_incremental_strategy) uniformly indicate append-only
semantics across all adapters.

Collaborator
I'm actually fine with this, as in practice we update this table via post-hook / on-run-end hook, and the model query itself is always empty.
I just need it not to fail - which is why I have the setting for Athena.

Contributor Author
Makes sense! The macro approach ensures Athena gets the explicit append strategy it needs while other adapters use their defaults. Ready to merge whenever you'd like - just note that the CI will continue to fail until merged due to the Python 3.9 → 3.10 chicken-and-egg issue with the reusable workflow.

Skipped: comment is from another GitHub bot.

Collaborator
Hmm actually I have doubts, maybe I'll do as you say


@haritamar Thanks for reconsidering! I think returning "append" explicitly is the safer choice here, even though the model query is currently empty. Here's why:

Benefits of explicit "append":

  1. Semantic clarity – The macro is named get_append_only_incremental_strategy, so returning "append" matches its documented intent
  2. Decouples from adapter defaults – You won't be implicitly relying on varied adapter behavior (Snowflake/BigQuery default to merge, others may differ)
  3. Future-proof – If the model query ever becomes non-empty or hooks change, the append-only semantics are already guaranteed
  4. Consistency – All adapters (including Athena) will explicitly use the same append-only strategy

Current risk with none:

  • Redshift, Snowflake, and BigQuery default to merge when strategy is unspecified
  • While they may fall back to append when unique_key is missing, this fallback behavior isn't guaranteed across all adapter versions and can be ambiguous

The suggested fix in my previous comment explicitly returns "append" for the default implementation, which aligns the behavior across all adapters with the macro's intended purpose. Let me know if you'd like to proceed with that change!


🧠 Learnings used
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/utils/cross_db_utils/incremental_strategy.sql:13-15
Timestamp: 2025-10-05T16:25:01.719Z
Learning: In macros/utils/cross_db_utils/incremental_strategy.sql, the get_default_incremental_strategy() macro and its adapter-specific variants are used exclusively for Elementary's internal dbt artifact tables, all of which have unique keys defined. Therefore, defaulting to "merge" strategy for Redshift (and other adapters) is safe in this context.

Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: macros/utils/table_operations/delete_and_insert.sql:138-146
Timestamp: 2025-08-10T11:29:06.982Z
Learning: In the Elementary dbt-data-reliability codebase, the team prefers to add the `escape_reserved_keywords` macro only in places where they've actually encountered errors with reserved keywords, rather than preemptively adding it everywhere. They rely on E2E tests to catch future issues if reserved keywords cause problems in untested code paths. This pragmatic approach avoids unnecessary defensive programming and keeps the codebase simpler.

Collaborator
Let's keep the macro, but:

  1. Make it append for everything other than BigQuery (which doesn't support it)
  2. For BigQuery - merge

Contributor Author

Done! Updated the macro:

  • Default now returns 'append' for all adapters
  • BigQuery-specific override returns 'merge' (since BigQuery doesn't support append)

Pushed commit 9cd962a.

Skipped: comment is from another GitHub bot.


🧩 Analysis chain

🌐 Web query:

BigQuery dbt incremental strategy append support

💡 Result:

Yes—BigQuery supports dbt’s append incremental strategy.

  • append is the default incremental behavior when you don’t specify an incremental_strategy (and typically when you don’t set unique_key). dbt will simply INSERT INTO the target table. This does not update/delete existing rows and can create duplicates. [1], [2]
  • You can also explicitly set it:

```sql
{{ config(materialized='incremental', incremental_strategy='append') }}

select ...
{% if is_incremental() %}
  where ...
{% endif %}
```

  • On BigQuery, if you want updates/upserts you generally use merge (requires unique_key), or partition-replacement strategies like insert_overwrite / microbatch (partition-based, unique_key not required). [2], [3]

Sources:
[1] dbt docs: Incremental strategies (append) https://docs.getdbt.com/docs/build/incremental-strategy
[2] dbt docs: unique_key defaulting behavior and relationship to append https://docs.getdbt.com/reference/resource-configs/unique_key
[3] dbt-bigquery incremental strategy behavior overview (adapter implementation summary) https://deepwiki.com/dbt-labs/dbt-bigquery/3.2-incremental-materialization-strategies


@haritamar, I checked the dbt-bigquery documentation and found that BigQuery does support the append incremental strategy—it's actually the default behavior when no strategy is specified.

However, I want to clarify your requirements before implementing:

Potential concern: Setting incremental_strategy='merge' for BigQuery typically requires a unique_key, which test_result_rows.sql doesn't have. Without a unique_key, the behavior may fall back to append anyway, or could cause unexpected behavior.

Questions:

  1. Did you encounter a specific issue with append on BigQuery for this model?
  2. Since the model query is empty (per your earlier comment), does the strategy choice actually affect BigQuery's behavior in practice?

If you still want to proceed with merge for BigQuery (perhaps for consistency with other BigQuery models or to match your conventions), I'm happy to implement it. Just want to make sure we're solving the right problem!

Would you like me to:

  • A) Implement as requested: "append" for default, "merge" for BigQuery
  • B) Use "append" uniformly across all adapters (including BigQuery)
  • C) Something else?

🧠 Learnings used
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/utils/cross_db_utils/incremental_strategy.sql:13-15
Timestamp: 2025-10-05T16:25:01.719Z
Learning: In macros/utils/cross_db_utils/incremental_strategy.sql, the get_default_incremental_strategy() macro and its adapter-specific variants are used exclusively for Elementary's internal dbt artifact tables, all of which have unique keys defined. Therefore, defaulting to "merge" strategy for Redshift (and other adapters) is safe in this context.

Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 918
File: macros/utils/cross_db_utils/incremental_strategy.sql:21-31
Timestamp: 2026-01-27T21:01:09.183Z
Learning: In `models/edr/run_results/test_result_rows.sql`, the model query is always empty and the table is updated via post-hook/on-run-end hook. The `incremental_strategy` configuration using `get_append_only_incremental_strategy()` is only needed to prevent Athena from failing with its default `insert_overwrite` strategy. For other adapters, returning `none` (to use adapter defaults) is acceptable because the empty query doesn't trigger incremental logic.

Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: macros/utils/table_operations/delete_and_insert.sql:138-146
Timestamp: 2025-08-10T11:29:06.982Z
Learning: In the Elementary dbt-data-reliability codebase, the team prefers to add the `escape_reserved_keywords` macro only in places where they've actually encountered errors with reserved keywords, rather than preemptively adding it everywhere. They rely on E2E tests to catch future issues if reserved keywords cause problems in untested code paths. This pragmatic approach avoids unnecessary defensive programming and keeps the codebase simpler.

2 changes: 1 addition & 1 deletion models/edr/run_results/test_result_rows.sql

```diff
@@ -10,7 +10,7 @@
         "prev_timestamp_column": "detected_at",
     },
     table_type=elementary.get_default_table_type(),
-    incremental_strategy=elementary.get_default_incremental_strategy()
+    incremental_strategy=elementary.get_append_only_incremental_strategy()
   )
 }}
```