chore: bump sqlglot pin to sqlglot[c]==30.7.0 #101

Merged: suryaiyer95 merged 1 commit into main from feat/migrate-sqlglot-c-v30 on May 7, 2026

@suryaiyer95
Collaborator

Summary

Bump the sqlglot install requirement from sqlglot~=25.30.0 to
sqlglot[c]==30.7.0 to align with the pin being adopted in
AltimateAI/altimate-dags#1024 (sqlglot[c] v30 — 6.8x parsing speedup).

SqlCheck (src/datapilot/core/platforms/dbt/insights/sql/sql_check.py)
is the only place in datapilot-cli that imports sqlglot. It uses the
optimizer rules (pushdown_projections, normalize, unnest_subqueries,
eliminate_subqueries, eliminate_joins, eliminate_ctes) and inspects
each rule's signature via inspect.getfullargspec(rule).args. Verified
that the mypyc-compiled sqlglot[c] 30.7.0 build still exposes
introspectable signatures for these rules, so no compat shim is needed.
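
For readers unfamiliar with the pattern, here is a minimal sketch of signature-based argument filtering with inspect.getfullargspec. It is not the SqlCheck code: the helper name `call_rule` and the two stand-in rule functions are hypothetical and only illustrate why introspectable signatures matter.

```python
import inspect
from typing import Any, Callable


def call_rule(rule: Callable[..., Any], expression: Any, **available: Any) -> Any:
    """Call `rule` with only the keyword arguments its signature declares."""
    accepted = inspect.getfullargspec(rule).args
    kwargs = {name: value for name, value in available.items() if name in accepted}
    return rule(expression, **kwargs)


# Hypothetical stand-ins for optimizer rules; these are NOT sqlglot functions.
def rule_with_schema(expression, schema=None):
    return f"{expression} [schema={schema}]"


def rule_without_options(expression):
    return f"{expression} [no options]"


if __name__ == "__main__":
    # `dialect` is dropped for both rules; `schema` is passed only where declared.
    print(call_rule(rule_with_schema, "query", schema="s", dialect="d"))
    print(call_rule(rule_without_options, "query", schema="s", dialect="d"))
```

If a compiled build reported an empty or incorrect argument list, a filter like this would silently drop options, which is the failure mode the signature check above rules out.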

Non-breaking validation

| Check | sqlglot 25.30.0 (old pin) | sqlglot[c] 30.7.0 (new pin) |
| --- | --- | --- |
| pytest tests/ | 77 passed | 77 passed |
| inspect.getfullargspec on optimizer rules | OK | OK |
| Pipeline rule execution (parse → qualify → 6 optimizer rules) | 0 errors | 0 errors |
| SqlCheck insights on a 7-query fixture | 6 insights | 6 insights, byte-identical recommendations |

The 7-query fixture covered filter pushdown, CTE chains, unused joins,
IN-subqueries, SELECT DISTINCT dedup, complex OR/AND filters, and a
no-optimization-needed baseline. Output recommendations are character-
identical between versions.
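
One way such a byte-level comparison could be scripted is sketched below. This is an assumption about the approach, not the script used for this PR: `fingerprint` and `diff_outputs` are hypothetical helpers, and the inputs are assumed to be mappings from a query id to the recommended SQL captured under each sqlglot version.

```python
import hashlib
from typing import Dict, List


def fingerprint(sql: str) -> str:
    """Hash the exact bytes of the output, so any whitespace change counts."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def diff_outputs(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return query ids whose output differs or is missing on either side."""
    differing = []
    for query_id in sorted(set(old) | set(new)):
        if query_id not in old or query_id not in new:
            differing.append(query_id)
        elif fingerprint(old[query_id]) != fingerprint(new[query_id]):
            differing.append(query_id)
    return differing


if __name__ == "__main__":
    # Illustrative data only; not the fixture from this PR.
    old_run = {"q1": "SELECT a FROM t", "q2": "SELECT b FROM u"}
    new_run = {"q1": "SELECT a FROM t", "q2": "SELECT b FROM u"}
    print(diff_outputs(old_run, new_run))  # empty list when outputs match
```

An empty result corresponds to the "byte-identical" claim; any id in the list would need manual review.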

Why exact pin (==30.7.0) and not a range

Matches the version validated in altimate-dags#1024 against 1,986,940
production queries (and re-validated today on 214,259 Rakuten queries:
99.9991% byte-identical SQL fingerprints, 0 false negatives, 0 false
positives). Holding the pin tight ensures DAG workers and the
datapilot-cli installed alongside them resolve to the same sqlglot
release; we can relax to a range later, once 30.x minor bumps are
similarly validated.
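
As an illustration of what "resolve to the same sqlglot release" means in practice, a worker could check the installed version against the pin at startup. This is a hedged sketch only: the constant `PINNED_SQLGLOT` and both helper functions are hypothetical and are not part of datapilot-cli or altimate-dags.

```python
from importlib import metadata
from typing import Optional

PINNED_SQLGLOT = "30.7.0"  # the exact version named in this PR


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatch(installed: Optional[str], pinned: str = PINNED_SQLGLOT) -> Optional[str]:
    """Return a message describing the mismatch, or None when the pin holds."""
    if installed is None:
        return f"sqlglot is not installed; expected {pinned}"
    if installed != pinned:
        return f"sqlglot {installed} is installed; expected {pinned}"
    return None


if __name__ == "__main__":
    problem = pin_mismatch(installed_version("sqlglot"))
    print(problem or "sqlglot matches the pin")
```

With an exact pin the comparison is a plain string equality; a range would need a version parser, which is one more reason to defer relaxing the pin.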

Refs: AltimateAI/altimate-dags#1024

Test plan

  • CI green
  • After merge, cut a new datapilot-cli release and bump the version
    pinned in altimate-infra airflow extraPipPackages
    (apps/{prod,staging/airflow-mi,freemium}/airflow/values.yml).

🤖 Generated with Claude Code

Match the version being adopted in altimate-dags PR #1024 so all
downstream consumers (`altimate-dags`, airflow workers via
`altimate-datapilot-cli`) resolve to the same C-accelerated sqlglot
release. The 6.8x parsing speedup landing in altimate-dags benefits
any DAG path that reaches into datapilot-cli's `SqlCheck`.

`SqlCheck` (`src/datapilot/core/platforms/dbt/insights/sql/sql_check.py`)
is the only place that imports `sqlglot`. Its optimizer-rule pipeline
(`pushdown_projections`, `normalize`, `unnest_subqueries`,
`eliminate_subqueries`, `eliminate_joins`, `eliminate_ctes`) inspects
each rule via `inspect.getfullargspec(rule).args`. Verified that
sqlglot[c] 30.7.0 still exposes proper signatures via mypyc — no
compat shim required.

Non-breaking validation:
- Existing test suite: **77/77 pass** on both 25.30.0 (old pin) and
  30.7.0 (new pin).
- SqlCheck output is byte-identical between 25.30.0 and 30.7.0 on a
  7-query fixture covering filter pushdown, CTE chains, unused joins,
  IN-subqueries, DISTINCT dedup, OR/AND filters, and a no-op query.
  Same 6 insights, same recommended optimized SQL, same rule names.

Refs: AltimateAI/altimate-dags#1024
@suryaiyer95 merged commit 29acf7b into main on May 7, 2026
29 checks passed