Conversation
1. FULL OUTER JOIN: Replace COALESCE(x,'')=COALESCE(y,'') with IS NOT DISTINCT FROM (NullSafeEQ) to handle non-string types 2. Symmetric aggregate multipliers: Increase to 1e12 with overflow-safe types (Postgres numeric, Snowflake NUMBER(38,0), DuckDB HUGEINT<<40) 3. Conversion metrics: Add dimension support (SELECT, GROUP BY, JOIN condition with IS NOT DISTINCT FROM to prevent cross-group leaking) 4. Fan-out detection: Check all hops in join path, not just first hop 5. Cross-model filters: Partition filters per model for sub-queries, apply shared filters on outer query via sqlglot AST rewriting 6. Preagg count matching: Use word-boundary regex to avoid matching substrings like "discount_amount" 7. Symmetric aggregates: Add min/max passthrough, median error message 8. Duplicate segments: Pass segments=None after resolving to filters 9. Inheritance: Use model_fields_set instead of exclude_none so child can explicitly set a field to None to clear parent value Includes 16 regression tests in tests/test_bugfix_batch.py covering all fixes with actual DuckDB execution where applicable.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c212d551ac
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # again because each sub-query has metrics from only one model | ||
| sub_query = self.generate( | ||
| metrics=model_metrics, | ||
| dimensions=dimensions, | ||
| filters=all_filters, | ||
| filters=model_filters, |
There was a problem hiding this comment.
Keep shared-filter columns in preagg CTE output
Shared filters are no longer passed into each model’s subquery (filters=model_filters), but the CTE builder only projects columns needed by its own filters. That means any column referenced by a cross-model/shared filter (e.g., orders.status = 'completed' AND items.category = 'A', or a metric-level filter that stays in shared_filters) won’t be selected into orders_preagg/items_preagg. When the outer WHERE later applies the shared filter, those columns are missing, yielding invalid SQL unless the column also happened to be a selected dimension. Consider projecting shared-filter columns into each model CTE or splitting shared filters into per-model predicates before dropping them from filters.
Useful? React with 👍 / 👎.
sidemantic/sql/generator.py
Outdated
| if dim_cols: | ||
| base_col_list = ",\n ".join(f"{c} AS {c}" for c in dim_cols) | ||
| extra_base_cols = f",\n {base_col_list}" |
There was a problem hiding this comment.
Use dimension SQL/granularity for conversion dims
The new conversion-dimension handling builds extra_*_cols using raw dimension names (f"{c} AS {c}"). This skips each Dimension.sql_expr and ignores time granularities (e.g., event_date__day), so queries with derived/time dimensions will reference non-existent columns and won’t apply DATE_TRUNC/expressions. That yields SQL errors or incorrect grouping for common conversion slices. The conversion query should use the same dimension SQL/granularity resolution as the main generator.
Useful? React with 👍 / 👎.
…sion SQL P1: Split compound AND filters into individual conjuncts before classifying for pushdown. A filter like "orders.status = 'active' AND items.category = 'A'" is now decomposed so each part gets pushed into its respective model sub-query, avoiding the need to project extra columns through preagg CTEs. P2: Conversion metric dimensions now use dim.sql_expr (the actual SQL expression) and apply DATE_TRUNC for time granularity, matching the main generator's behavior. Previously used raw dimension names which would fail for derived dimensions (e.g., name="region" sql="server_region") or time granularity (event_date__month). Adds 3 regression tests: compound AND decomposition, conversion with DATE_TRUNC granularity, and conversion with derived dimension SQL expression.
Summary
COALESCE(x,'')=COALESCE(y,'')withIS NOT DISTINCT FROMvia sqlglotNullSafeEQto handle integer/date dimensions::numeric, SnowflakeNUMBER(38,0), DuckDBHUGEINT<<40)one_to_many, not just first hop (catches A->B(m2o)->C(o2m))"discount_amount"no longer falsely matches as a count measuresegments=Noneafter resolving to filters to prevent double-applicationNoneto clear parent value (was silently inheriting due toexclude_none=True)16 regression tests in
tests/test_bugfix_batch.pywith DuckDB execution covering all 9 fixes.