Fix 9 bugs found during codebase audit by nicosuave · Pull Request #75 · sidequery/sidemantic

nicosuave · 2026-01-29T15:29:06Z

Summary

FULL OUTER JOIN null safety: Replace COALESCE(x,'')=COALESCE(y,'') with IS NOT DISTINCT FROM via sqlglot NullSafeEQ to handle integer/date dimensions
Symmetric aggregate overflow: Increase multipliers to 1e12 with overflow-safe types per dialect (Postgres ::numeric, Snowflake NUMBER(38,0), DuckDB HUGEINT<<40)
Conversion metric dimensions: Add SELECT, GROUP BY, and JOIN condition support so conversion rates can be sliced by dimensions without cross-group leaking
Fan-out detection all hops: Check entire join path for one_to_many, not just first hop (catches A->B(m2o)->C(o2m))
Cross-model filter partitioning: Push single-model filters into sub-queries, apply shared/metric-level filters on outer query via sqlglot AST rewriting (not string replacement)
Preagg count word-boundary regex: "discount_amount" no longer falsely matches as a count measure
Min/max/median symmetric aggregates: Min/max pass through (idempotent to fan-out), median raises with guidance
Duplicate segment resolution: Pass segments=None after resolving to filters to prevent double-application
Inheritance model_fields_set: Child can explicitly set a field to None to clear parent value (was silently inheriting due to exclude_none=True)

16 regression tests in tests/test_bugfix_batch.py with DuckDB execution covering all 9 fixes.

1. FULL OUTER JOIN: Replace COALESCE(x,'')=COALESCE(y,'') with IS NOT DISTINCT FROM (NullSafeEQ) to handle non-string types 2. Symmetric aggregate multipliers: Increase to 1e12 with overflow-safe types (Postgres numeric, Snowflake NUMBER(38,0), DuckDB HUGEINT<<40) 3. Conversion metrics: Add dimension support (SELECT, GROUP BY, JOIN condition with IS NOT DISTINCT FROM to prevent cross-group leaking) 4. Fan-out detection: Check all hops in join path, not just first hop 5. Cross-model filters: Partition filters per model for sub-queries, apply shared filters on outer query via sqlglot AST rewriting 6. Preagg count matching: Use word-boundary regex to avoid matching substrings like "discount_amount" 7. Symmetric aggregates: Add min/max passthrough, median error message 8. Duplicate segments: Pass segments=None after resolving to filters 9. Inheritance: Use model_fields_set instead of exclude_none so child can explicitly set a field to None to clear parent value Includes 16 regression tests in tests/test_bugfix_batch.py covering all fixes with actual DuckDB execution where applicable.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c212d551ac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-01-29T15:34:06Z

sidemantic/sql/generator.py

            # again because each sub-query has metrics from only one model
            sub_query = self.generate(
                metrics=model_metrics,
                dimensions=dimensions,
-                filters=all_filters,
+                filters=model_filters,


Keep shared-filter columns in preagg CTE output

Shared filters are no longer passed into each model’s subquery (filters=model_filters), but the CTE builder only projects columns needed by its own filters. That means any column referenced by a cross-model/shared filter (e.g., orders.status = 'completed' AND items.category = 'A', or a metric-level filter that stays in shared_filters) won’t be selected into orders_preagg/items_preagg. When the outer WHERE later applies the shared filter, those columns are missing, yielding invalid SQL unless the column also happened to be a selected dimension. Consider projecting shared-filter columns into each model CTE or splitting shared filters into per-model predicates before dropping them from filters.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-01-29T15:34:06Z

sidemantic/sql/generator.py

+        if dim_cols:
+            base_col_list = ",\n    ".join(f"{c} AS {c}" for c in dim_cols)
+            extra_base_cols = f",\n    {base_col_list}"


Use dimension SQL/granularity for conversion dims

The new conversion-dimension handling builds extra_*_cols using raw dimension names (f"{c} AS {c}"). This skips each Dimension.sql_expr and ignores time granularities (e.g., event_date__day), so queries with derived/time dimensions will reference non-existent columns and won’t apply DATE_TRUNC/expressions. That yields SQL errors or incorrect grouping for common conversion slices. The conversion query should use the same dimension SQL/granularity resolution as the main generator.

Useful? React with 👍 / 👎.

…sion SQL P1: Split compound AND filters into individual conjuncts before classifying for pushdown. A filter like "orders.status = 'active' AND items.category = 'A'" is now decomposed so each part gets pushed into its respective model sub-query, avoiding the need to project extra columns through preagg CTEs. P2: Conversion metric dimensions now use dim.sql_expr (the actual SQL expression) and apply DATE_TRUNC for time granularity, matching the main generator's behavior. Previously used raw dimension names which would fail for derived dimensions (e.g., name="region" sql="server_region") or time granularity (event_date__month). Adds 3 regression tests: compound AND decomposition, conversion with DATE_TRUNC granularity, and conversion with derived dimension SQL expression.

chatgpt-codex-connector bot reviewed Jan 29, 2026

View reviewed changes

nicosuave merged commit 0a9ac82 into main Jan 29, 2026
10 checks passed

nicosuave deleted the fix-9-codebase-audit-bugs branch January 29, 2026 15:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix 9 bugs found during codebase audit#75

Fix 9 bugs found during codebase audit#75
nicosuave merged 2 commits intomainfrom
fix-9-codebase-audit-bugs

nicosuave commented Jan 29, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Jan 29, 2026

Uh oh!

chatgpt-codex-connector bot Jan 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nicosuave commented Jan 29, 2026

Summary

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant