fix(tesseract): multi_stage reduce_by no-ops on type:time dimensions by tlangton3 · Pull Request #10855 · cube-js/cube

tlangton3 · 2026-05-11T14:39:02Z

Summary

multi_stage: true measures (rank, window aggregates) configured with reduce_by: [<time_dim>] silently produce wrong results when the consumer query specifies a granularity — every row reports rank() = 1 (or, for sums, the per-row value rather than the cross-time aggregate). Because the emitted SQL is well-formed, the bug bypasses validation and surfaces only as broken BI dashboards (a "rank by gross profit across all days for this district" tile showing every district at rank 1).

The workaround is to query the time dimension without an explicit granularity — sidestepping the suffix mismatch entirely. That constrains every consumer to query at the raw timestamp grain, eliminating natural day/week/month groupings on the time dim that's being reduced by, which is a real loss of expressiveness.

Root cause is a string-equality mismatch in has_member_in_reference_chain: the projected time dim resolves as a TimeDimensionSymbol whose full_name carries a granularity suffix (_day/_month/etc., defaulting to _day when no granularity is set — see TimeDimensionSymbol::new), while the reduce_by entry resolves to a base DimensionSymbol with the bare name. The compare never matched, so the time dim was never excluded from PARTITION BY and the window collapsed to one-row partitions.

Originally surfaced against BigQuery in production; the fix is backend-agnostic (lives in the Tesseract planner before SQL is emitted) and the included regression tests run against the existing testcontainers Postgres harness. Fixes #10854.

Changes

rust/cube/cubesqlplanner/cubesqlplanner/src/planner/planners/multi_stage/member_query_planner.rs: introduce a single matches_partition_member(d, m) helper used at both the reduce_by and group_by filter sites in member_partition_by_logical. The helper branches on whether m carries an explicit granularity (see Implementation Details).
rust/cube/cubesqlplanner/cubesqlplanner/src/test_fixtures/schemas/yaml_files/common/integration_multi_stage.yaml: add an amount_reduce_created_at measure (sum, reduce_by: [orders.created_at]) to the existing orders cube — fixture for the canonical regression test.
rust/cube/cubesqlplanner/cubesqlplanner/src/tests/integration/multi_stage/reduce_by.rs (+ snapshot): add test_reduce_by_time_dim exercising the new measure with granularity: month.
rust/cube/cubesqlplanner/cubesqlplanner/src/tests/integration/multi_stage/group_by.rs (+ snapshot): add test_group_by_time_dim_multiple_granularities mirroring the BigECommerce multi-stage shape (group_by single granularity, query projecting multiple granularities of the same time dim).

Implementation Details

The two operators have semantically distinct intents that depend on the granularity of the reduce_by / group_by entry — not on which operator it is. matches_partition_member(d, m) captures this with one rule, applied uniformly at both call sites:

If m carries an explicit granularity (resolves to a TimeDimensionSymbol with has_granularity() == true — i.e. the user wrote <time_dim>.month / .quarter / etc.), match strictly on full_name. A group_by: [<time_dim>.quarter] matches only the _quarter projection, not sibling granularities. Symmetrically, a reduce_by: [<time_dim>.month] excludes only the _month projection.
If m is bare (plain DimensionSymbol from reduce_by: [<time_dim>], or TimeDimensionSymbol with granularity = None), unwrap both sides to their base symbol and compare. This is the fix for the granularity-suffix mismatch from Tesseract: multi-stage reduce_by silently no-ops against type:time dimensions #10854: bare reduce_by: [<time_dim>] matches and excludes any granularity of the projection.

TimeDimensionSymbol::full_name carries a granularity suffix that defaults to "_day" when no granularity is supplied (see TimeDimensionSymbol::new); without the base unwrap, the bare entry's full_name could never match a granularity-suffixed projection.

An earlier draft of this patch applied symmetric base_of(...) unwrap to both call sites unconditionally. That was too permissive on the group_by side: when a query projected multiple granularities of the same time dim, both projections unwrapped to the same base, both matched the explicit-granularity group_by entry, both stayed in PARTITION BY, and the window aggregate degenerated to per-row sums. CI caught this against BigECommerce's multi-stage group by time dimension test in both postgres-driver and mssql-driver runs.

Testing

cargo test --features integration-postgres --lib 'tests::integration' → 421 passed, 0 failed (testcontainers Postgres).
cargo test --lib -p cubesqlplanner → 901 passed, 0 failed.
cargo fmt -p cubesqlplanner -- --check → clean.
test_reduce_by_time_dim (new) — bare reduce_by: [created_at] + granularity: month. Verified to fail (per-(status, month) sums) without the fix, passes (cross-month per-status totals: completed=1400, pending=650, cancelled=200) with the fix.
test_group_by_time_dim_multiple_granularities (new) — group_by: [status, created_at.month] + query projecting both month and week. Verified to fail (per-(month, week) sums) under the symmetric-unwrap-on-group_by version, passes (per-month sums 500/750/1000 repeated across weeks) under the unified helper.
Existing test_group_by_with_time (single-granularity projection) stays green.

`multi_stage: true` measures (rank, window aggregates) configured with `reduce_by: [<time_dim>]` silently produced wrong results when the consumer query specified a granularity (e.g. `granularity: day`). The projected dim resolved as a TimeDimensionSymbol whose `full_name` carried a granularity suffix, while the reduce_by entry resolved to a base DimensionSymbol with bare name — the string-equality compare in `has_member_in_reference_chain` failed, the time dim stayed in PARTITION BY, and every row collapsed to a one-row partition with rank() = 1. Fix: unwrap any TimeDimensionSymbol to its base DimensionSymbol on both sides of the comparison in `member_partition_by_logical`. Symmetric unwrapping is required because `group_by` entries can themselves be TimeDimensionSymbols (e.g. `group_by: [<time_dim>.month]`); unwrapping only the projected dim would break the symmetric-granularity match and silently drop the time dim from PARTITION BY in cases like `test_group_by_with_time`. Adds a regression test (`test_reduce_by_time_dim`) using the existing multi-stage fixture: a sum measure with `reduce_by: [orders.created_at]` queried with `granularity: month`. Without the fix the snapshot shows per-(status, month) sums; with the fix every row for a given status shows the same per-status grand total across all months. Fixes cube-js#10854.

Replace member_partition_by_logical's separate reduce_by/group_by matching with a single matches_partition_member helper that branches on whether the measure-level entry (m) carries an explicit granularity: - m has explicit granularity (e.g. <time_dim>.quarter): match strictly on full_name. group_by: [<time_dim>.quarter] keeps only the _quarter projection in PARTITION BY; reduce_by: [<time_dim>.month] excludes only the _month projection. - m is bare (plain DimensionSymbol, or TimeDimensionSymbol with granularity = None): unwrap both sides to base. reduce_by: [<time_dim>] excludes the time dim regardless of the projection's granularity suffix — the original cube-js#10854 case. This replaces the symmetric base_of() unwrap on the group_by site, which over-matched when a query projected multiple granularities of the same time dim and degenerated window aggregates to per-row sums. CI surfaced the regression against BigECommerce's "multi-stage group by time dimension" test (both postgres-driver and mssql-driver runs). Adds test_group_by_time_dim_multiple_granularities mirroring the BigECommerce shape against cubesqlplanner's testcontainers Postgres harness — without the fix, sums collapse per-(month, week); with the fix, they correctly aggregate per-month repeated across weeks.

github-actions Bot added rust Pull requests that update Rust code pr:community Contribution from Cube.js community members. labels May 11, 2026

style(tesseract): cargo fmt

1ae2a6f

vercel Bot deployed to Preview May 11, 2026 14:43 View deployment

ovr requested a review from waralexrom May 11, 2026 15:19

tlangton3 changed the title ~~fix(tesseract): multi_stage reduce_by/group_by no-ops on time dimensions~~ fix(tesseract): multi_stage reduce_by no-ops on type:time dimensions May 11, 2026

vercel Bot deployed to Preview May 11, 2026 15:27 View deployment

tlangton3 marked this pull request as ready for review May 11, 2026 15:51

tlangton3 mentioned this pull request May 11, 2026

Multi-stage measure order_by template crashes when queried through a view that doesn't expose the referenced member by its bare name #10856

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tesseract): multi_stage reduce_by no-ops on type:time dimensions#10855

fix(tesseract): multi_stage reduce_by no-ops on type:time dimensions#10855
tlangton3 wants to merge 3 commits into
cube-js:masterfrom
tlangton3:cube/fix-tesseract-multistage-reduce-by-time-dim

tlangton3 commented May 11, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tlangton3 commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Implementation Details

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tlangton3 commented May 11, 2026 •

edited

Loading