Skip to content

chore(tesseract): replace SqlNode trait with Op pipeline#10848

Open
waralexrom wants to merge 11 commits into
masterfrom
tesseract-sql-evaluator-mutability
Open

chore(tesseract): replace SqlNode trait with Op pipeline#10848
waralexrom wants to merge 11 commits into
masterfrom
tesseract-sql-evaluator-mutability

Conversation

@waralexrom
Copy link
Copy Markdown
Member

Summary

Replaces the Rc<dyn SqlNode> decorator chain in the Tesseract SQL evaluator with a data-driven Op pipeline. Plans are now Vec<Op> carried by a concrete NodeProcessor; ops are exhaustive enum variants with per-variant OpExec impls. The motivation is to make the pipeline editable at runtime (mutating a Vec<Op> instead of threading magic flags through SqlEvaluatorVisitor) and to enable structural validation, snapshot diffing and future optimization passes.

waralexrom and others added 8 commits May 8, 2026 16:26
Add an Op-based pipeline alongside the existing Rc<dyn SqlNode>
decorator chain so SQL evaluator nodes can be migrated to a
data-driven, inspectable representation incrementally.

- Add `op/` submodule with `OpExec` trait, `OpCtx`, `Op` enum, exhaustive
  dispatch and `OpPipelineSqlNode` bridge so a `Vec<Op>` plugs into the
  legacy tree wherever a `Rc<dyn SqlNode>` is expected.
- Add `LegacySqlNodeOp` escape hatch to wrap not-yet-migrated nodes
  during the transition; it goes away once all nodes are migrated.
- Migrate `EvaluateSqlNode` and `ParenthesizeSqlNode` to Ops; remove the
  legacy files.
- Add `Op::evaluate_symbol()`, `Op::parenthesize()`, `Op::legacy()`
  free constructors to keep call sites readable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rReferences to Op

Batch 1 of the SqlNode → Op migration: simple decorators with no
multi-pipeline branching. Each legacy node becomes an Op variant with
its `OpExec` impl living in its own file under `op/`.

- `AutoPrefixOp`, `GeoDimensionOp`, `MeasureFilterOp`, `RenderReferencesOp`
  with matching `Op::auto_prefix(...)`, `Op::geo_dimension()`,
  `Op::measure_filter()`, `Op::render_references(...)` constructors.
- Factory call sites switch to small `op_*` helpers that wrap the
  remaining legacy SqlNode tail via `Op::legacy(...)`.
- `render_references.rs` keeps the data types (`RenderReferences`,
  `RenderReferencesType`, `RawReferenceValue`) used elsewhere; the
  `SqlNode` impl is gone.
- Drop `auto_prefix.rs`, `geo_dimension.rs`, `measure_filter.rs`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Batch 2 of the SqlNode → Op migration: masking, case definitions and
top-level kind dispatch.

- `MaskedOp` keeps the same business logic (substitute `mask:` SQL when
  the user lacks access; CASE-guard with the optional mask filter).
  Filter rendering uses a new `OpCtx::tail_as_sql_node()` to materialize
  the unmasked tail as a `SqlNode`, so member references inside the
  filter resolve through it without recursing back through MaskedOp.
- `CaseOp` mirrors the legacy `case:` / `case_switch:` rendering with
  the same shortcut behaviour for degenerate switches.
- `DispatchByKindOp` replaces `RootSqlNode`: dimension / time dimension
  / measure / default each carry their own `Vec<Op>` pipeline, and the
  op picks one based on the symbol kind.
- `Op` and all variants get `#[derive(Clone)]` to support
  `tail_as_sql_node`.
- `OpCtx::render_pipeline` lifetime relaxed: a fresh sub-pipeline can
  live for any lifetime ≤ outer ctx, so kind-dispatch branches can be
  borrowed straight from the holding op.
- Drop `case.rs`, `masked.rs`, `root_processor.rs`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Batch 3 of the SqlNode → Op migration: the four wrappers around the
final measure aggregation step.

- `FinalMeasureOp` applies the regular aggregate (`SUM`/`AVG`/`COUNT
  DISTINCT`/HLL state init) with the right form for joins that produce
  multiplied rows.
- `FinalPreAggregationMeasureOp` swaps the aggregate for the equivalent
  rollup over a pre-aggregation column when one is available
  (`sum(state)`, `merge(state)`, etc.).
- `UngroupedMeasureOp` and `UngroupedQueryFinalMeasureOp` cover the two
  ungrouped flavors — the former passes the expression through at the
  measure level, the latter wraps count-likes in
  `CASE WHEN <expr> IS NOT NULL THEN 1 END` for downstream summation.
- Factory `final_measure_node_processor` switches to op_* helpers.
- Drop `final_measure.rs`, `final_pre_aggregation_measure.rs`,
  `ungroupped_measure.rs`, `ungroupped_query_final_measure.rs`.
- Update one stale class-name reference in a test comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final batch of the SqlNode → Op migration, covering the time-related
wrappers and the multi-pipeline measure ops.

- `TimeDimensionOp`, `TimeShiftOp`, `CalendarTimeShiftOp` handle
  timezone lifting, multi-stage interval shifts and calendar-cube
  shifts.
- `MultiStageRankOp` renders multi-stage rank measures as
  `RANK() OVER (PARTITION BY … ORDER BY …)`; non-rank measures fall
  through to the rest of the pipeline.
- `MultiStageWindowOp` and `RollingWindowOp` are the first ops with
  multiple sub-pipelines: each branch is its own `Vec<Op>` carried in
  the op variant, picked at exec time. This is the structural
  branching shape the Op model was introduced for — kind-dispatch and
  multi-input wrappers now hold their branches as data.
- Factory call sites switch to op_* helpers; legacy node imports drop
  out entirely.
- Drop `time_dimension.rs`, `time_shift.rs`, `calendar_time_shift.rs`,
  `multi_stage_rank.rs`, `multi_stage_window.rs`, `rolling_window.rs`.

After this batch every former `*SqlNode` has an Op counterpart; only
the migration scaffolding (`OpPipelineSqlNode`, `Op::LegacySqlNode`)
and the `SqlNode` trait itself remain to be removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce an `is_terminal` bit on `OpExec` plus a recursive
`Op::validate_pipeline` that enforces a simple invariant: a pipeline
must be non-empty, end with exactly one terminal op, and contain no
terminals before that. Sub-pipelines carried by branching ops
(`DispatchByKind`, `MultiStageWindow`, `RollingWindow`) are validated
recursively.

Validation runs at use time in `OpPipelineSqlNode::to_sql` and the
result is cached via `OnceCell` so a pipeline reused across many
renders pays the cost once. A bad plan surfaces as `CubeError::internal`
through the regular error path — no panics. Running at use time also
covers future flows where pipelines are mutated (e.g. ops removed at
runtime instead of toggled via context flags).

- `EvaluateSymbol`, `DispatchByKind`, `MultiStageWindow`,
  `RollingWindow`, `LegacySqlNode` mark themselves terminal.
- The three branching ops expose a `pub(super) nested_pipelines()`
  accessor used by the validator.
- Add a doc note on `MultiStageWindowOp` and `RollingWindowOp` matching
  the one already on `DispatchByKindOp` ("Discards the tail — each
  branch is a self-contained pipeline").
- 5 inline tests in `op_enum.rs` cover the full grammar (empty,
  missing terminal, terminal in the middle, valid linear, nested
  invalid).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final step of the SqlNode → Op migration. With every legacy node now
backed by an Op variant, the trait machinery and bridges go away.

- `NodeProcessor` (renamed from `OpPipelineSqlNode`) becomes the public
  carrier of a `Vec<Op>`: a concrete `Rc`-shareable struct with an
  inherent `to_sql` method, no trait, no `dyn`. Validation still runs
  once on first render via `OnceCell`.
- Bulk-replace `Rc<dyn SqlNode>` with `Rc<NodeProcessor>` across the
  planner — visitor, `MemberSqlContext`, `SqlCall::eval`,
  `Filter::to_sql`, `cube_ref_evaluator`, filter operators, the
  physical_plan_builder, and tests.
- Factory rewritten on flat `Vec<Op>`: helpers `op_*` prepend their op
  in front of the inbound pipeline, `default_node_processor` wraps the
  final vector in `Rc<NodeProcessor>::new(...)` once. No more nested
  `OpPipelineSqlNode → Op::legacy(child)` towers.
- Drop `SqlNode` trait (`sql_node.rs`), `Op::LegacySqlNode` variant +
  `LegacySqlNodeOp`, the `OpPipelineSqlNode` bridge, and orphaned
  `cube_calc_groups.rs` / `original_sql_pre_aggregation.rs`.
- Rename `OpCtx::legacy_node_processor` → `node_processor` and
  `tail_as_sql_node` → `tail_as_node_processor`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reframe the public docstrings around what each item is, not how it sits
relative to call sites or migration history.

- `OpCtx` and its methods (`render_pipeline`, `tail_as_node_processor`,
  `with_visitor`) drop "used by ops that …" prose; the new wording
  describes the state and behavior directly.
- Add missing docs on the `OpExec` trait, the `SqlNodesFactory` struct
  and its `default_node_processor` entry point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@waralexrom waralexrom requested a review from paveltiunov May 8, 2026 22:30
@github-actions github-actions Bot added the rust Pull requests that update Rust code label May 8, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 8, 2026

Claude finished @waralexrom's task in 3m 43s —— View job


Code Review: Replace SqlNode trait with Op pipeline

  • Gather context and understand PR changes
  • Review core Op infrastructure (enum, exec trait, processor)
  • Review individual Op implementations
  • Review factory and pipeline construction
  • Review integration points (visitor, filters, etc.)
  • Check test coverage
  • Post comprehensive review

Overall Assessment

This is a well-executed structural refactor. Replacing Rc<dyn SqlNode> decorator chains with a data-driven Vec<Op> pipeline is the right architectural move. The net -414 lines, elimination of the as_any/childs boilerplate, and structural validation are all clear wins. The migration is faithful — I verified the factory's pipeline construction against the old code and the behavioral mapping is 1:1.

Strengths

  • Clean separation of concerns. Each Op is a self-contained file with a single OpExec impl. The old trait objects mixed introspection (as_any, childs) with rendering logic; this is strictly better.
  • Structural validation (Op::validate_pipeline) is a real improvement. Catching "pipeline ends without a terminal" at construction time rather than at render time is valuable. The recursive validation of nested pipelines in DispatchByKind/MultiStageWindow/RollingWindow is thorough.
  • Good test coverage for validation. The 5 unit tests in op_enum.rs cover empty, non-terminal-only, terminal-in-middle, valid linear, and recursive nested cases. The existing 215 integration tests in the tests/ directory exercise the full pipeline.
  • OpCtx design. The render_tail/render_pipeline/with_visitor/tail_as_node_processor API is well-thought-out and makes the chain-of-responsibility pattern explicit via slice indexing rather than implicit via Rc nesting.

Issues & Suggestions

I've left 6 inline comments on specific files. The highlights:

  1. time_shift_op.rs:32.unwrap() on shift.interval can panic. Pre-existing from the old code, but this is a good opportunity to convert to a proper CubeError. (Low severity, easy fix)

  2. node_processor.rs:36-39OnceCell validation caching retries on every call if validation fails. Consider validating eagerly in new() to eliminate the OnceCell and guarantee a NodeProcessor is always structurally valid. (Minor)

  3. op_enum.rs:20 — Missing Debug derive on Op. The PR summary mentions snapshot diffing and diagnostics as motivation — Debug on Op (and its inner structs) would directly enable that. (Minor, high value)

  4. op_enum.rs:145-193 — Two parallel match blocks for exec/is_terminal dispatch must be kept manually in sync with the enum variants. A macro or enum_dispatch would eliminate this class of drift. (Maintenance observation, not blocking)

  5. factory.rs:261Vec<Op> cloning replaces the old Rc refcount bumps. Fine at current pipeline sizes (~2-8 ops), but worth noting for future awareness if pipelines grow.

Behavioral Correctness

I verified the following behavioral equivalences against the old code:

Area Old New Match?
Inner masked wrapper MaskedSqlNode::new(EvaluateSqlNode::new()) op_masked(vec![Op::evaluate_symbol()], false)
Outer masked wrapper (ungrouped) MaskedSqlNode::new_ungrouped(…) op_masked(…, true)
Outer masked wrapper (grouped) MaskedSqlNode::new(…) op_masked(…, false)
Dimension processor chain TimeShift → CalendarTimeShift → TimeDimension → Paren → AutoPrefix → Case → Geo → EvaluateSQL Same order via prepend
Measure processor chain MultiStageRank → MultiStageWindow → Masked → Final → Ungrouped → Case → MeasureFilter → Paren → AutoPrefix → Masked → EvaluateSQL Same order
RootSqlNode dispatch 4-way: dimension, time_dimension, measure, default DispatchByKindOp with same 4 branches
OriginalSqlPreAggregation removal Was in old factory Handled by CubeRefEvaluator (unchanged)

Security

No security concerns. The changes are purely internal refactoring of SQL rendering pipeline composition. All SQL generation still flows through PlanSqlTemplates for proper quoting/escaping. No new external inputs are introduced.

Minor Nits (Non-blocking)

  • geo_dimension_op.rs:31&vec![…] allocates unnecessarily; &[latitude_str, …] would suffice. Pre-existing from old code.
  • Several ops that are always unit structs (CaseOp, MeasureFilterOp, GeoDimensionOp, etc.) could be #[derive(Clone, Copy)] for slightly cheaper semantics, though it doesn't matter in practice.

Verdict

Looks good overall. The only item I'd suggest fixing before merge is the .unwrap() in time_shift_op.rs. The rest are improvements that can land in follow-ups.

Comment thread rust/cube/cubesqlplanner/cubesqlplanner/src/physical_plan/sql_nodes/op/op_enum.rs Outdated
Comment thread rust/cube/cubesqlplanner/cubesqlplanner/src/physical_plan/sql_nodes/op/op_enum.rs Outdated
waralexrom and others added 2 commits May 9, 2026 00:41
- `MultiStageWindowOp::exec`: rewrite the gating condition from
  `!(a && !b)` to `!a || b` (`clippy::nonminimal_bool`).
- `Op` enum doc: rephrase the "Adding a new op = …" sentence so no
  doc-comment line starts with `+` and trips
  `clippy::doc_lazy_continuation`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups from PR review.

- Derive `Debug` (and `Clone`) for every Op variant, the `Op` enum, and
  `NodeProcessor`. `CalendarDimensionTimeShift` and the
  `RenderReferences*` types pick up `Debug` so the propagation works.
  Plan dumps and snapshot diffs now produce useful output.
- Replace the two parallel match tables in `OpExec for Op` with a
  single `define_op_enum!` macro. Variants are declared once at the
  bottom of the file; the macro expands the enum, the derives, and the
  forwarded `exec` / `is_terminal` impls. Adding a new op stays a
  one-line change and the two dispatch arms can no longer drift.
- `TimeShiftOp::exec` no longer unwraps `shift.interval`; a missing
  interval produces a `CubeError::internal` instead of a panic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@waralexrom waralexrom closed this May 8, 2026
@waralexrom waralexrom reopened this May 8, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 8, 2026

Claude finished @waralexrom's task in 4m 36s —— View job


Code Review: Replace SqlNode trait with Op pipeline

  • Gather context and understand full PR diff
  • Review core Op infrastructure (op_enum.rs, op_exec.rs, op_ctx.rs, node_processor.rs)
  • Review individual Op implementations (all 18 new *_op.rs files)
  • Review factory and pipeline construction (factory.rs)
  • Review integration points (visitor, filters, render_references)
  • Check test coverage and validation
  • Post comprehensive review

Overall Assessment

This is a well-executed structural refactor — looks good to merge. The replacement of Rc<dyn SqlNode> decorator chains with a data-driven Vec<Op> pipeline is architecturally sound. The net result is -434 lines, elimination of the as_any/childs boilerplate on every node, and the addition of structural validation that catches malformed pipelines at construction time rather than at render time.

Architecture

The new design has clean layering:

Layer Responsibility
OpExec trait Single method per op — exec + is_terminal
Op enum Exhaustive dispatch via define_op_enum! macro — adding a new op is a one-line change
OpCtx Render state carrier with render_tail/render_pipeline/with_visitor/tail_as_node_processor
NodeProcessor Rc-shareable pipeline carrier, entry point via to_sql
SqlNodesFactory Pipeline assembly from configuration — prepend-based builder

The define_op_enum! macro (introduced in commit 6690d67) is a good call — it eliminates the two parallel match blocks that the earlier review flagged and ensures exec/is_terminal dispatch can't drift from the enum variants.

Behavioral Correctness

I verified the factory's pipeline construction against the old code. The mapping is 1:1:

  • Dimension chain: TimeShift → CalendarTimeShift → TimeDimension → Paren → AutoPrefix → Case → Geo → EvaluateSymbol ✅
  • Measure chain: MultiStageRank → MultiStageWindow → Masked → Final → Ungrouped → Case → MeasureFilter → Paren → AutoPrefix → Masked → EvaluateSymbol ✅
  • Root dispatch: 4-way (dimension, time_dimension, measure, default) → DispatchByKindOp
  • OriginalSqlPreAggregation removal: Handled by CubeRefEvaluator (unchanged) ✅
  • CubeCalcGroups removal: Clean — no remaining references anywhere in the codebase ✅

Strengths

  1. Macro-driven dispatchdefine_op_enum! generates both exec and is_terminal from a single variant list, making it impossible for them to fall out of sync
  2. #[derive(Clone, Debug)] on Op and all inner structs — enables snapshot diffing and diagnostic logging, which the PR summary mentions as a future goal
  3. Structural validation (Op::validate_pipeline) with recursive nested pipeline checking is thorough — 5 unit tests cover empty, non-terminal-only, terminal-in-middle, valid linear, and recursive cases
  4. Smart Rc wrapping in ops with large data (AutoPrefixOp, CalendarTimeShiftOp, FinalMeasureOp, RenderReferencesOp, TimeDimensionOp) — avoids deep-cloning HashMap/HashSet when the Op itself is cloned
  5. The .unwrap() from the old time_shift.rs was correctly replaced with a proper CubeError::internal in time_shift_op.rs:32-36 — good fix

Issues & Suggestions

I left 7 inline comments. Summary:

File Severity Issue
node_processor.rs:37-40 Minor OnceCell validation retries on failure — consider eager validation in new()
op_enum.rs:202-209 Minor nested_pipelines free fn is a parallel dispatch table that can drift — consider making it part of OpExec
geo_dimension_op.rs:31 Nit &vec![…]&[…] avoids unnecessary heap allocation (pre-existing)
masked_op.rs:50-56 Info render_tail + tail_as_node_processor both consume the same tail — double evaluation when mask filter is present
factory.rs:260-263 Info Vec<Op> cloning replaces old Rc refcount bumps — fine now, flag for future
factory.rs:336-337 Info Same observation for the measure pipeline input.clone()
case_op.rs:122-131 Minor Hard error on non-dimension/measure symbols — defensive fallthrough might be more future-proof

Security

No security concerns. All changes are purely internal refactoring of the SQL rendering pipeline composition. SQL generation still flows through PlanSqlTemplates for proper quoting/escaping. No new external inputs are introduced, and the masking logic in MaskedOp faithfully reproduces the old MaskedSqlNode behavior.

Testing

  • 5 unit tests in op_enum.rs cover structural validation edge cases
  • 215 integration tests across the tests/ directory exercise the full pipeline end-to-end
  • 17 auto-parenthesization tests in auto_parentheses.rs verify the ParenthesizeOp + visitor flag interaction
  • The existing integration tests are the primary correctness guarantee — if the behavioral mapping drifted, snapshot mismatches would surface

Verdict

Looks good. The only item I'd suggest fixing before merge is the OnceCell → eager validation in NodeProcessor::new (cleaner invariant, removes a field). The nested_pipelines drift risk is worth addressing in a follow-up. Everything else is observational.


Comment thread rust/cube/cubesqlplanner/cubesqlplanner/src/physical_plan/sql_nodes/op/op_enum.rs Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.84%. Comparing base (8d8d2a9) to head (c6fb073).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #10848   +/-   ##
=======================================
  Coverage   78.84%   78.84%           
=======================================
  Files         470      470           
  Lines       92274    92274           
  Branches     3433     3433           
=======================================
+ Hits        72751    72755    +4     
+ Misses      19020    19016    -4     
  Partials      503      503           
Flag Coverage Δ
cube-backend 58.22% <ø> (ø)
cubesql 83.47% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Replace the free `nested_pipelines(op: &Op) -> Vec<&[Op]>` match table
with a trait method on `OpExec` defaulting to `Vec::new()`. Branching
ops (`DispatchByKind`, `MultiStageWindow`, `RollingWindow`) override.
The macro forwards `nested_pipelines` alongside `exec` and
`is_terminal`, so the validator can call `op.nested_pipelines()`
directly and the per-variant accessors disappear.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@waralexrom waralexrom closed this May 8, 2026
@waralexrom waralexrom reopened this May 8, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 8, 2026

Claude finished @waralexrom's task in 2m 26s —— View job


Code Review: Replace SqlNode trait with Op pipeline

  • Gather context and review latest commit state
  • Review core Op infrastructure (op_enum, op_exec, op_ctx, node_processor)
  • Review individual Op implementations (all 19 ops)
  • Review factory and pipeline construction
  • Review integration points and deleted code
  • Check test coverage
  • Post comprehensive review

Overall Assessment

This is a well-executed structural refactor — looks good to merge. The replacement of Rc<dyn SqlNode> decorator chains with a data-driven Vec<Op> pipeline is architecturally sound. Net -430 lines, complete elimination of as_any/childs boilerplate, and structural validation that catches malformed pipelines at construction time rather than at render time. The migration is clean — zero dyn SqlNode references remain in the codebase.

I note that the last few commits (c6fb073, 6690d67, b68adfa, 19c5267) addressed the key concerns from the two prior reviews:

  • #[derive(Debug)] now on Op and all inner structs ✅
  • define_op_enum! macro eliminates parallel dispatch drift ✅
  • .unwrap() in time_shift_op.rs replaced with CubeError::internal
  • nested_pipelines moved into OpExec trait with default impl ✅

Architecture

The layering is clean and well-documented:

Layer File Responsibility
OpExec trait op_exec.rs exec + is_terminal + nested_pipelines (with defaults)
Op enum op_enum.rs Exhaustive dispatch via define_op_enum! — adding a new op is one line
OpCtx op_ctx.rs Render state with render_tail/render_pipeline/with_visitor/tail_as_node_processor
NodeProcessor node_processor.rs Rc-shareable validated pipeline carrier
SqlNodesFactory factory.rs Pipeline assembly via prepend-based builder

Remaining Suggestions

1. NodeProcessor::validatedOnceCell retries on failure (node_processor.rs:37-40)

If validate_pipeline returns Err, the OnceCell is never set, so every subsequent to_sql call re-runs validation. Since structural invalidity is always fatal, consider validating eagerly in new() — changing the return type to Result<Rc<Self>, CubeError> — which gives a stronger invariant: a NodeProcessor that exists is always valid. This also eliminates the OnceCell + validated field.

Not blocking — the current code is correct, just slightly wasteful on the failure path. Fix this →

2. geo_dimension_op.rs:31&vec![…] heap allocates unnecessarily

// Current (allocates a Vec on the heap just to borrow it):
ctx.templates.concat_strings(&vec![latitude_str, format!("','"), longitude_str])

// Better (stack-only):
ctx.templates.concat_strings(&[latitude_str, format!("','"), longitude_str])

Pre-existing from the old code, trivial fix. Fix this →

3. case_op.rs:127-129 — Hard error on non-dimension/measure symbols

CaseOp::exec returns Err for symbols that aren't dimensions or measures. This is safe today because the factory only places CaseOp in dimension/measure branches. But if the pipeline becomes more composable in the future, a defensive ctx.render_tail() fallthrough would be safer. Worth considering as the pipeline evolves.

4. masked_op.rs:51-56 — Double tail evaluation when mask filter is present

When a mask filter exists, render_tail() is called (line 51) to get original_sql, and then tail_as_node_processor() (line 56) clones the same tail into a new NodeProcessor for filter rendering. The tail pipeline is effectively built/evaluated twice. All ops are currently pure so this is harmless, but worth being aware of if any op gains side effects in the future.

Behavioral Correctness

I verified the factory's pipeline construction against the old code. The mapping is 1:1:

Pipeline Chain order Match?
Dimension TimeShift → CalendarTimeShift → TimeDimension → Paren → AutoPrefix → Case → Geo → EvaluateSymbol
Measure MultiStageRank → MultiStageWindow → Masked → Final → Ungrouped → Case → MeasureFilter → Paren → AutoPrefix → Masked → EvaluateSymbol
Root dispatch 4-way (dimension, time_dimension, measure, default) → DispatchByKindOp
OriginalSqlPreAggregation removal Handled by CubeRefEvaluator (unchanged)
CubeCalcGroups removal Clean — no remaining references
dyn SqlNode elimination Zero remaining references in the codebase

Security

No concerns. All changes are internal refactoring. SQL generation still flows through PlanSqlTemplates for quoting/escaping. No new external inputs.

Testing

  • 5 unit tests in op_enum.rs — structural validation edge cases (empty, non-terminal-only, terminal-in-middle, valid linear, recursive nested)
  • 91 test files in the tests/ directory exercise the full pipeline end-to-end
  • 17 auto-parenthesization tests verify ParenthesizeOp + visitor flag interaction
  • Codecov confirms all modified coverable lines are covered, project coverage at 83.47%

Verdict

Looks good to merge. The only actionable item is the OnceCell → eager validation in NodeProcessor::new (stronger invariant, removes a field). The &vec![] allocation and CaseOp defensiveness are low-priority follow-ups.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant