chore(tesseract): replace SqlNode trait with Op pipeline#10848
chore(tesseract): replace SqlNode trait with Op pipeline#10848waralexrom wants to merge 11 commits into
Conversation
Add an Op-based pipeline alongside the existing Rc<dyn SqlNode> decorator chain so SQL evaluator nodes can be migrated to a data-driven, inspectable representation incrementally. - Add `op/` submodule with `OpExec` trait, `OpCtx`, `Op` enum, exhaustive dispatch and `OpPipelineSqlNode` bridge so a `Vec<Op>` plugs into the legacy tree wherever a `Rc<dyn SqlNode>` is expected. - Add `LegacySqlNodeOp` escape hatch to wrap not-yet-migrated nodes during the transition; it goes away once all nodes are migrated. - Migrate `EvaluateSqlNode` and `ParenthesizeSqlNode` to Ops; remove the legacy files. - Add `Op::evaluate_symbol()`, `Op::parenthesize()`, `Op::legacy()` free constructors to keep call sites readable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rReferences to Op Batch 1 of the SqlNode → Op migration: simple decorators with no multi-pipeline branching. Each legacy node becomes an Op variant with its `OpExec` impl living in its own file under `op/`. - `AutoPrefixOp`, `GeoDimensionOp`, `MeasureFilterOp`, `RenderReferencesOp` with matching `Op::auto_prefix(...)`, `Op::geo_dimension()`, `Op::measure_filter()`, `Op::render_references(...)` constructors. - Factory call sites switch to small `op_*` helpers that wrap the remaining legacy SqlNode tail via `Op::legacy(...)`. - `render_references.rs` keeps the data types (`RenderReferences`, `RenderReferencesType`, `RawReferenceValue`) used elsewhere; the `SqlNode` impl is gone. - Drop `auto_prefix.rs`, `geo_dimension.rs`, `measure_filter.rs`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Batch 2 of the SqlNode → Op migration: masking, case definitions and top-level kind dispatch. - `MaskedOp` keeps the same business logic (substitute `mask:` SQL when the user lacks access; CASE-guard with the optional mask filter). Filter rendering uses a new `OpCtx::tail_as_sql_node()` to materialize the unmasked tail as a `SqlNode`, so member references inside the filter resolve through it without recursing back through MaskedOp. - `CaseOp` mirrors the legacy `case:` / `case_switch:` rendering with the same shortcut behaviour for degenerate switches. - `DispatchByKindOp` replaces `RootSqlNode`: dimension / time dimension / measure / default each carry their own `Vec<Op>` pipeline, and the op picks one based on the symbol kind. - `Op` and all variants get `#[derive(Clone)]` to support `tail_as_sql_node`. - `OpCtx::render_pipeline` lifetime relaxed: a fresh sub-pipeline can live for any lifetime ≤ outer ctx, so kind-dispatch branches can be borrowed straight from the holding op. - Drop `case.rs`, `masked.rs`, `root_processor.rs`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Batch 3 of the SqlNode → Op migration: the four wrappers around the final measure aggregation step. - `FinalMeasureOp` applies the regular aggregate (`SUM`/`AVG`/`COUNT DISTINCT`/HLL state init) with the right form for joins that produce multiplied rows. - `FinalPreAggregationMeasureOp` swaps the aggregate for the equivalent rollup over a pre-aggregation column when one is available (`sum(state)`, `merge(state)`, etc.). - `UngroupedMeasureOp` and `UngroupedQueryFinalMeasureOp` cover the two ungrouped flavors — the former passes the expression through at the measure level, the latter wraps count-likes in `CASE WHEN <expr> IS NOT NULL THEN 1 END` for downstream summation. - Factory `final_measure_node_processor` switches to op_* helpers. - Drop `final_measure.rs`, `final_pre_aggregation_measure.rs`, `ungroupped_measure.rs`, `ungroupped_query_final_measure.rs`. - Update one stale class-name reference in a test comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final batch of the SqlNode → Op migration, covering the time-related wrappers and the multi-pipeline measure ops. - `TimeDimensionOp`, `TimeShiftOp`, `CalendarTimeShiftOp` handle timezone lifting, multi-stage interval shifts and calendar-cube shifts. - `MultiStageRankOp` renders multi-stage rank measures as `RANK() OVER (PARTITION BY … ORDER BY …)`; non-rank measures fall through to the rest of the pipeline. - `MultiStageWindowOp` and `RollingWindowOp` are the first ops with multiple sub-pipelines: each branch is its own `Vec<Op>` carried in the op variant, picked at exec time. This is the structural branching shape the Op model was introduced for — kind-dispatch and multi-input wrappers now hold their branches as data. - Factory call sites switch to op_* helpers; legacy node imports drop out entirely. - Drop `time_dimension.rs`, `time_shift.rs`, `calendar_time_shift.rs`, `multi_stage_rank.rs`, `multi_stage_window.rs`, `rolling_window.rs`. After this batch every former `*SqlNode` has an Op counterpart; only the migration scaffolding (`OpPipelineSqlNode`, `Op::LegacySqlNode`) and the `SqlNode` trait itself remain to be removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce an `is_terminal` bit on `OpExec` plus a recursive
`Op::validate_pipeline` that enforces a simple invariant: a pipeline
must be non-empty, end with exactly one terminal op, and contain no
terminals before that. Sub-pipelines carried by branching ops
(`DispatchByKind`, `MultiStageWindow`, `RollingWindow`) are validated
recursively.
Validation runs at use time in `OpPipelineSqlNode::to_sql` and the
result is cached via `OnceCell` so a pipeline reused across many
renders pays the cost once. A bad plan surfaces as `CubeError::internal`
through the regular error path — no panics. Running at use time also
covers future flows where pipelines are mutated (e.g. ops removed at
runtime instead of toggled via context flags).
- `EvaluateSymbol`, `DispatchByKind`, `MultiStageWindow`,
`RollingWindow`, `LegacySqlNode` mark themselves terminal.
- The three branching ops expose a `pub(super) nested_pipelines()`
accessor used by the validator.
- Add a doc note on `MultiStageWindowOp` and `RollingWindowOp` matching
the one already on `DispatchByKindOp` ("Discards the tail — each
branch is a self-contained pipeline").
- 5 inline tests in `op_enum.rs` cover the full grammar (empty,
missing terminal, terminal in the middle, valid linear, nested
invalid).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final step of the SqlNode → Op migration. With every legacy node now backed by an Op variant, the trait machinery and bridges go away. - `NodeProcessor` (renamed from `OpPipelineSqlNode`) becomes the public carrier of a `Vec<Op>`: a concrete `Rc`-shareable struct with an inherent `to_sql` method, no trait, no `dyn`. Validation still runs once on first render via `OnceCell`. - Bulk-replace `Rc<dyn SqlNode>` with `Rc<NodeProcessor>` across the planner — visitor, `MemberSqlContext`, `SqlCall::eval`, `Filter::to_sql`, `cube_ref_evaluator`, filter operators, the physical_plan_builder, and tests. - Factory rewritten on flat `Vec<Op>`: helpers `op_*` prepend their op in front of the inbound pipeline, `default_node_processor` wraps the final vector in `Rc<NodeProcessor>::new(...)` once. No more nested `OpPipelineSqlNode → Op::legacy(child)` towers. - Drop `SqlNode` trait (`sql_node.rs`), `Op::LegacySqlNode` variant + `LegacySqlNodeOp`, the `OpPipelineSqlNode` bridge, and orphaned `cube_calc_groups.rs` / `original_sql_pre_aggregation.rs`. - Rename `OpCtx::legacy_node_processor` → `node_processor` and `tail_as_sql_node` → `tail_as_node_processor`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reframe the public docstrings around what each item is, not how it sits relative to call sites or migration history. - `OpCtx` and its methods (`render_pipeline`, `tail_as_node_processor`, `with_visitor`) drop "used by ops that …" prose; the new wording describes the state and behavior directly. - Add missing docs on the `OpExec` trait, the `SqlNodesFactory` struct and its `default_node_processor` entry point. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @waralexrom's task in 3m 43s —— View job Code Review: Replace SqlNode trait with Op pipeline
Overall AssessmentThis is a well-executed structural refactor. Replacing Strengths
Issues & SuggestionsI've left 6 inline comments on specific files. The highlights:
Behavioral CorrectnessI verified the following behavioral equivalences against the old code:
SecurityNo security concerns. The changes are purely internal refactoring of SQL rendering pipeline composition. All SQL generation still flows through Minor Nits (Non-blocking)
VerdictLooks good overall. The only item I'd suggest fixing before merge is the |
- `MultiStageWindowOp::exec`: rewrite the gating condition from `!(a && !b)` to `!a || b` (`clippy::nonminimal_bool`). - `Op` enum doc: rephrase the "Adding a new op = …" sentence so no doc-comment line starts with `+` and trips `clippy::doc_lazy_continuation`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups from PR review. - Derive `Debug` (and `Clone`) for every Op variant, the `Op` enum, and `NodeProcessor`. `CalendarDimensionTimeShift` and the `RenderReferences*` types pick up `Debug` so the propagation works. Plan dumps and snapshot diffs now produce useful output. - Replace the two parallel match tables in `OpExec for Op` with a single `define_op_enum!` macro. Variants are declared once at the bottom of the file; the macro expands the enum, the derives, and the forwarded `exec` / `is_terminal` impls. Adding a new op stays a one-line change and the two dispatch arms can no longer drift. - `TimeShiftOp::exec` no longer unwraps `shift.interval`; a missing interval produces a `CubeError::internal` instead of a panic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @waralexrom's task in 4m 36s —— View job Code Review: Replace SqlNode trait with Op pipeline
Overall AssessmentThis is a well-executed structural refactor — looks good to merge. The replacement of ArchitectureThe new design has clean layering:
The Behavioral CorrectnessI verified the factory's pipeline construction against the old code. The mapping is 1:1:
Strengths
Issues & SuggestionsI left 7 inline comments. Summary:
SecurityNo security concerns. All changes are purely internal refactoring of the SQL rendering pipeline composition. SQL generation still flows through Testing
VerdictLooks good. The only item I'd suggest fixing before merge is the |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #10848 +/- ##
=======================================
Coverage 78.84% 78.84%
=======================================
Files 470 470
Lines 92274 92274
Branches 3433 3433
=======================================
+ Hits 72751 72755 +4
+ Misses 19020 19016 -4
Partials 503 503
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Replace the free `nested_pipelines(op: &Op) -> Vec<&[Op]>` match table with a trait method on `OpExec` defaulting to `Vec::new()`. Branching ops (`DispatchByKind`, `MultiStageWindow`, `RollingWindow`) override. The macro forwards `nested_pipelines` alongside `exec` and `is_terminal`, so the validator can call `op.nested_pipelines()` directly and the per-variant accessors disappear. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @waralexrom's task in 2m 26s —— View job Code Review: Replace SqlNode trait with Op pipeline
Overall AssessmentThis is a well-executed structural refactor — looks good to merge. The replacement of I note that the last few commits (
ArchitectureThe layering is clean and well-documented:
Remaining Suggestions1. If Not blocking — the current code is correct, just slightly wasteful on the failure path. Fix this → 2. // Current (allocates a Vec on the heap just to borrow it):
ctx.templates.concat_strings(&vec![latitude_str, format!("','"), longitude_str])
// Better (stack-only):
ctx.templates.concat_strings(&[latitude_str, format!("','"), longitude_str])Pre-existing from the old code, trivial fix. Fix this → 3.
4. When a mask filter exists, Behavioral CorrectnessI verified the factory's pipeline construction against the old code. The mapping is 1:1:
SecurityNo concerns. All changes are internal refactoring. SQL generation still flows through Testing
VerdictLooks good to merge. The only actionable item is the |
Summary
Replaces the
Rc<dyn SqlNode>decorator chain in the Tesseract SQL evaluator with a data-driven Op pipeline. Plans are nowVec<Op>carried by a concreteNodeProcessor; ops are exhaustive enum variants with per-variantOpExecimpls. The motivation is to make the pipeline editable at runtime (mutating aVec<Op>instead of threading magic flags throughSqlEvaluatorVisitor) and to enable structural validation, snapshot diffing and future optimization passes.