[refactor](mv) Refactor MV rewrite StructInfo lookup by relation ids #62981
foxtail463 wants to merge 1 commit into apache:master
Minimal reproducible case:
DROP DATABASE IF EXISTS mv_id_conflict_demo;
CREATE DATABASE mv_id_conflict_demo;
USE mv_id_conflict_demo;
SET enable_nereids_planner = true;
SET enable_fallback_to_original_planner = false;
SET enable_materialized_view_rewrite = true;
SET enable_materialized_view_nest_rewrite = true;
SET enable_nereids_timeout = false;
SET materialized_view_rewrite_duration_threshold_ms = 1800000;
CREATE TABLE fact_src (
dt DATE NOT NULL,
k VARCHAR(32) NOT NULL,
is_dyn VARCHAR(8),
sku_type VARCHAR(8)
) DUPLICATE KEY(dt, k)
PARTITION BY RANGE(dt) (PARTITION p1 VALUES LESS THAN ('2026-02-05'))
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
CREATE TABLE dim_full (
dt DATE NOT NULL,
k VARCHAR(32) NOT NULL,
sku_type VARCHAR(8),
is_dyn VARCHAR(8),
bu VARCHAR(32),
mode_flag VARCHAR(8),
double_flag VARCHAR(8)
) UNIQUE KEY(dt, k, sku_type, is_dyn)
PARTITION BY RANGE(dt) (PARTITION p1 VALUES LESS THAN ('2026-02-05'))
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
CREATE VIEW v_dim_full_non_double AS
SELECT dt, k, mode_flag, sku_type FROM dim_full WHERE double_flag = '0';
INSERT INTO fact_src VALUES ('2026-02-04', 'K1', '0', '1');
INSERT INTO dim_full VALUES ('2026-02-04', 'K1', '1', '0', 'D1', 'M', '0');
DROP MATERIALIZED VIEW IF EXISTS mv_fact;
CREATE MATERIALIZED VIEW mv_fact
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY (dt)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1")
AS
SELECT dt, k, is_dyn, sku_type
FROM fact_src
WHERE sku_type = '1';
DROP MATERIALIZED VIEW IF EXISTS mv_dim_full;
CREATE MATERIALIZED VIEW mv_dim_full
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY (dt)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1")
AS
SELECT dt, k, bu, is_dyn, sku_type
FROM dim_full;
DROP MATERIALIZED VIEW IF EXISTS mv_dim_full_view_non_double;
CREATE MATERIALIZED VIEW mv_dim_full_view_non_double
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY (dt)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1")
AS
SELECT dt, k, mode_flag, sku_type
FROM v_dim_full_non_double;
DROP MATERIALIZED VIEW IF EXISTS mv_target;
CREATE MATERIALIZED VIEW mv_target
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY (dt)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1")
AS
SELECT
t.dt,
t.k,
d0.bu AS out_bu,
d1.mode_flag AS out_mode
FROM mv_fact t
LEFT JOIN mv_dim_full d0
ON t.dt = d0.dt
AND t.k = d0.k
AND t.sku_type = d0.sku_type
AND t.is_dyn = d0.is_dyn
LEFT JOIN mv_dim_full_view_non_double d1
ON t.dt = d1.dt
AND t.k = d1.k
AND t.sku_type = d1.sku_type;
REFRESH MATERIALIZED VIEW mv_fact COMPLETE;
REFRESH MATERIALIZED VIEW mv_dim_full COMPLETE;
REFRESH MATERIALIZED VIEW mv_dim_full_view_non_double COMPLETE;
-- wait a few seconds
REFRESH MATERIALIZED VIEW mv_target COMPLETE;
EXPLAIN
SELECT
t.dt,
t.k,
d0.bu AS out_bu,
d1.mode_flag AS out_mode
FROM fact_src t
LEFT JOIN dim_full d0
ON t.dt = d0.dt
AND t.k = d0.k
AND t.sku_type = d0.sku_type
AND t.is_dyn = d0.is_dyn
LEFT JOIN v_dim_full_non_double d1
ON t.dt = d1.dt
AND t.k = d1.k
AND t.sku_type = d1.sku_type
WHERE t.dt = '2026-02-04'
AND t.sku_type = '1'
ORDER BY t.k;
/review
Performance Evaluation
This benchmark checks whether the new StructInfo candidate lookup adds visible MV rewrite overhead. It compares the current patch
The benchmark SQL models a nested MV rewrite case with base tables, a view, child MVs, and a parent MV. The query starts from base tables and a view, while the target MV is defined over child MVs. This shape exercises the nested MV rewrite path and the StructInfo candidate lookup changed by this PR.
Example:
The benchmark has three scales:
All three scales assert that the target MV is chosen.
End-to-end EXPLAIN benchmark
run buildall
What problem does this PR solve?
Problem Summary:
Nested MV rewrite needs to distinguish two different identities during fuzzy
StructInfo collection:
In this shape, child rewrite can first introduce MV scan relations into memo.
Then the parent group should be able to build a candidate plan from those MV
scan relations and match mv_target.
The old StructInfo candidate path used the table/common-table-id based cache key
in StructInfoMap's candidate map to organize memo candidates. That key only
describes the table family covered by one MV definition; it is a search-space
key, not the identity of a concrete candidate. The exact candidate identity is
relationIdSet, which describes the relations contained by one memo candidate plan
tree.
In the example above, the rewritten scan candidate for mv_dim_full and the
rewritten scan candidate for mv_dim_full_view_non_double can fall into the same
table/common-table-id cache key while representing different relationIdSet
values. If one candidate overwrites or is reused as the other, the parent
mv_target candidate is assembled with the wrong child relation, so the final
target MV rewrite becomes path-sensitive and may fail.
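The overwrite described above can be illustrated with a toy model. This is a minimal Python sketch, not Doris code: the numeric ids, `table_family_key`, and the plan labels are hypothetical, and it assumes, as the description states, that both rewritten scan candidates map to the same coarse table-family key.

```python
# Toy model of a candidate map keyed only by the coarse table-family key.
# All ids and names are hypothetical; this is not the Doris implementation.

def table_family_key(table_ids):
    """Coarse search-space key: the set of (common) table ids covered."""
    return frozenset(table_ids)

DIM_FULL = 2  # hypothetical common table id shared by the dim_full family

# Two distinct memo candidates: different relation ids, same table family.
scan_mv_dim_full = {
    "relation_ids": frozenset({11}),
    "plan": "scan(mv_dim_full)",
}
scan_mv_dim_view = {
    "relation_ids": frozenset({12}),
    "plan": "scan(mv_dim_full_view_non_double)",
}

flat = {}
for cand in (scan_mv_dim_full, scan_mv_dim_view):
    # Both candidates produce the same key, so the second replaces the first.
    flat[table_family_key({DIM_FULL})] = cand

survivor = flat[table_family_key({DIM_FULL})]
```

In this model only one candidate survives, and which one depends on insertion order. That mirrors the path-sensitive behavior reported for mv_target.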
This refactor makes the identity boundary explicit:
the inner key
nested MV scan relations
This keeps base-table, view-derived, and rewritten MV-scan candidates coexisting
under the same coarse table family without overwriting each other.
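One way to picture that coexistence is a two-level map. The following is a minimal Python sketch under stated assumptions, not the actual StructInfoMap code: the class name `CandidateIndex`, its methods, and the numeric ids are hypothetical, and it assumes the outer key is the table family and the inner key is the relation id set.

```python
from collections import defaultdict


class CandidateIndex:
    """Hypothetical two-level index.

    Outer key: coarse table family (search space).
    Inner key: exact relation id set (candidate identity).
    """

    def __init__(self):
        self._by_family = defaultdict(dict)

    def add(self, table_ids, relation_ids, plan):
        family = frozenset(table_ids)
        exact = frozenset(relation_ids)
        # setdefault keeps the first plan registered for an exact identity,
        # so a different identity never overwrites an existing candidate.
        return self._by_family[family].setdefault(exact, plan)

    def candidates(self, table_ids):
        """All candidates in one table family, keyed by relation id set."""
        return dict(self._by_family.get(frozenset(table_ids), {}))

    def lookup(self, table_ids, relation_ids):
        """Exact candidate for a relation id set, or None if absent."""
        family = self._by_family.get(frozenset(table_ids), {})
        return family.get(frozenset(relation_ids))


index = CandidateIndex()
index.add({2}, {3}, "scan(dim_full)")                      # base table
index.add({2}, {11}, "scan(mv_dim_full)")                  # rewritten MV scan
index.add({2}, {12}, "scan(mv_dim_full_view_non_double)")  # view-derived MV scan
```

With this layout, the parent group can enumerate every candidate in a table family and still select the one whose relation ids match the child it is assembling.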