Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
90 changes: 80 additions & 10 deletions core/src/main/java/org/apache/calcite/plan/RelOptUtil.java
Original file line number Diff line number Diff line change
Expand Up @@ -2957,7 +2957,8 @@ public static boolean classifyFilters(
joinFields,
nTotalFields,
leftFields,
filter);
filter,
joinRel.getInput(0));

leftFilters.add(shiftedFilter);
}
Expand All @@ -2975,7 +2976,8 @@ public static boolean classifyFilters(
joinFields,
nTotalFields,
rightFields,
filter);
filter,
joinRel.getInput(1));
rightFilters.add(shiftedFilter);
}
filtersToRemove.add(filter);
Expand Down Expand Up @@ -3079,7 +3081,8 @@ public static boolean classifyFilters(
joinFields,
nTotalFields,
leftFields,
filter);
filter,
joinRel.getInput(0));

leftFilters.add(shiftedFilter);
}
Expand All @@ -3105,7 +3108,8 @@ public static boolean classifyFilters(
joinFields,
nTotalFields,
rightFields,
filter);
filter,
joinRel.getInput(1));
rightFilters.add(shiftedFilter);
}
filtersToRemove.add(filter);
Expand Down Expand Up @@ -3140,7 +3144,8 @@ private static RexNode shiftFilter(
List<RelDataTypeField> joinFields,
int nTotalFields,
List<RelDataTypeField> rightFields,
RexNode filter) {
RexNode filter,
RelNode child) {
int[] adjustments = new int[nTotalFields];
for (int i = start; i < end; i++) {
adjustments[i] = offset;
Expand All @@ -3150,7 +3155,9 @@ private static RexNode shiftFilter(
rexBuilder,
joinFields,
rightFields,
adjustments));
adjustments,
offset,
child));
}

/**
Expand Down Expand Up @@ -4752,6 +4759,17 @@ public ImmutableBitSet build() {
}
return super.visitCall(call);
}

@Override public Void visitSubQuery(RexSubQuery subQuery) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the purpose here is to obtain the indices of the free variables in the subquery. My concern is:
InputFinder analyzes the expressions of a RelNode, and there should be at most one CorrelationId that belongs to that RelNode (at least during the rule rewriting phase). Here, calling RelOptUtil.getVariablesUsed returns a set of CorrelationId, which may include CorrelationIds that do not belong to this RelNode (i.e., the free variables in the subquery come from an outer scope). While it may be difficult to construct such a case, InputFinder.analyze is a public method, and we don't know how users will call it in their systems.

If we want InputFinder to consider the indices of free variables in the subquery, perhaps we should add a field to record the variablesSet of the RelNode.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole purpose is to find correlationID right? a correlationID doesn't belong to that particular RelNode anyways, it belongs to whichever other subtree it's correlation variable is pointing to.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to avoid any misunderstanding, let me explain again. For example:
There is a Filter(condition=[subquery], variablesSet=[cor2]), and the subquery has two free variables: cor2.id and cor1.name (assume that cor1 belongs to an outer scope). If we use InputFinder to analyze the Filter condition, InputFinder should record the index of cor2.id and ignore cor1.name.

As I mentioned below, similar logic should also apply to RexInputConverter.visitSubQuery.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, will come up with some unit test for this

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verified a scenario where nested subquery has 2 correlated variable which belongs to inner scope and outer scope.

exaple query:

SELECT e.ename, d.dname
FROM emp e
INNER JOIN dept d ON e.deptno = d.deptno --------- main query / outer scope
AND e.sal IN (
  SELECT e2.sal
  FROM emp e2     -------- outer subquery / inner scope
  WHERE e2.job IN (
    SELECT e3.job
    FROM emp e3   -------- inner subquery / inner most scope 
    WHERE e3.sal = e.sal 
      AND e3.sal = e2.sal
  )
) where e.sal > 1000 and d.dname = 'SALES'

plan

LogicalProject(ENAME=[$1], DNAME=[$9])
  LogicalFilter(condition=[AND(>(CAST($5):DECIMAL(12, 2), 1000.00), =($9, 'SALES'))])
    LogicalJoin(condition=[AND(=($7, $8), IN($5, {
LogicalProject(SAL=[$5])
  LogicalFilter(condition=[IN($2, {
LogicalProject(JOB=[$2])
  LogicalFilter(condition=[AND(=($5, $cor0.SAL), =($5, $cor1.SAL))])
    LogicalTableScan(table=[[scott, EMP]])
})], variablesSet=[[$cor1]])
    LogicalTableScan(table=[[scott, EMP]])
}))], joinType=[inner], variablesSet=[[$cor0]])
      LogicalTableScan(table=[[scott, EMP]])
      LogicalTableScan(table=[[scott, DEPT]])

when FilterJoinRule is called, it's called on main query scope's join which has variablesSet=[[$cor0]]
so when InputFinder is used on the condition of join which contains subquery the method RelOptUtil.getVariablesUsed will return $cor0 and not all variables; this is already handled so we will not get both variables here. Hence InputFinder and RexInputConverter will get correct variableSet.

and if we are concerned about outer subquery getting variableSet of all correlationId then also we shouldn't be worries because it will not be queried directly unless we decorrelate all nested subqueries.

final Set<CorrelationId> variablesSet = RelOptUtil.getVariablesUsed(subQuery.rel);
for (CorrelationId id : variablesSet) {
ImmutableBitSet requiredColumns = RelOptUtil.correlationColumns(id, subQuery.rel);
for (int index : requiredColumns) {
bitBuilder.set(index);
}
}
return super.visitSubQuery(subQuery);
}
}

/**
Expand All @@ -4766,6 +4784,8 @@ public static class RexInputConverter extends RexShuttle {
private final @Nullable List<RelDataTypeField> rightDestFields;
private final int nLeftDestFields;
private final int[] adjustments;
private final int offset;
private final @Nullable RelNode correlateVariableChild;

/**
* Creates a RexInputConverter.
Expand All @@ -4784,14 +4804,23 @@ public static class RexInputConverter extends RexShuttle {
* @param rightDestFields in the case where the destination is a join,
* these are the fields from the right join input
* @param adjustments the amount to adjust each field by
* @param offset the amount to shift field accesses by when
* rewriting correlated subqueries
* @param correlateVariableChild the child relation providing the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code comment format here is strange.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variable name is big here, that's why it's going into it's doc. any suggestions on formatting?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I'll take a look at other Calcite code later to see if there are any good solutions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps changing to a shorter, more concise variable name would solve the problem. 🤔

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, will get back on this tomorrow

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using a shorter variable name to make formatting of comments nice is not a good reason.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder, I'll look into whether there's a better way to handle this tomorrow.

* correlated variable; if non-null, subqueries
* referencing a correlation variable will have
* their field accesses shifted by {@code offset}
* relative to this child
*/
private RexInputConverter(
RexBuilder rexBuilder,
@Nullable List<RelDataTypeField> srcFields,
@Nullable List<RelDataTypeField> destFields,
@Nullable List<RelDataTypeField> leftDestFields,
@Nullable List<RelDataTypeField> rightDestFields,
int[] adjustments) {
int[] adjustments,
int offset,
@Nullable RelNode correlateVariableChild) {
this.rexBuilder = rexBuilder;
this.srcFields = srcFields;
this.destFields = destFields;
Expand All @@ -4804,6 +4833,8 @@ private RexInputConverter(
assert destFields == null;
nLeftDestFields = leftDestFields.size();
}
this.offset = offset;
this.correlateVariableChild = correlateVariableChild;
}

public RexInputConverter(
Expand All @@ -4818,22 +4849,61 @@ public RexInputConverter(
null,
leftDestFields,
rightDestFields,
adjustments);
adjustments,
0,
null);
}

public RexInputConverter(
RexBuilder rexBuilder,
@Nullable List<RelDataTypeField> srcFields,
@Nullable List<RelDataTypeField> destFields,
int[] adjustments) {
this(rexBuilder, srcFields, destFields, null, null, adjustments);
this(rexBuilder, srcFields, destFields, null, null, adjustments, 0, null);
}

public RexInputConverter(
RexBuilder rexBuilder,
@Nullable List<RelDataTypeField> srcFields,
int[] adjustments) {
this(rexBuilder, srcFields, null, null, null, adjustments);
this(rexBuilder, srcFields, null, null, null, adjustments, 0, null);
}

public RexInputConverter(
RexBuilder rexBuilder,
@Nullable List<RelDataTypeField> srcFields,
@Nullable List<RelDataTypeField> destFields,
int[] adjustments,
int offset,
RelNode child) {
this(rexBuilder, srcFields, destFields, null, null, adjustments, offset, child);
}

@Override public RexNode visitSubQuery(RexSubQuery subQuery) {
boolean[] update = {false};
List<RexNode> clonedOperands = visitList(subQuery.operands, update);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boolean[] update = {false}; Is it necessary?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes,

visitList(
      List<? extends RexNode> exprs, boolean @Nullable [] update

accepts list of boolean and updates 0th index to true if updated.
pattern is same everywhere

if (update[0]) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect there might be an issue with the EXISTS subquery, but I'm not entirely sure. Could you add a similar test?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXISTS fails similar to IN clause, nice catch! thanks for this. will work on it

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed and added test

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides update[0], is there a better representation? Could you explain why it must be update[0]?

subQuery = subQuery.clone(subQuery.getType(), clonedOperands);
}
final Set<CorrelationId> variablesSet =
RelOptUtil.getVariablesUsed(subQuery.rel);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the concern at InputFinder.visitSubQuery, there might be a CorrelationId belonging to an outer scope here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correlation ID belongs to outer scope, that's the point of adding it. please check the test RelDecorrelatorTest#testCorrelatedVariableIndexForExistsClause()

if (!variablesSet.isEmpty() && correlateVariableChild != null) {
for (CorrelationId id : variablesSet) {
RelNode newSubQueryRel =
subQuery.rel.accept(new RelHomogeneousShuttle() {
@Override public RelNode visit(RelNode other) {
RelNode node =
RexUtil.shiftFieldAccess(rexBuilder, other, id,
correlateVariableChild, offset);
return super.visit(node);
}
});
if (newSubQueryRel != subQuery.rel) {
subQuery = subQuery.clone(newSubQueryRel);
}
}
}
return subQuery;
}

@Override public RexNode visitInputRef(RexInputRef var) {
Expand Down
24 changes: 24 additions & 0 deletions core/src/main/java/org/apache/calcite/rel/core/Join.java
Original file line number Diff line number Diff line change
Expand Up @@ -355,6 +355,30 @@ public static RelDataType createJoinType(
public abstract Join copy(RelTraitSet traitSet, RexNode conditionExpr,
RelNode left, RelNode right, JoinRelType joinType, boolean semiJoinDone);

/**
* Creates a copy of this join, overriding condition, system fields and
* inputs.
*
* <p>General contract as {@link RelNode#copy}.
*
* @param traitSet Traits
* @param conditionExpr Condition
* @param left Left input
* @param right Right input
* @param joinType Join type
* @param semiJoinDone Whether this join has been translated to a
* semi-join
* @param variablesSet Set of variables that are set by the
* LHS and used by the RHS and are not available to
* nodes above this LogicalJoin in the tree
* @return Copy of this join
*/
public Join copy(RelTraitSet traitSet, RexNode conditionExpr,
RelNode left, RelNode right, JoinRelType joinType, boolean semiJoinDone,
Set<CorrelationId> variablesSet) {
return copy(traitSet, conditionExpr, left, right, joinType, semiJoinDone);
}

/**
* Analyzes the join condition.
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,14 @@ public static LogicalJoin create(RelNode left, RelNode right, List<RelHint> hint
variablesSet, joinType, semiJoinDone, systemFieldList);
}

@Override public LogicalJoin copy(RelTraitSet traitSet, RexNode conditionExpr, RelNode left,
RelNode right, JoinRelType joinType, boolean semiJoinDone, Set<CorrelationId> variablesSet) {
assert traitSet.containsIfApplicable(Convention.NONE);
return new LogicalJoin(getCluster(),
getCluster().traitSetOf(Convention.NONE), hints, left, right, conditionExpr,
variablesSet, joinType, semiJoinDone, systemFieldList);
}

@Override public RelNode accept(RelShuttle shuttle) {
return shuttle.visit(this);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.plan.RelRule;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.CorrelationId;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.core.Join;
import org.apache.calcite.rel.core.JoinRelType;
Expand All @@ -29,7 +30,9 @@
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexInputRef;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexSubQuery;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.rex.RexVisitorImpl;
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.RelBuilder;
Expand Down Expand Up @@ -198,14 +201,43 @@ protected void perform(RelOptRuleCall call, @Nullable Filter filter,
return;
}

Set<CorrelationId> leftVariablesSet = new LinkedHashSet<>();
Set<CorrelationId> rightVariablesSet = new LinkedHashSet<>();

for (RexNode condition : leftFilters) {
condition.accept(new RexVisitorImpl<Void>(true) {
@Override public Void visitSubQuery(RexSubQuery subQuery) {
leftVariablesSet.addAll(RelOptUtil.getVariablesUsed(subQuery.rel));
return super.visitSubQuery(subQuery);
}
});
}

for (RexNode condition : rightFilters) {
condition.accept(new RexVisitorImpl<Void>(true) {
@Override public Void visitSubQuery(RexSubQuery subQuery) {
rightVariablesSet.addAll(RelOptUtil.getVariablesUsed(subQuery.rel));
return super.visitSubQuery(subQuery);
}
});
}

ImmutableSet.Builder<CorrelationId> newJoinCorrelationIds = ImmutableSet.builder();
for (CorrelationId correlationId : join.getVariablesSet()) {
if (!leftVariablesSet.contains(correlationId)
&& !rightVariablesSet.contains(correlationId)) {
newJoinCorrelationIds.add(correlationId);
}
}

// create Filters on top of the children if any filters were
// pushed to them
final RexBuilder rexBuilder = join.getCluster().getRexBuilder();
final RelBuilder relBuilder = call.builder();
final RelNode leftRel =
relBuilder.push(join.getLeft()).filter(leftFilters).build();
relBuilder.push(join.getLeft()).filter(leftVariablesSet, leftFilters).build();
final RelNode rightRel =
relBuilder.push(join.getRight()).filter(rightFilters).build();
relBuilder.push(join.getRight()).filter(rightVariablesSet, rightFilters).build();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the initial ON condition contains two correlated subqueries, one that cannot be pushed down and the other is pushed down to the right, could the following situation occur?

Join(variablesSet=cor1)
/     \
       Filter(variablesSet=cor1)



// after removing subquery and applying other rules
Correlate(cor1)
/     \
       ......
         \
        Correlate(cor1)

Two Correlate with the same id, with one appearing in the RHS of the other — this may cause serious decorrelation errors.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated code to remove pushed down variable set from join's variable set

Copy link
Copy Markdown
Author

@yashlimbad yashlimbad Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some tests are failing, investigating them. will get back on it tomorrow

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed and pushed


// create the new join node referencing the new children and
// containing its new join filters (if there are any)
Expand Down Expand Up @@ -233,7 +265,8 @@ protected void perform(RelOptRuleCall call, @Nullable Filter filter,
leftRel,
rightRel,
joinType,
join.isSemiJoinDone());
join.isSemiJoinDone(),
newJoinCorrelationIds.build());
call.getPlanner().onCopy(join, newJoinRel);
if (!leftFilters.isEmpty() && filter != null) {
call.getPlanner().onCopy(filter, leftRel);
Expand Down
Loading
Loading