
[FEATURE] Add Union command in PPL #5240

Open
srikanthpadakanti wants to merge 1 commit into opensearch-project:main from srikanthpadakanti:feature/union-public

Conversation

@srikanthpadakanti
Contributor

Description

Adds a union command to PPL that implements SQL-style UNION ALL semantics with Calcite-based type coercion. The command combines multiple datasets (indices, patterns, aliases, or subsearches) with automatic schema merging: missing fields are filled with NULL, compatible types are coerced to a common supertype (e.g., int + float --> float), and incompatible types fall back to string. It works both as a first command and mid-pipeline, where the upstream result set is implicitly included as the first dataset.

Related Issues

Resolves #5110

Check List

  • [x] New functionality includes testing.
  • [x] New functionality has been documented.
  • [x] New functionality has javadoc added.
  • [x] New functionality has a user manual doc added.
  • [x] New PPL command checklist all confirmed.
  • [ ] API changes companion pull request created.
  • [x] Commits are signed per the DCO using --signoff or -s.
  • [ ] Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

github-actions bot commented Mar 17, 2026

PR Reviewer Guide 🔍

(Review updated until commit 9ec4314)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: AST, Parser, and Anonymizer support for Union command

Relevant files:

  • core/src/main/java/org/opensearch/sql/ast/tree/Union.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java

Sub-PR theme: Calcite execution engine and integration tests for Union command

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLUnionTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteUnionCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_union.yaml
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_union.yaml

Sub-PR theme: Documentation for Union command

Relevant files:

  • docs/user/ppl/cmd/union.md
  • docs/user/ppl/index.md
  • docs/category.json

⚡ Recommended focus areas for review

Field Name Collision

In buildUnifiedSchemaForUnion, the schema is built by iterating over all inputs and adding fields only if not already seen (by name). However, the type stored in seenFields is from the first input that introduces the field, not the coerced common type. If coerceUnionTypes has already aligned types, this should be fine, but if a field appears in only one input (missing in others), the NULL literal type is derived from the first-seen type. This could cause subtle issues if the coercion step and the schema-building step are not perfectly aligned.

private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) {
  List<SchemaField> schema = new ArrayList<>();
  Map<String, RelDataType> seenFields = new HashMap<>();

  for (RelNode node : nodes) {
    for (RelDataTypeField field : node.getRowType().getFieldList()) {
      if (!seenFields.containsKey(field.getName())) {
        schema.add(new SchemaField(field.getName(), field.getType()));
        seenFields.put(field.getName(), field.getType());
      }
    }
  }
  return schema;
}
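The first-seen-type behavior described above can be sketched with plain maps standing in for Calcite's RelDataType (an assumption for brevity); the sketch shows why the unified schema must only be built from inputs that have already been coerced:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UnifiedSchemaDemo {
    // First-seen-type unification, mirroring buildUnifiedSchemaForUnion;
    // plain type-name strings stand in for RelDataType (an assumption).
    static Map<String, String> unify(List<Map<String, String>> inputs) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map<String, String> input : inputs) {
            for (Map.Entry<String, String> field : input.entrySet()) {
                // Only the first input to introduce a field decides its type
                schema.putIfAbsent(field.getKey(), field.getValue());
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("id", "INTEGER");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("id", "BIGINT");
        second.put("name", "VARCHAR");

        // id keeps INTEGER (first seen) even though the second input says BIGINT,
        // which is only safe if coercion has already unified the two types
        System.out.println(unify(List.of(first, second)));
        // prints {id=INTEGER, name=VARCHAR}
    }
}
```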
FLOAT/DECIMAL Rank

In getNumericTypeRankForUnion, DECIMAL has rank 5 while FLOAT has rank 7 and DOUBLE has rank 8. This means INTEGER+FLOAT coerces to FLOAT (correct), but DECIMAL+FLOAT coerces to FLOAT, and DECIMAL+DOUBLE coerces to DOUBLE. DECIMAL is a fixed-precision type and widening it to FLOAT/DOUBLE can cause precision loss. This ranking may produce unexpected results for users combining DECIMAL fields with floating-point fields.

private static int getNumericTypeRankForUnion(SqlTypeName typeName) {
  return switch (typeName) {
    case TINYINT -> 1;
    case SMALLINT -> 2;
    case INTEGER -> 3;
    case BIGINT -> 4;
    case DECIMAL -> 5;
    case REAL -> 6;
    case FLOAT -> 7;
    case DOUBLE -> 8;
    default -> 0;
  };
}
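The precision concern is concrete: under this ranking DECIMAL+FLOAT widens to FLOAT, and a fixed-precision value can silently lose digits in the process. A minimal standalone sketch (the value is chosen for illustration):

```java
import java.math.BigDecimal;

public class DecimalWideningDemo {
    public static void main(String[] args) {
        // A DECIMAL value with more significant digits than a float can hold
        BigDecimal d = new BigDecimal("123456.789012345");

        float asFloat = d.floatValue();    // DECIMAL+FLOAT -> FLOAT per the rank table
        double asDouble = d.doubleValue(); // DECIMAL+DOUBLE -> DOUBLE

        System.out.println(asFloat);   // digits beyond float precision are lost
        System.out.println(asDouble);  // retains all 15 significant digits here
        System.out.println((double) asFloat == asDouble); // false: the widened values differ
    }
}
```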
Single Dataset Validation

The check if (inputNodes.size() < 2) in visitUnion throws an IllegalArgumentException at planning time. However, when union is used mid-pipeline, the upstream node is attached via Union.attach() which prepends it to the datasets list. The validation in visitUnionCommand in AstBuilder allows 1+ datasets (for mid-pipeline use). It should be verified that the attach mechanism always results in at least 2 nodes reaching visitUnion, otherwise the error message may be confusing since it says "Provided: 1" but the user specified a valid mid-pipeline union with one explicit dataset.

if (inputNodes.size() < 2) {
  throw new IllegalArgumentException(
      "Union command requires at least two datasets. Provided: " + inputNodes.size());
}
Regex Keyword List

The anonymizeSubsearchCommand method builds a dynamic regex keyword exclusion list that includes the commandName parameter. For union, this adds union to the exclusion list, but the list is constructed via string concatenation without escaping. If a future command name contains regex special characters, this could break the regex. Additionally, the keyword list is rebuilt on every call; consider extracting it as a constant or computing it once.

private String anonymizeSubsearchCommand(String commandName, List<UnresolvedPlan> subsearches) {
  String keywords =
      "source|fields|where|stats|head|tail|sort|eval|rename|"
          + commandName
          + "|search|table|identifier|\\*\\*\\*";
  List<String> anonymizedSubsearches = new ArrayList<>();

  for (UnresolvedPlan subsearch : subsearches) {
    String anonymizedSubsearch = anonymizeData(subsearch);
    anonymizedSubsearch = "search " + anonymizedSubsearch;
    anonymizedSubsearch =
        anonymizedSubsearch
            .replaceAll("\\bsource=\\w+", "source=table")
            .replaceAll("\\b(?!" + keywords + ")\\w+(?=\\s*[<>=!])", "identifier")
            .replaceAll("\\b(?!" + keywords + ")\\w+(?=\\s*,)", "identifier")
            .replaceAll("fields \\+\\s*\\b(?!" + keywords + ")\\w+", "fields + identifier")
            .replaceAll(
                "fields \\+\\s*identifier,\\s*\\b(?!" + keywords + ")\\w+",
                "fields + identifier,identifier");
    anonymizedSubsearches.add(anonymizedSubsearch);
  }
  // ... (remaining logic joins anonymizedSubsearches into the returned string)
}
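The escaping risk is easy to reproduce in isolation. The sketch below uses a hypothetical command name containing a regex metacharacter to show how java.util.regex.Pattern.quote would make the concatenation safe:

```java
import java.util.regex.Pattern;

public class RegexEscapeDemo {
    public static void main(String[] args) {
        String name = "multi+search"; // hypothetical command name with a regex metacharacter

        // Raw concatenation: '+' is parsed as a quantifier, so the literal
        // name no longer matches its own alternation branch
        String unsafe = "union|" + name;
        // Pattern.quote wraps the name in \Q...\E so every character matches literally
        String safe = "union|" + Pattern.quote(name);

        System.out.println(name.matches(unsafe)); // false
        System.out.println(name.matches(safe));   // true
    }
}
```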

@github-actions
Contributor

github-actions bot commented Mar 17, 2026

PR Code Suggestions ✨

Latest suggestions up to 9ec4314

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Include all command names in anonymizer keyword exclusion list

The keywords regex is built by concatenating commandName directly into a regex
alternation. If commandName contains regex special characters (unlikely for current
values like "union" or "multisearch", but possible for future commands), this could
break the regex. Additionally, the keyword list does not include union when
anonymizing multisearch commands and vice versa, which could cause command names to
be incorrectly replaced with identifier. Both command names should always be in the
exclusion list.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [790-794]

 private String anonymizeSubsearchCommand(String commandName, List<UnresolvedPlan> subsearches) {
   String keywords =
       "source|fields|where|stats|head|tail|sort|eval|rename|"
-          + commandName
+          + "union|multisearch"
           + "|search|table|identifier|\\*\\*\\*";
Suggestion importance[1-10]: 5


Why: This is a valid bug — when anonymizing a union command, the keyword multisearch is not in the exclusion list (and vice versa), which could cause command names appearing in subsearch content to be incorrectly replaced with identifier. The fix is straightforward and correct, hardcoding both command names instead of dynamically inserting only the current one.

Low
Validate dataset count before processing inputs

When union is used mid-pipeline (e.g., search source=X | union [...]), the upstream
plan is attached via attach() and becomes part of node.getDatasets(). However, if
only one explicit dataset is provided in the grammar (which is allowed by the
parser), inputNodes will have size 1 after the loop, and the error is thrown after
already consuming the relBuilder state. The size check should happen before
processing datasets, using the total expected count (datasets + possible implicit
upstream), or the grammar should enforce at least 2 datasets at parse time.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2472-2481]

+// Validate before processing to avoid partial relBuilder state corruption
+// When used mid-pipeline, attach() prepends the upstream, so datasets.size() >= 1 is valid
+// but we still need at least 2 total inputs for a union
+if (node.getDatasets().size() < 2) {
+  throw new IllegalArgumentException(
+      "Union command requires at least two datasets. Provided: " + node.getDatasets().size());
+}
+
 for (UnresolvedPlan dataset : node.getDatasets()) {
   UnresolvedPlan prunedDataset = dataset.accept(new EmptySourcePropagateVisitor(), null);
   prunedDataset.accept(this, context);
   inputNodes.add(context.relBuilder.build());
 }
 
-if (inputNodes.size() < 2) {
-  throw new IllegalArgumentException(
-      "Union command requires at least two datasets. Provided: " + inputNodes.size());
-}
-
Suggestion importance[1-10]: 4


Why: Moving the validation before processing avoids partial relBuilder state corruption, which is a valid concern. However, the mid-pipeline case (where attach() prepends the upstream) means node.getDatasets() already contains all datasets including the implicit upstream, so the check at inputNodes.size() < 2 after the loop is functionally equivalent. The suggestion has merit but the impact is limited.

Low
Remove redundant constructor annotation

The class has both @RequiredArgsConstructor and @AllArgsConstructor. Since datasets
is final and maxout is not, @RequiredArgsConstructor generates a constructor with
only datasets, while @AllArgsConstructor generates one with both. However, the
attach method calls new Union(newDatasets, maxout) which uses the all-args
constructor. If maxout is null (the common case), the single-arg constructor new
Union(datasets) would be used elsewhere, but having both annotations is redundant
and can cause confusion. Consider removing @RequiredArgsConstructor since
@AllArgsConstructor covers all cases, and use new Union(newDatasets, maxout)
consistently.

core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27]

 @Getter
 @ToString
 @EqualsAndHashCode(callSuper = false)
-@RequiredArgsConstructor
 @AllArgsConstructor
 public class Union extends UnresolvedPlan {
   private final List<UnresolvedPlan> datasets;
 
   private Integer maxout;
Suggestion importance[1-10]: 3


Why: Having both @RequiredArgsConstructor and @AllArgsConstructor is indeed redundant since @RequiredArgsConstructor generates a constructor with only datasets (the final field) while @AllArgsConstructor covers both fields. Removing @RequiredArgsConstructor simplifies the code, but this is a minor style improvement with low functional impact.

Low
Possible issue
Unified schema should reflect widest coerced type

The unified schema uses the type from the first node that introduces a field, but
after coerceUnionTypes runs, subsequent nodes may have had their types widened to a
common supertype. The schema builder should use the coerced (target) type for each
field rather than the first-seen type, otherwise buildProjectionForUnion may emit
unnecessary CASTs or miss type mismatches. Consider deriving the unified schema from
the coerced target type map instead of the first-seen field type.

core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205]

 private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) {
   List<SchemaField> schema = new ArrayList<>();
-  Map<String, RelDataType> seenFields = new HashMap<>();
+  Map<String, RelDataType> seenFields = new LinkedHashMap<>();
 
   for (RelNode node : nodes) {
     for (RelDataTypeField field : node.getRowType().getFieldList()) {
-      if (!seenFields.containsKey(field.getName())) {
-        schema.add(new SchemaField(field.getName(), field.getType()));
-        seenFields.put(field.getName(), field.getType());
+      String name = field.getName();
+      if (!seenFields.containsKey(name)) {
+        seenFields.put(name, field.getType());
+        schema.add(new SchemaField(name, field.getType()));
+      } else {
+        // Update to the widest type seen (post-coercion nodes may have wider types)
+        RelDataType existing = seenFields.get(name);
+        if (!existing.equals(field.getType())) {
+          // Replace with the current node's type if it is wider (coercion already applied)
+          seenFields.put(name, field.getType());
+          schema.set(schema.indexOf(new SchemaField(name, existing)), new SchemaField(name, field.getType()));
+        }
       }
     }
   }
   return schema;
 }
Suggestion importance[1-10]: 4


Why: The concern is valid in theory — after coerceUnionTypes, all nodes should have the same type for shared fields, so the first-seen type should already be the coerced type. The suggested fix using indexOf with a new SchemaField object is fragile and may not work correctly if equals isn't defined properly. The issue is real but the proposed solution has its own problems.

Low

Previous suggestions

Suggestions up to commit 3782336
Category | Suggestion | Impact
Possible issue
Fix relBuilder stack state after maxout limit

After applying LogicalSystemLimit, the node is pushed onto the builder stack but
peek() is called without a prior build() on the non-maxout path. When maxout is
null, context.relBuilder.peek() returns the union node correctly, but when maxout is
not null, the LogicalSystemLimit is pushed but never built/returned consistently.
The method should always return context.relBuilder.peek() after ensuring the final
node is on the stack, which it does — but the push after build() inside the maxout
branch leaves the builder in an inconsistent state for subsequent operations since
the built node is pushed back.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2486-2499]

 for (RelNode input : unifiedInputs) {
   context.relBuilder.push(input);
 }
 context.relBuilder.union(true, unifiedInputs.size()); // true = UNION ALL
 
 if (node.getMaxout() != null) {
+  RelNode unionNode = context.relBuilder.build();
   context.relBuilder.push(
       LogicalSystemLimit.create(
           LogicalSystemLimit.SystemLimitType.SUBSEARCH_MAXOUT,
-          context.relBuilder.build(),
+          unionNode,
           context.relBuilder.literal(node.getMaxout())));
 }
 
 return context.relBuilder.peek();
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that calling context.relBuilder.build() inside LogicalSystemLimit.create() consumes the union node from the stack, and then push puts the limit node back. The improved code explicitly separates build() from the push, making the stack management clearer and less error-prone for future maintenance.

Low
Ensure unified schema uses coerced types

The unified schema is built using the type from the first node that introduces a
field, but after coerceUnionTypes runs, subsequent nodes may have had their types
changed. The schema should reflect the coerced (common) type, not just the
first-seen type. This can cause type mismatches when projecting fields from later
nodes, leading to incorrect CASTs or runtime errors.

core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205]

 private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) {
   List<SchemaField> schema = new ArrayList<>();
   Map<String, RelDataType> seenFields = new HashMap<>();
 
   for (RelNode node : nodes) {
     for (RelDataTypeField field : node.getRowType().getFieldList()) {
-      if (!seenFields.containsKey(field.getName())) {
-        schema.add(new SchemaField(field.getName(), field.getType()));
-        seenFields.put(field.getName(), field.getType());
+      String name = field.getName();
+      if (!seenFields.containsKey(name)) {
+        schema.add(new SchemaField(name, field.getType()));
+        seenFields.put(name, field.getType());
       }
+      // After coercion, all nodes sharing a field name should have the same type;
+      // the first-seen type is already the common type post-coercion.
     }
   }
   return schema;
 }
Suggestion importance[1-10]: 3


Why: The suggestion claims that buildUnifiedSchemaForUnion uses pre-coercion types, but since coerceUnionTypes is called first and returns new RelNode objects with updated types, the first-seen type in the coerced inputs is already the common type. The improved code is functionally identical to the existing code with only a comment added, making this a low-impact suggestion.

Low
General
Clarify constructor design for immutable fields

Using both @RequiredArgsConstructor and @AllArgsConstructor together with a mix of
final and non-final fields creates two constructors: one with only datasets and one
with both datasets and maxout. However, @RequiredArgsConstructor generates a
constructor for final fields only, so maxout (non-final) is excluded. This is
intentional but fragile — if maxout is ever made final, the behavior changes
silently. Consider making maxout final and using a single @AllArgsConstructor, or
document the intent clearly.

core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27]

 @Getter
 @ToString
 @EqualsAndHashCode(callSuper = false)
-@RequiredArgsConstructor
 @AllArgsConstructor
 public class Union extends UnresolvedPlan {
   private final List<UnresolvedPlan> datasets;
+  private final Integer maxout;
 
-  private Integer maxout;
+  public Union(List<UnresolvedPlan> datasets) {
+    this(datasets, null);
+  }
Suggestion importance[1-10]: 3


Why: The suggestion to make maxout final and use @AllArgsConstructor with an explicit no-maxout constructor improves immutability and clarity. However, this is a minor design preference and the existing dual-annotation approach with @RequiredArgsConstructor and @AllArgsConstructor is a common Lombok pattern that works correctly.

Low
Handle qualified dataset name parsing correctly

When parsing a qualifiedDatasetName, getText() returns the raw concatenated text
including any COLON and DOT separators (e.g., cluster:index.alias becomes
cluster:index.alias as a single string). This is passed directly to qualifiedName,
which may not correctly handle cross-cluster or dotted index names. The name should
be split on the appropriate delimiters to construct a proper qualified name with
separate parts.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1355-1362]

 for (OpenSearchPPLParser.UnionDatasetContext datasetCtx : ctx.unionDataset()) {
   if (datasetCtx.subSearch() != null) {
     datasets.add(visitSubSearch(datasetCtx.subSearch()));
   } else if (datasetCtx.qualifiedDatasetName() != null) {
+    // Use the full text as a single qualified name part to preserve
+    // index patterns (e.g., logs-*, cluster:index) correctly
     String datasetName = datasetCtx.qualifiedDatasetName().getText();
     datasets.add(new Relation(Collections.singletonList(AstDSL.qualifiedName(datasetName))));
   }
 }
Suggestion importance[1-10]: 1


Why: The improved code is functionally identical to the existing code — it keeps the same getText() call and qualifiedName usage, only adding a comment. The suggestion raises a valid concern about cross-cluster name parsing but doesn't actually fix anything, making it a score of 1.

Low
Suggestions up to commit f05fbdb
Category | Suggestion | Impact
Possible issue
Fix maxout limit node not pushed onto builder stack

After calling context.relBuilder.build() inside the if (node.getMaxout() != null)
block, the result is passed to LogicalSystemLimit.create but never pushed back onto
the builder stack. The subsequent context.relBuilder.peek() will then return the
union node rather than the limit node, causing the maxout limit to be silently
dropped from the plan.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2467-2480]

 for (RelNode input : unifiedInputs) {
   context.relBuilder.push(input);
 }
 context.relBuilder.union(true, unifiedInputs.size()); // true = UNION ALL
 
 if (node.getMaxout() != null) {
+  RelNode unionNode = context.relBuilder.build();
   context.relBuilder.push(
       LogicalSystemLimit.create(
           LogicalSystemLimit.SystemLimitType.SUBSEARCH_MAXOUT,
-          context.relBuilder.build(),
+          unionNode,
           context.relBuilder.literal(node.getMaxout())));
 }
 
 return context.relBuilder.peek();
Suggestion importance[1-10]: 8


Why: This is a real bug: context.relBuilder.build() pops the union node from the stack and passes it to LogicalSystemLimit.create, but the resulting limit node is pushed via context.relBuilder.push(...) correctly. Looking more carefully at the code, the push call does push the limit node, so peek() would return it. The suggestion's premise may be incorrect — the existing code does push the limit node. Score lowered accordingly, but the suggestion highlights a subtle flow worth verifying.

Medium
Fix unified schema using post-coercion types

The unified schema is built using the type from the first node that introduces a
field, but after coerceUnionTypes runs, subsequent nodes may have the same field
with a different (coerced) type. The schema should use the coerced/target type for
each field, not just the first-seen type. This can cause type mismatches in the
subsequent buildProjectionForUnion cast logic.

core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205]

 private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) {
   List<SchemaField> schema = new ArrayList<>();
   Map<String, RelDataType> seenFields = new HashMap<>();
 
   for (RelNode node : nodes) {
     for (RelDataTypeField field : node.getRowType().getFieldList()) {
-      if (!seenFields.containsKey(field.getName())) {
-        schema.add(new SchemaField(field.getName(), field.getType()));
-        seenFields.put(field.getName(), field.getType());
+      String name = field.getName();
+      if (!seenFields.containsKey(name)) {
+        schema.add(new SchemaField(name, field.getType()));
+        seenFields.put(name, field.getType());
+      } else {
+        // Update schema entry if a wider type is found (post-coercion nodes may differ)
+        RelDataType existing = seenFields.get(name);
+        if (!existing.equals(field.getType())) {
+          // Replace with the current type if it differs (coercion should have unified already)
+          seenFields.put(name, field.getType());
+          schema.replaceAll(sf -> sf.getName().equals(name) ? new SchemaField(name, field.getType()) : sf);
+        }
       }
     }
   }
   return schema;
 }
Suggestion importance[1-10]: 4


Why: The concern is valid in theory — buildUnifiedSchemaForUnion uses the first-seen type, which may differ from the coerced type in other nodes. However, since coerceUnionTypes is called first and should unify types across all inputs before unifySchemasForUnion runs, the first-seen type after coercion should already be the common type. The suggested fix adds complexity and may not address a real bug in practice.

Low
General
Include all command names in anonymizer keyword exclusion list

The keywords regex negative lookahead is built by concatenating commandName directly
into the regex pattern. If commandName contains regex special characters, this could
break the pattern. Additionally, when visitMultisearch calls this method, the
keyword multisearch is included, but when visitUnion calls it, only union is added —
neither includes the other command name, so one command's name could be incorrectly
anonymized as identifier in the other's output.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [790-794]

 private String anonymizeSubsearchCommand(String commandName, List<UnresolvedPlan> subsearches) {
   String keywords =
       "source|fields|where|stats|head|tail|sort|eval|rename|"
-          + commandName
+          + "multisearch|union"
           + "|search|table|identifier|\\*\\*\\*";
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that when anonymizing a union subsearch, the keyword multisearch is not in the exclusion list (and vice versa), which could cause command names to be incorrectly replaced with identifier. Hardcoding all command names is a reasonable fix, though it requires maintenance as new commands are added.

Low
Make maxout field immutable and constructors explicit

Using both @RequiredArgsConstructor and @AllArgsConstructor together with a mix of
final and non-final fields creates two constructors: one with only datasets (from
@RequiredArgsConstructor) and one with both fields (from @AllArgsConstructor).
However, maxout is non-final and mutable, which can lead to inconsistent state.
Consider making maxout final and removing @RequiredArgsConstructor, or using a
single explicit constructor to avoid ambiguity.

core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27]

 @Getter
 @ToString
 @EqualsAndHashCode(callSuper = false)
-@RequiredArgsConstructor
 @AllArgsConstructor
 public class Union extends UnresolvedPlan {
   private final List<UnresolvedPlan> datasets;
+  private final Integer maxout;
 
-  private Integer maxout;
+  public Union(List<UnresolvedPlan> datasets) {
+    this(datasets, null);
+  }
Suggestion importance[1-10]: 4


Why: The suggestion to make maxout final and use explicit constructors improves immutability and clarity. The dual @RequiredArgsConstructor/@AllArgsConstructor pattern with mixed final/non-final fields is a valid design concern, though it works correctly in practice for this use case.

Low
Suggestions up to commit 60613ec
Category | Suggestion | Impact
Possible issue
Fix inconsistent RelBuilder state after maxout branch

After the if (node.getMaxout() != null) block, if maxout is set, LogicalSystemLimit
is pushed directly onto the builder but then context.relBuilder.peek() is called
without a prior build(). If maxout is null, context.relBuilder.build() was already
called inside the limit block, leaving the builder in an inconsistent state. The
final return context.relBuilder.peek() should be consistent regardless of the maxout
branch.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2472-2480]

 if (node.getMaxout() != null) {
   context.relBuilder.push(
       LogicalSystemLimit.create(
           LogicalSystemLimit.SystemLimitType.SUBSEARCH_MAXOUT,
           context.relBuilder.build(),
           context.relBuilder.literal(node.getMaxout())));
 }
 
-return context.relBuilder.peek();
+return context.relBuilder.build();
Suggestion importance[1-10]: 7


Why: This is a valid concern about RelBuilder state consistency. When maxout is set, context.relBuilder.build() is called inside the limit block, and then LogicalSystemLimit is pushed back. Calling peek() vs build() at the end could lead to inconsistent behavior depending on how the caller uses the returned node. Changing to build() would be more consistent with the pattern used elsewhere in the codebase.

Medium
Fix unified schema type resolution after coercion

The unified schema is built using only the type from the first node that introduces
a field, but after coerceUnionTypes runs, subsequent nodes may have the same field
with a different (coerced) type. The schema should use the coerced target type from
targetTypeMap rather than the first-seen type, otherwise the projection cast in
buildProjectionForUnion may cast to the wrong type. Consider passing the target type
map or re-deriving the common type here.

core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205]

 private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) {
   List<SchemaField> schema = new ArrayList<>();
   Map<String, RelDataType> seenFields = new HashMap<>();
 
   for (RelNode node : nodes) {
     for (RelDataTypeField field : node.getRowType().getFieldList()) {
-      if (!seenFields.containsKey(field.getName())) {
-        schema.add(new SchemaField(field.getName(), field.getType()));
-        seenFields.put(field.getName(), field.getType());
+      String name = field.getName();
+      if (!seenFields.containsKey(name)) {
+        // After coercion, all nodes sharing this field should have the same type;
+        // use the type from the first occurrence (post-coercion).
+        schema.add(new SchemaField(name, field.getType()));
+        seenFields.put(name, field.getType());
       }
     }
   }
   return schema;
 }
Suggestion importance[1-10]: 3


Why: The suggestion claims there's a bug where the schema uses the first-seen type rather than the coerced type. However, buildUnifiedSchemaForUnion is called after coerceUnionTypes, so the input nodes already have coerced types. The first occurrence of a field in the coerced nodes will already have the target type, making this a non-issue in practice. The improved_code is also essentially identical to the existing_code.

Low
General
Fix temporal type widening to preserve timezone information

The temporal type widening logic returns TIMESTAMP when either type is TIMESTAMP,
even if the other is TIMESTAMP_WITH_LOCAL_TIME_ZONE. This means
TIMESTAMP_WITH_LOCAL_TIME_ZONE would be silently downcast to TIMESTAMP, potentially
losing timezone information. TIMESTAMP_WITH_LOCAL_TIME_ZONE should be considered
wider than plain TIMESTAMP.

core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [378-387]

 private static SqlTypeName getWiderTemporalTypeForUnion(SqlTypeName type1, SqlTypeName type2) {
-  if (type1 == SqlTypeName.TIMESTAMP || type2 == SqlTypeName.TIMESTAMP) {
-    return SqlTypeName.TIMESTAMP;
-  }
   if (type1 == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE
       || type2 == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE) {
     return SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE;
   }
+  if (type1 == SqlTypeName.TIMESTAMP || type2 == SqlTypeName.TIMESTAMP) {
+    return SqlTypeName.TIMESTAMP;
+  }
   return SqlTypeName.DATE;
 }
Suggestion importance[1-10]: 6


Why: This is a valid correctness issue: the current logic returns TIMESTAMP when one type is TIMESTAMP and the other is TIMESTAMP_WITH_LOCAL_TIME_ZONE, silently losing timezone information. Reordering the checks so TIMESTAMP_WITH_LOCAL_TIME_ZONE takes priority is the correct fix for proper type widening semantics.

Low
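The reordered widening logic can be sketched in isolation. `TemporalType` below is a stand-in for Calcite's `SqlTypeName` (an assumption made so the sketch has no Calcite dependency); the point is only the check ordering, widest type first:

```java
// Sketch of the reordered temporal widening from the suggestion above.
// TemporalType stands in for org.apache.calcite.sql.type.SqlTypeName.
class TemporalWidening {
  enum TemporalType { DATE, TIMESTAMP, TIMESTAMP_WITH_LOCAL_TIME_ZONE }

  // Checks ordered from widest to narrowest, so TIMESTAMP_WITH_LOCAL_TIME_ZONE
  // wins over plain TIMESTAMP and timezone information is preserved.
  static TemporalType wider(TemporalType t1, TemporalType t2) {
    if (t1 == TemporalType.TIMESTAMP_WITH_LOCAL_TIME_ZONE
        || t2 == TemporalType.TIMESTAMP_WITH_LOCAL_TIME_ZONE) {
      return TemporalType.TIMESTAMP_WITH_LOCAL_TIME_ZONE;
    }
    if (t1 == TemporalType.TIMESTAMP || t2 == TemporalType.TIMESTAMP) {
      return TemporalType.TIMESTAMP;
    }
    return TemporalType.DATE;
  }

  public static void main(String[] args) {
    // A mixed TIMESTAMP / TIMESTAMP_WITH_LOCAL_TIME_ZONE pair now widens
    // to the timezone-aware type instead of silently dropping the zone.
    System.out.println(wider(TemporalType.TIMESTAMP,
        TemporalType.TIMESTAMP_WITH_LOCAL_TIME_ZONE));
  }
}
```

With the original ordering, the same pair would have returned `TIMESTAMP`, which is the information loss the suggestion flags.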
Make maxout field immutable to prevent inconsistent state

Having both @RequiredArgsConstructor and @AllArgsConstructor with a final field and
a non-final field creates two constructors: one with only datasets and one with both
datasets and maxout. However, maxout is non-final and mutable, which can lead to
inconsistent state. Consider making maxout final and using only @AllArgsConstructor,
or use a builder pattern to avoid accidental mutation.

core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27]

 @Getter
 @ToString
 @EqualsAndHashCode(callSuper = false)
-@RequiredArgsConstructor
 @AllArgsConstructor
 public class Union extends UnresolvedPlan {
   private final List<UnresolvedPlan> datasets;
+  private final Integer maxout;
 
-  private Integer maxout;
-
Suggestion importance[1-10]: 4


Why: Making maxout final is a reasonable immutability improvement for an AST node. However, the current design with @RequiredArgsConstructor and @AllArgsConstructor is intentional to allow creating a Union with or without maxout, and the attach method already creates a new Union instance. The risk of accidental mutation is low in practice.

Low
Suggestions up to commit 0ff52aa
Category | Suggestion | Impact
General
Parse qualified dataset names from individual tokens

When a qualifiedDatasetName contains a colon (e.g., cluster:index.alias), getText()
returns the raw concatenated text including the colon and dots without any parsing.
This raw string is passed directly as a qualified name, which may not be correctly
interpreted downstream. The grammar rule ident (COLON ident (DOT ident)*)? should be
parsed into its components (cluster, index, alias) rather than using the raw text.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1355-1362]

 for (OpenSearchPPLParser.UnionDatasetContext datasetCtx : ctx.unionDataset()) {
   if (datasetCtx.subSearch() != null) {
     datasets.add(visitSubSearch(datasetCtx.subSearch()));
   } else if (datasetCtx.qualifiedDatasetName() != null) {
-    String datasetName = datasetCtx.qualifiedDatasetName().getText();
-    datasets.add(new Relation(Collections.singletonList(AstDSL.qualifiedName(datasetName))));
+    OpenSearchPPLParser.QualifiedDatasetNameContext nameCtx = datasetCtx.qualifiedDatasetName();
+    // Build qualified name from individual ident tokens to handle cluster:index.alias correctly
+    List<String> parts = nameCtx.ident().stream()
+        .map(RuleContext::getText)
+        .collect(Collectors.toList());
+    datasets.add(new Relation(Collections.singletonList(
+        AstDSL.qualifiedName(parts.toArray(new String[0])))));
   }
 }
Suggestion importance[1-10]: 5


Why: The concern about getText() returning raw concatenated text for cluster:index.alias patterns is valid. Parsing individual ident tokens would be more robust for cross-cluster references, though it's unclear if the downstream qualifiedName handling already accounts for this format.

Low
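The splitting behavior the suggestion asks for can be illustrated without the ANTLR parse tree. This sketch splits the raw text directly (the grammar rule assumed is `ident (COLON ident (DOT ident)*)?`; in the real fix the parts would come from the `ident` tokens rather than string splitting):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split "cluster:index.alias" into components instead of passing
// the raw concatenated text downstream as a single qualified name.
class QualifiedNameSketch {
  static List<String> parts(String raw) {
    List<String> out = new ArrayList<>();
    int colon = raw.indexOf(':');
    if (colon >= 0) {
      out.add(raw.substring(0, colon)); // cluster component
      out.addAll(Arrays.asList(raw.substring(colon + 1).split("\\.")));
    } else {
      out.addAll(Arrays.asList(raw.split("\\.")));
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(parts("cluster:index.alias")); // [cluster, index, alias]
    System.out.println(parts("logs"));                // [logs]
  }
}
```

Whether the cluster prefix should stay fused to the index name for cross-cluster search is a downstream question; the sketch only shows the component extraction the suggestion describes.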
Guard against relBuilder stack corruption per dataset

Each dataset's RelNode is built by calling context.relBuilder.build(), which pops
from the shared relBuilder stack. If a dataset's sub-plan pushes multiple nodes onto
the builder (e.g., due to nested unions or joins), only the top node is captured,
potentially leaving stale nodes on the stack and corrupting subsequent operations.
Consider saving and restoring the builder state, or verifying the stack depth before
and after each dataset visit.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2453-2457]

 for (UnresolvedPlan dataset : node.getDatasets()) {
+  int stackSizeBefore = context.relBuilder.size();
   UnresolvedPlan prunedDataset = dataset.accept(new EmptySourcePropagateVisitor(), null);
   prunedDataset.accept(this, context);
+  if (context.relBuilder.size() != stackSizeBefore + 1) {
+    throw new IllegalStateException(
+        "Expected exactly one RelNode pushed for union dataset, but stack changed by "
+            + (context.relBuilder.size() - stackSizeBefore));
+  }
   inputNodes.add(context.relBuilder.build());
 }
Suggestion importance[1-10]: 4


Why: The concern about relBuilder stack corruption is valid in theory, but the improved_code adds a runtime assertion rather than actually fixing the issue. This is a defensive check that could help catch bugs but doesn't address the root cause, and RelBuilder.size() may not be a public API.

Low
Remove redundant constructor annotation on Union class

Both @RequiredArgsConstructor and @AllArgsConstructor are present.
@RequiredArgsConstructor generates a constructor with only datasets (the final
field), while @AllArgsConstructor generates one with both datasets and maxout.
However, maxout is not final, so @RequiredArgsConstructor will not include it. This
means the single-argument constructor sets maxout to null implicitly, which is fine,
but having both annotations is redundant and confusing. Remove
@RequiredArgsConstructor since @AllArgsConstructor covers all cases, and add a
dedicated constructor for the datasets-only case if needed.

core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27]

 @Getter
 @ToString
 @EqualsAndHashCode(callSuper = false)
-@RequiredArgsConstructor
 @AllArgsConstructor
 public class Union extends UnresolvedPlan {
   private final List<UnresolvedPlan> datasets;
 
   private Integer maxout;
 
+  public Union(List<UnresolvedPlan> datasets) {
+    this.datasets = datasets;
+    this.maxout = null;
+  }
+
Suggestion importance[1-10]: 4


Why: Having both @RequiredArgsConstructor and @AllArgsConstructor is indeed redundant and potentially confusing. The suggestion is valid and improves code clarity, though the impact is minor since both annotations work correctly together in Lombok.

Low
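The constructor situation is easier to see desugared. This sketch shows plain-Java equivalents of what the proposed change would generate, with `maxout` made final; `UnresolvedPlan` is stubbed as an empty interface here (the real type lives in `org.opensearch.sql.ast.tree`):

```java
import java.util.List;

// Desugared view of the proposed Union constructors: one all-args
// constructor plus an explicit datasets-only constructor delegating to it,
// replacing the @RequiredArgsConstructor/@AllArgsConstructor pair.
class UnionCtorSketch {
  interface UnresolvedPlan {}

  static class Union {
    private final List<UnresolvedPlan> datasets;
    private final Integer maxout; // final, so no post-construction mutation

    Union(List<UnresolvedPlan> datasets, Integer maxout) { // @AllArgsConstructor
      this.datasets = datasets;
      this.maxout = maxout;
    }

    Union(List<UnresolvedPlan> datasets) { // explicit datasets-only ctor
      this(datasets, null);
    }

    Integer getMaxout() { return maxout; }
  }

  public static void main(String[] args) {
    Union u = new Union(List.of());
    System.out.println(u.getMaxout()); // null when maxout is omitted
  }
}
```

Delegating through `this(datasets, null)` keeps a single initialization path, which is the main readability win over the two independent Lombok-generated constructors.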
Possible issue
Fix unified schema type resolution after coercion

The unified schema is built using the type from the first node that introduces a
field, but after coerceUnionTypes runs, subsequent nodes may have the same field
with a different (coerced) type. The schema should use the coerced target type from
targetTypeMap rather than the first-seen type, otherwise the projection in
buildProjectionForUnion will always detect a mismatch and add unnecessary CASTs.
Consider passing the target type map or re-deriving the common type here.

core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205]

 private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) {
   List<SchemaField> schema = new ArrayList<>();
   Map<String, RelDataType> seenFields = new HashMap<>();
 
   for (RelNode node : nodes) {
     for (RelDataTypeField field : node.getRowType().getFieldList()) {
-      if (!seenFields.containsKey(field.getName())) {
-        schema.add(new SchemaField(field.getName(), field.getType()));
-        seenFields.put(field.getName(), field.getType());
+      String name = field.getName();
+      if (!seenFields.containsKey(name)) {
+        schema.add(new SchemaField(name, field.getType()));
+        seenFields.put(name, field.getType());
+      } else if (!seenFields.get(name).equals(field.getType())) {
+        // After coercion, all inputs sharing a field name should have the same type.
+        // If they still differ, prefer the wider type already recorded.
       }
     }
   }
   return schema;
 }
Suggestion importance[1-10]: 3


Why: The suggestion points out that buildUnifiedSchemaForUnion uses the first-seen type, but after coerceUnionTypes all inputs sharing a field should already have the same coerced type. The improved_code doesn't actually fix the described problem (it adds a comment but no real logic change), making the suggestion's impact minimal and the fix incomplete.

Low
Suggestions up to commit 12b2eb5
Category | Suggestion | Impact
General
Fix mutable field inconsistency in AST node

The class has both @RequiredArgsConstructor and @AllArgsConstructor.
@RequiredArgsConstructor generates a constructor for final fields only (i.e.,
datasets), while @AllArgsConstructor generates one for all fields (datasets and
maxout). This means there are two constructors: Union(List) and Union(List,
Integer). However, maxout is not final, so it can be mutated after construction,
which is inconsistent with the immutable design of other AST nodes. Consider making
maxout final and removing @RequiredArgsConstructor to avoid ambiguity, or use a
single constructor pattern.

core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27]

 @Getter
 @ToString
 @EqualsAndHashCode(callSuper = false)
-@RequiredArgsConstructor
 @AllArgsConstructor
 public class Union extends UnresolvedPlan {
   private final List<UnresolvedPlan> datasets;
+  private final Integer maxout;
 
-  private Integer maxout;
+  public Union(List<UnresolvedPlan> datasets) {
+    this(datasets, null);
+  }
Suggestion importance[1-10]: 5


Why: Making maxout final and consolidating constructors improves immutability consistency with other AST nodes. The suggestion is valid and the improved_code correctly reflects the change, though the impact is moderate since the field is only set at construction time in practice.

Low
Use complete keyword list in anonymizer regex

The keywords regex negative lookahead is built by concatenating commandName directly
into the regex pattern. If commandName contains regex special characters, this could
break the pattern or cause incorrect anonymization. While current command names
(multisearch, union) are safe, this is fragile. Additionally, the keywords string
does not include union when called for multisearch and vice versa — meaning field
names that match the other command name could be incorrectly anonymized. Consider
using a fixed comprehensive keyword list instead of dynamically inserting the
command name.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [790-794]

 private String anonymizeSubsearchCommand(String commandName, List<UnresolvedPlan> subsearches) {
   String keywords =
       "source|fields|where|stats|head|tail|sort|eval|rename|"
-          + commandName
-          + "|search|table|identifier|\\*\\*\\*";
+          + "multisearch|union|search|table|identifier|\\*\\*\\*";
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that the dynamic keyword list excludes the other command name (e.g., union is missing when anonymizing multisearch and vice versa), which could cause incorrect anonymization of field names matching those keywords. The fix is straightforward and the improved_code accurately reflects the change.

Low
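The fixed-keyword-list idea can be demonstrated with a small stand-alone anonymizer. The keyword set and the `identifier` placeholder below are illustrative assumptions, not the plugin's exact anonymizer behavior; the point is that a static alternation inside a negative lookahead avoids concatenating `commandName` into the pattern and covers both `multisearch` and `union` at once:

```java
import java.util.regex.Pattern;

// Sketch: replace identifiers with a placeholder unless they match a
// fixed, comprehensive PPL keyword list (negative lookahead).
class AnonymizerSketch {
  private static final String KEYWORDS =
      "source|fields|where|stats|head|tail|sort|eval|rename|"
          + "multisearch|union|search|table|identifier";

  static String anonymize(String fragment) {
    // \b(?!(?:KEYWORDS)\b)word  -- skip keywords, replace everything else.
    Pattern p =
        Pattern.compile("\\b(?!(?:" + KEYWORDS + ")\\b)[a-zA-Z_][a-zA-Z0-9_]*");
    return p.matcher(fragment).replaceAll("identifier");
  }

  public static void main(String[] args) {
    System.out.println(anonymize("source=logs | union [ source=metrics ]"));
    // source=identifier | union [ source=identifier ]
  }
}
```

Because the list is static, a field literally named `union` is treated as a keyword in either command's output, which is exactly the consistency the suggestion is after.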
Validate builder stack consistency per dataset

Each dataset is processed by calling prunedDataset.accept(this, context) and then
context.relBuilder.build(). However, if the visitor for a dataset internally pushes
multiple nodes onto the builder stack (e.g., for complex subsearches), build() only
pops one node. Additionally, if an exception occurs mid-iteration, the builder stack
may be left in an inconsistent state for subsequent datasets. Consider wrapping each
dataset processing in a try-catch or verifying the stack depth before and after each
dataset to ensure correctness.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2453-2457]

 for (UnresolvedPlan dataset : node.getDatasets()) {
   UnresolvedPlan prunedDataset = dataset.accept(new EmptySourcePropagateVisitor(), null);
+  int stackSizeBefore = context.relBuilder.size();
   prunedDataset.accept(this, context);
+  if (context.relBuilder.size() != stackSizeBefore + 1) {
+    throw new IllegalStateException(
+        "Expected exactly one RelNode pushed for union dataset, but stack grew by "
+            + (context.relBuilder.size() - stackSizeBefore));
+  }
   inputNodes.add(context.relBuilder.build());
 }
Suggestion importance[1-10]: 4


Why: The suggestion to validate the relBuilder stack depth is a reasonable defensive check, but context.relBuilder.size() may not be a public API in all Calcite versions. The concern about stack inconsistency is valid but the improved code may not compile depending on the Calcite API available.

Low
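The stack-depth guard is independent of Calcite, so it can be sketched against a plain `Deque` standing in for `RelBuilder` (an assumption; the real builder's stack-size accessor may differ across Calcite versions, as the review notes):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the guard: each dataset visit must push exactly one node
// onto the builder stack before it is popped with build()/pop().
class StackGuardSketch {
  static String buildOne(Deque<String> stack, Runnable visit) {
    int before = stack.size();
    visit.run();
    if (stack.size() != before + 1) {
      throw new IllegalStateException(
          "Expected exactly one node pushed, but stack changed by "
              + (stack.size() - before));
    }
    return stack.pop();
  }

  public static void main(String[] args) {
    Deque<String> stack = new ArrayDeque<>();
    System.out.println(buildOne(stack, () -> stack.push("scan(logs)")));
  }
}
```

A visit that pushes two nodes (for example a nested union that fails to combine its inputs) now fails fast instead of silently leaving a stale node for the next dataset to pick up.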
Ensure unified schema uses consistent coerced types

The unified schema is built from the first occurrence of each field's type, but coerceUnionTypes has already computed common supertypes. If buildUnifiedSchemaForUnion runs after coerceUnionTypes, the coerced types should be reflected in the schema. A field that appears only in a later node takes that node's type, which is correct for missing fields; for shared fields, however, the schema type should come from the coerced inputs, and the first-seen type may not match the coerced target type. Verify that the coerced inputs always agree on type before building
the schema.

core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205]

 private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) {
   List<SchemaField> schema = new ArrayList<>();
-  Map<String, RelDataType> seenFields = new HashMap<>();
+  Map<String, RelDataType> seenFields = new LinkedHashMap<>();
 
   for (RelNode node : nodes) {
     for (RelDataTypeField field : node.getRowType().getFieldList()) {
-      if (!seenFields.containsKey(field.getName())) {
-        schema.add(new SchemaField(field.getName(), field.getType()));
-        seenFields.put(field.getName(), field.getType());
+      String name = field.getName();
+      if (!seenFields.containsKey(name)) {
+        seenFields.put(name, field.getType());
+      } else {
+        // After coercion, shared fields should have the same type; assert consistency
+        RelDataType existing = seenFields.get(name);
+        if (!existing.getSqlTypeName().equals(field.getType().getSqlTypeName())) {
+          // Prefer the wider/coerced type (should not happen after coerceUnionTypes)
+          seenFields.put(name, field.getType());
+        }
       }
     }
   }
+  seenFields.forEach((name, type) -> schema.add(new SchemaField(name, type)));
   return schema;
 }
Suggestion importance[1-10]: 3


Why: After coerceUnionTypes, all shared fields should already have the same type, so the scenario described is unlikely in practice. The suggestion adds complexity without addressing a clear bug, and the improved_code changes the behavior in a way that may not be necessary given the existing coercion pipeline.

Low
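The first-occurrence unification itself is simple to model over plain name-to-type maps. Strings stand in for Calcite `RelDataType` values here (an assumption; the real code walks `RelDataTypeField` lists), and a `LinkedHashMap` preserves first-seen field order:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of first-occurrence schema unification across union inputs.
class SchemaUnionSketch {
  static Map<String, String> unify(List<Map<String, String>> inputs) {
    Map<String, String> schema = new LinkedHashMap<>(); // keeps field order
    for (Map<String, String> input : inputs) {
      // putIfAbsent keeps the first-seen type, mirroring the PR's behavior;
      // after coercion, all inputs sharing a name should agree anyway.
      input.forEach(schema::putIfAbsent);
    }
    return schema;
  }

  public static void main(String[] args) {
    Map<String, String> a = new LinkedHashMap<>();
    a.put("id", "INTEGER");
    a.put("ts", "TIMESTAMP");
    Map<String, String> b = new LinkedHashMap<>();
    b.put("id", "INTEGER");
    b.put("msg", "VARCHAR");
    System.out.println(unify(List.of(a, b)));
    // {id=INTEGER, ts=TIMESTAMP, msg=VARCHAR}
  }
}
```

Fields unique to later inputs (like `msg` above) are appended after the earlier schema, which is why missing fields can then be NULL-filled per input during projection.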

@github-actions
Contributor

Persistent review updated to latest commit 5fd5e1c

@srikanthpadakanti srikanthpadakanti force-pushed the feature/union-public branch 2 times, most recently from dcbc60f to 12b2eb5 on March 17, 2026 at 16:59
@github-actions
Contributor

Persistent review updated to latest commit dcbc60f

@github-actions
Contributor

Persistent review updated to latest commit 12b2eb5

@github-actions
Contributor

Persistent review updated to latest commit 0ff52aa

@srikanthpadakanti
Contributor Author

Hello @anasalkouz @mengweieric Please enforce the PR label and review this. Thanks

Also, the failing sql-cli integration test check uses a different Gradle version (8.14.2 vs. 9.2.0); I believe the failure is unrelated to my changes.

@anasalkouz anasalkouz added PPL Piped processing language feature labels Mar 17, 2026
@github-actions
Contributor

Persistent review updated to latest commit 60613ec

@mengweieric
Collaborator

Please take a look at the CI failure.

@github-actions
Contributor

Persistent review updated to latest commit f05fbdb

@srikanthpadakanti
Contributor Author

Please take a look at the CI failure.

Took care of it. Please review. Thanks.

@github-actions
Contributor

Persistent review updated to latest commit 3782336

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

# Conflicts:
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
#	integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java
@github-actions
Contributor

Persistent review updated to latest commit 9ec4314

@srikanthpadakanti
Contributor Author

Hello @mengweieric Can you please review this.


Successfully merging this pull request may close these issues: [FEATURE] Add Union command in PPL.