Skip to content

Support multi-field syntax for unified SQL relevance functions#5427

Draft
dai-chen wants to merge 1 commit into
opensearch-project:mainfrom
dai-chen:feature/array-multi-field-bracket-syntax
Draft

Support multi-field syntax for unified SQL relevance functions#5427
dai-chen wants to merge 1 commit into
opensearch-project:mainfrom
dai-chen:feature/array-multi-field-bracket-syntax

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen commented May 8, 2026

Description

This PR adds support for standard SQL ARRAY[] syntax in multi-field relevance functions (multi_match, simple_query_string, query_string) for the unified SQL engine. The NamedArgRewriter is extended to detect the ARRAY[] argument and expands it into a MAP constructor with VARCHAR-typed field names, producing plans consistent with the PPL path and compatible with the Analytics Engine's pushdown pattern matching in opensearch-project/OpenSearch#21562.

Example:

Syntax:
  SELECT * FROM t WHERE multi_match(ARRAY['name', 'department'], 'John')

Plan output (matches PPL):
  multi_match(MAP('fields', MAP('name':VARCHAR, 1, 'department':VARCHAR, 1)), MAP('query', 'John'))

Note: This uses ARRAY['field1', 'field2'] rather than V2's bracket syntax ['field1', 'field2'], as Calcite's SQL parser does not support it and requires complex changes in the .ftl parser template. A follow-up PR will document all such syntax differences between V2 and unified SQL engine.

Related Issues

Part of #5248

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Add support for standard SQL ARRAY syntax in multi-field relevance
functions (multi_match, simple_query_string, query_string). The
NamedArgRewriter expands ARRAY['f1','f2'] into a MAP with VARCHAR-typed
field names and default weight 1.0, producing plans compatible with
the Analytics engine pushdown rules.

Syntax: multi_match(ARRAY['name', 'department'], 'John')

The key technique is wrapping each field literal in CAST(... AS VARCHAR)
at the SqlNode level so Calcite's validator produces bare RexLiterals
without type-widening CASTs in the final plan.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen self-assigned this May 8, 2026
@dai-chen dai-chen added enhancement New feature or request SQL labels May 8, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The expandFieldsArray method does not validate that array elements are string literals before casting them to VARCHAR. If a non-string element (e.g., a numeric literal or identifier) is passed in ARRAY[123, name], the code will attempt to cast it, potentially producing unexpected results or runtime errors depending on how the downstream code handles the cast. The method should verify that each element is a string literal or handle non-string cases explicitly.

private static SqlNode expandFieldsArray(SqlCall arrayCall) {
  List<SqlNode> mapArgs = new ArrayList<>();
  for (SqlNode element : arrayCall.getOperandList()) {
    mapArgs.add(castToVarchar(element));
    mapArgs.add(SqlLiteral.createExactNumeric(BigDecimal.ONE.toPlainString(), SqlParserPos.ZERO));
  }
  return SqlStdOperatorTable.MAP_VALUE_CONSTRUCTOR.createCall(
      SqlParserPos.ZERO, mapArgs.toArray(SqlNode[]::new));
}
Missing Validation

The condition "fields".equals(paramName) at line 76 only checks the parameter name but does not verify that the function is one of the expected multi-field relevance functions (multi_match, simple_query_string, query_string). If another function happens to have a parameter named 'fields' and a user passes an ARRAY[] for it, this code will incorrectly expand it into the MAP structure intended for relevance functions, potentially breaking that function's semantics.

if ("fields".equals(paramName)
    && op instanceof SqlCall arrayCall
    && arrayCall.getOperator() == SqlStdOperatorTable.ARRAY_VALUE_CONSTRUCTOR) {
  maps[i] = toMap(paramName, expandFieldsArray(arrayCall));

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate array element types

The method does not validate that the array elements are valid field identifiers or
string literals. Consider adding validation to ensure array elements are of expected
types (SqlIdentifier or SqlLiteral) before processing to prevent runtime errors with
malformed input.

api/src/main/java/org/opensearch/sql/api/spec/search/NamedArgRewriter.java [89-97]

 private static SqlNode expandFieldsArray(SqlCall arrayCall) {
   List<SqlNode> mapArgs = new ArrayList<>();
   for (SqlNode element : arrayCall.getOperandList()) {
+    if (!(element instanceof SqlIdentifier || element instanceof SqlLiteral)) {
+      throw new IllegalArgumentException(
+          "Array elements must be field identifiers or string literals");
+    }
     mapArgs.add(castToVarchar(element));
     mapArgs.add(SqlLiteral.createExactNumeric(BigDecimal.ONE.toPlainString(), SqlParserPos.ZERO));
   }
   return SqlStdOperatorTable.MAP_VALUE_CONSTRUCTOR.createCall(
       SqlParserPos.ZERO, mapArgs.toArray(SqlNode[]::new));
 }
Suggestion importance[1-10]: 7

__

Why: Adding validation for array element types (SqlIdentifier or SqlLiteral) would prevent runtime errors from malformed input. This is a reasonable defensive programming practice that improves robustness.

Medium

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant