Skip to content

added cloudwatch style contains operator#5219

Open
ThyTran1402 wants to merge 5 commits intoopensearch-project:mainfrom
ThyTran1402:feat/cloudwatch_contains_operator
Open

added cloudwatch style contains operator#5219
ThyTran1402 wants to merge 5 commits intoopensearch-project:mainfrom
ThyTran1402:feat/cloudwatch_contains_operator

Conversation

@ThyTran1402
Copy link

@ThyTran1402 ThyTran1402 commented Mar 9, 2026

Description

  • Adds a contains operator to PPL where clauses as a more readable compare to LIKE(field, '%keyword%') or field like '%keyword%'.
  • | where message contains 'error is equivalent to | where message like '%error%' .

Behavior:

  • Case-insensitive substring match (backed by ilike)
  • RHS must be a string literal; a field reference on the RHS raises a SemanticCheckException - % and _ in the keyword are treated as LIKE wildcards (consistent with the underlying operator)
  • contains can still be used as a field/identifier name in other positions

Related Issues

Resolves #5181

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

PR Reviewer Guide 🔍

(Review updated until commit 0dc8e95)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add ILIKE and CONTAINS grammar/operator infrastructure

Relevant files:

  • language-grammar/src/main/antlr4/OpenSearchPPLLexer.g4
  • language-grammar/src/main/antlr4/OpenSearchPPLParser.g4
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • core/src/main/java/org/opensearch/sql/expression/operator/predicate/BinaryPredicateOperators.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/FilterQueryBuilder.java

Sub-PR theme: Implement and document CONTAINS operator in PPL parser

Relevant files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstExpressionBuilderTest.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/WhereCommandIT.java
  • docs/user/ppl/functions/condition.md
  • docs/user/ppl/functions/index.md

⚡ Recommended focus areas for review

Wildcard Escaping

The escaping logic in the contains operator first escapes backslashes, then % and _. However, the raw string value from the Literal may already have been processed by the parser (e.g., escape sequences resolved). It's worth verifying that the escaping order and the actual string value at this point are correct, especially for inputs like '\\' in PPL which may already be unescaped to a single backslash before this code runs.

String raw = ((Literal) right).getValue().toString();
String escaped = raw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
String wrapped = "%" + escaped + "%";
return new Compare(
    ILIKE.getName().getFunctionName(), left, new Literal(wrapped, DataType.STRING));
Case-Insensitive Mismatch

ILIKE is mapped to the same LikeQuery as LIKE. The LikeQuery implementation should be verified to perform case-insensitive matching when used with ILIKE, otherwise the case-insensitivity guarantee of contains will not be honored at the OpenSearch query level.

.put(BuiltinFunctionName.ILIKE.getName(), new LikeQuery())
Incorrect Doc Output

The example output for the basic substring filter shows fetched rows / total rows = 1/1 and only one row (Amber), but the integration test testContainsOperator expects both Amber and Chambers to match 'mbe'. The documentation example appears to be incorrect or incomplete.

fetched rows / total rows = 1/1
+-----------+-----+
| firstname | age |
|-----------+-----|
| Amber     | 32  |
+-----------+-----+

</details>

<details><summary><a href='https://github.com/opensearch-project/sql/pull/5219/files#diff-8ab72b32488426e6604c2c1d4664f9b9cb4464427f5ded7f2c34da532f2b149dR216-R227'><strong>LHS Validation Missing</strong></a>

The `contains` operator only validates that the RHS is a string literal, but does not validate that the LHS is a field reference. According to the documentation, the left-hand side must be a field reference. A non-field LHS (e.g., a literal or function call) would silently be accepted.
</summary>

```java
} else if ("contains".equalsIgnoreCase(operator)) {
  UnresolvedExpression left = visit(ctx.left);
  UnresolvedExpression right = visit(ctx.right);
  if (!(right instanceof Literal) || ((Literal) right).getType() != DataType.STRING) {
    throw new SemanticCheckException(
        "The right-hand side of 'contains' must be a string literal");
  }
  String raw = ((Literal) right).getValue().toString();
  String escaped = raw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
  String wrapped = "%" + escaped + "%";
  return new Compare(
      ILIKE.getName().getFunctionName(), left, new Literal(wrapped, DataType.STRING));

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

PR Code Suggestions ✨

Latest suggestions up to 0dc8e95

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Case-insensitive query requires dedicated implementation

The LikeQuery implementation is reused for ILIKE, but ILIKE requires
case-insensitive matching while LikeQuery may perform case-sensitive matching
depending on the field mapping. Using the same LikeQuery for ILIKE without any
case-insensitive handling could produce incorrect results for fields that are not
already lowercased. A dedicated ILikeQuery or a parameterized LikeQuery that forces
case-insensitive behavior should be used instead.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/FilterQueryBuilder.java [71]

-.put(BuiltinFunctionName.ILIKE.getName(), new LikeQuery())
+.put(BuiltinFunctionName.ILIKE.getName(), new ILikeQuery())
Suggestion importance[1-10]: 6

__

Why: This is a valid concern — reusing LikeQuery for ILIKE without case-insensitive handling could produce incorrect results. However, the improved_code references a non-existent ILikeQuery class, making it not directly applicable without additional implementation work.

Low
Verify escape character consistency with ilike implementation

The escape order is incorrect. When replacing </code> with \, the subsequent replacements
for % and will not be affected, but if the raw string contains % or _, the
backslash will be doubled first and then the % or
will also be escaped, resulting
in \\% instead of \%. The replacements for % and _ should happen before the
backslash replacement, or the logic should be reconsidered. Actually the current
order (backslash first) is correct for preventing double-escaping, but the escape
character used in the ILIKE pattern needs to be consistent with what the underlying
ilike implementation expects. Verify that the escape character </code> is the correct
escape character for the ilike implementation being used.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [223-224]

 String raw = ((Literal) right).getValue().toString();
+// Escape backslash first, then special LIKE wildcard characters
 String escaped = raw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
Suggestion importance[1-10]: 2

__

Why: The suggestion's improved_code is essentially identical to the existing_code (just adds a comment), and the analysis itself acknowledges the current order is correct. The suggestion to "verify" consistency is vague and doesn't represent a concrete code fix.

Low
General
Test result ordering is non-deterministic

The integration test assumes a specific ordering of results (Amber before Chambers),
but without an explicit sort command the order of returned rows is not guaranteed.
This could cause the test to be flaky. Add a sort command to the query or use an
order-insensitive assertion method.

integ-test/src/test/java/org/opensearch/sql/ppl/WhereCommandIT.java [154]

-verifyDataRows(result, rows("Amber"), rows("Chambers"));
+verifyDataRows(result, rows("Chambers"), rows("Amber"));
+// or better: use an order-insensitive assertion, e.g.:
+// verifyDataRowsInAnyOrder(result, rows("Amber"), rows("Chambers"));
Suggestion importance[1-10]: 5

__

Why: This is a valid concern about test flakiness due to non-deterministic ordering. However, the improved_code just swaps the order rather than using an order-insensitive assertion, which doesn't actually fix the underlying issue.

Low
Validate left-hand side is a field reference

The validation only checks the right-hand side, but the contains operator
documentation states the left-hand side must be a field reference. If a user writes
'literal' contains 'sub', it would silently proceed without error. Consider adding a
validation that the left-hand side is a Field instance to match the documented
constraint and prevent unexpected behavior.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [216-222]

 } else if ("contains".equalsIgnoreCase(operator)) {
   UnresolvedExpression left = visit(ctx.left);
   UnresolvedExpression right = visit(ctx.right);
   if (!(right instanceof Literal) || ((Literal) right).getType() != DataType.STRING) {
     throw new SemanticCheckException(
         "The right-hand side of 'contains' must be a string literal");
   }
+  if (!(left instanceof Field)) {
+    throw new SemanticCheckException(
+        "The left-hand side of 'contains' must be a field reference");
+  }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies a gap between the documented constraint (LHS must be a field reference) and the implementation. Adding this validation would improve correctness and user-facing error messages, though it's a minor enhancement.

Low

Previous suggestions

Suggestions up to commit ac8c74f
CategorySuggestion                                                                                                                                    Impact
General
Ensure deterministic row ordering in tests

The verifyDataRows assertion checks for exactly these two rows, but the order of
rows returned from OpenSearch is not guaranteed. If the result set returns
"Chambers" before "Amber", the test will fail intermittently. Consider using an
order-insensitive verification or add an explicit | sort firstname to the query to
ensure deterministic ordering.

integ-test/src/test/java/org/opensearch/sql/ppl/WhereCommandIT.java [154]

+JSONObject result =
+    executeQuery(
+        String.format(
+            "source=%s | where firstname contains 'mbe' | fields firstname | sort firstname",
+            TEST_INDEX_ACCOUNT));
 verifyDataRows(result, rows("Amber"), rows("Chambers"));
Suggestion importance[1-10]: 6

__

Why: This is a valid concern — OpenSearch does not guarantee row ordering, so verifyDataRows with positional row matching could fail intermittently. Adding | sort firstname to the query would make the test deterministic and reliable.

Low
Add null guards before visiting operands

The validation only checks the right-hand side, but the left-hand side (left) could
also be a non-field expression (e.g., a literal or function call). Additionally,
there is no null-check on ctx.left or ctx.right before visiting them, which could
cause a NullPointerException if the grammar allows missing operands. Consider adding
a null guard or validating that ctx.left and ctx.right are non-null before visiting.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [216-222]

 } else if ("contains".equalsIgnoreCase(operator)) {
+  if (ctx.left == null || ctx.right == null) {
+    throw new SemanticCheckException(
+        "The 'contains' operator requires both a left-hand field and a right-hand string literal");
+  }
   UnresolvedExpression left = visit(ctx.left);
   UnresolvedExpression right = visit(ctx.right);
   if (!(right instanceof Literal) || ((Literal) right).getType() != DataType.STRING) {
     throw new SemanticCheckException(
         "The right-hand side of 'contains' must be a string literal");
   }
Suggestion importance[1-10]: 2

__

Why: In ANTLR-generated parsers, ctx.left and ctx.right are guaranteed to be non-null when the grammar rule matches (the grammar requires both operands for comparisonOperator), so null checks are unnecessary. This is a low-impact defensive suggestion that adds boilerplate without meaningful benefit.

Low
Possible issue
Verify escape character consistency with ilike

The escape order is incorrect. When the raw string contains a backslash followed by
% or , the current code first doubles the backslash and then escapes % and , which
could produce incorrect patterns. The backslash replacement must happen first
(before escaping % and _), which it does here — but the issue is that replace("\",
"\\") in Java replaces the literal single backslash with two backslashes, which is
correct. However, the escape character used by ilike may differ from </code>. Verify that
the ilike implementation uses </code> as its escape character; if it uses a different
escape character, the escaping logic here will be wrong and produce incorrect
results.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [223-224]

 String raw = ((Literal) right).getValue().toString();
+// Ensure escape character '\' is handled first, then escape ilike wildcards
 String escaped = raw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
Suggestion importance[1-10]: 2

__

Why: The suggestion asks to verify that ilike uses \\ as its escape character, but doesn't provide a concrete fix — the improved_code is essentially identical to the existing_code with only a comment added. This is a verification request rather than a concrete improvement, and the escape order (backslash first, then % and _) is already correct in the PR.

Low
Suggestions up to commit f5b3bbf
CategorySuggestion                                                                                                                                    Impact
General
Guard contains operator behind Calcite feature flag

The contains operator handling does not check whether Calcite is enabled, unlike the
LIKE operator which has a similar guard (UnresolvedPlanHelper.isCalciteEnabled). If
contains is only supported in the Calcite execution path, executing it without
Calcite enabled could lead to runtime errors or unexpected behavior. Consider adding
a similar guard or throwing a meaningful error when Calcite is not enabled.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [216-228]

 } else if ("contains".equalsIgnoreCase(operator)) {
+  if (!UnresolvedPlanHelper.isCalciteEnabled(astBuilder.getSettings())) {
+    throw new SemanticCheckException(
+        "The 'contains' operator is only supported when Calcite is enabled");
+  }
   UnresolvedExpression left = visit(ctx.left);
   UnresolvedExpression right = visit(ctx.right);
   if (!(right instanceof Literal) || ((Literal) right).getType() != DataType.STRING) {
     throw new SemanticCheckException(
         "The right-hand side of 'contains' must be a string literal");
   }
   String raw = ((Literal) right).getValue().toString();
   String escaped = raw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
   String wrapped = "%" + escaped + "%";
   return new Compare(
       ILIKE.getName().getFunctionName(), left, new Literal(wrapped, DataType.STRING));
 }
Suggestion importance[1-10]: 6

__

Why: This is a valid concern - the contains operator translates to ilike which may only be supported in the Calcite execution path, similar to how LIKE has a isCalciteEnabled guard. Without this check, non-Calcite users could encounter runtime errors when using contains.

Low
Possible issue
Fix backslash escaping in ILIKE pattern

The escape sequence for backslash is incorrect. In Java, "\\" as a replacement
string in String.replace() produces a single backslash </code>, not two. To produce \ in
the output, the replacement string should be "\\\\" (four backslashes). This
means a literal </code> in the input will not be correctly escaped to \ in the ILIKE
pattern, potentially causing incorrect matching behavior.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [223-224]

 String raw = ((Literal) right).getValue().toString();
-String escaped = raw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
+String escaped = raw.replace("\\", "\\\\\\\\").replace("%", "\\%").replace("_", "\\_");
Suggestion importance[1-10]: 2

__

Why: The suggestion claims the backslash replacement is wrong, but the existing code raw.replace("\\\\", "\\\\\\\\") in Java source is actually correct: it replaces a single backslash \ with two backslashes \\ in the output string. The 'improved_code' with 8 backslashes would be incorrect. The test case in the PR also validates this behavior correctly.

Low
Suggestions up to commit ad44495
CategorySuggestion                                                                                                                                    Impact
Possible issue
Escape wildcard characters in contains operand

The contains operator wraps the user-provided string with % wildcards, but if the
input string itself contains % or _ wildcard characters, they will be interpreted as
ILIKE pattern metacharacters, leading to unintended matches. These special
characters should be escaped before wrapping with %.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [223]

-String wrapped = "%" + ((Literal) right).getValue() + "%";
+String rawValue = ((Literal) right).getValue().toString()
+    .replace("\\", "\\\\")
+    .replace("%", "\\%")
+    .replace("_", "\\_");
+String wrapped = "%" + rawValue + "%";
Suggestion importance[1-10]: 7

__

Why: This is a valid concern - if the user's input contains % or _ characters, they would be interpreted as ILIKE wildcards, causing unintended matches. The fix correctly escapes these metacharacters before wrapping with %.

Medium
General
Respect Calcite settings for contains operator

The contains operator only validates that the right-hand side is a string literal,
but does not validate that the left-hand side is a field or string expression.
Adding a check or at least a comment clarifying this assumption would improve
robustness. More critically, the early return here bypasses the rest of the method,
which may handle additional settings-based logic (e.g., Calcite enablement checks)
that should also apply to contains.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [216-225]

 } else if ("contains".equalsIgnoreCase(operator)) {
   UnresolvedExpression left = visit(ctx.left);
   UnresolvedExpression right = visit(ctx.right);
   if (!(right instanceof Literal) || ((Literal) right).getType() != DataType.STRING) {
     throw new SemanticCheckException(
         "The right-hand side of 'contains' must be a string literal");
   }
   String wrapped = "%" + ((Literal) right).getValue() + "%";
-  return new Compare(ILIKE.getName().getFunctionName(), left, new Literal(wrapped, DataType.STRING));
+  String ilikeFn = UnresolvedPlanHelper.isCalciteEnabled(astBuilder.getSettings())
+      ? (UnresolvedPlanHelper.legacyPreferred(astBuilder.getSettings()) ? ILIKE.getName().getFunctionName() : "ilike")
+      : ILIKE.getName().getFunctionName();
+  return new Compare(ilikeFn, left, new Literal(wrapped, DataType.STRING));
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion about bypassing Calcite settings is questionable - the contains operator always maps to ilike semantics by design, and the improved code's logic for determining the function name is unclear and potentially incorrect. The ILIKE function name is used in both branches of the ternary, making the suggestion's value minimal.

Low

JSONObject result =
executeQuery(
String.format(
"source=%s | where firstname contains 'mbe' | fields firstname",
Copy link
Member

@LantaoJin LantaoJin Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the contains case-sensitive or insensitive? Need more tests for this.
Can you link the cloudwatch doc link in description?
And this PR should include user document updates. maybe in condition.md

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think contains is case-insensitive since it's implemented with ILIKE with %value% wrapping.
If a user writes firstname contains '%', the wrapped pattern becomes %%% which would match everything, which is likely unintended. I'll fix this and add more test cases.
Here is the doc provided: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax-Filter.html

@LantaoJin LantaoJin added the enhancement New feature or request label Mar 10, 2026
Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
@ThyTran1402
Copy link
Author

Hi @LantaoJin

I think the PR is ready for review.

Thank you!

Copy link
Collaborator

@Swiddis Swiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

String operator = ctx.comparisonOperator().getText();
if ("==".equals(operator)) {
operator = EQUAL.getName().getFunctionName();
} else if ("contains".equalsIgnoreCase(operator)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In principle this should be a const like LIKE but it's probably fine

We already broke the const in the previous if with "==", if someone doesn't like it then they can review this file for consts in another PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. Thank you for reviewing it!

@Swiddis
Copy link
Collaborator

Swiddis commented Mar 16, 2026

Failing unit test CI due to spotless violations, can fix by running ./gradlew :spotlessApply.

Doctests (./gradlew doctest) and integ tests (./gradlew integTest --tests WhereCommandIT) are also failing with assertion errors.

Copy link
Collaborator

@Swiddis Swiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upd: ^

Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
@ThyTran1402 ThyTran1402 requested a review from ahkcs as a code owner March 16, 2026 19:05
@github-actions
Copy link
Contributor

Persistent review updated to latest commit f5b3bbf

@ThyTran1402
Copy link
Author

ThyTran1402 commented Mar 16, 2026

Failing unit test CI due to spotless violations, can fix by running ./gradlew :spotlessApply.

Doctests (./gradlew doctest) and integ tests (./gradlew integTest --tests WhereCommandIT) are also failing with assertion errors.

Hi @Swiddis

I just clean it up. Can you please review it again?

Thank you!

@ThyTran1402
Copy link
Author

upd: ^

Hi @Swiddis

Are there any changes that you recommend? since I think the PR is ready for review.

@Swiddis
Copy link
Collaborator

Swiddis commented Mar 19, 2026

Doctests and integ tests are still failing.

Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit ac8c74f

@ThyTran1402
Copy link
Author

ThyTran1402 commented Mar 20, 2026

Doctests and integ tests are still failing.

hmm that's right 😢 . I fixed doctests and inges tests already. Can you run the testcase again?

Thank you!

Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 0dc8e95

@Swiddis Swiddis enabled auto-merge (squash) March 20, 2026 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Cloudwatch-style like operator

3 participants