Skip to content

Conversation

@aalva500-prog
Copy link
Contributor

@aalva500-prog aalva500-prog commented Dec 18, 2025

Description

This PR continues the work done in PR #4816 to add frequent used queries to the big5 workload based on gap analysis between existing benchmarks and frequent used query patterns.

dedup query is added here: #4991

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

noCharger and others added 3 commits November 17, 2025 09:35
Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>
Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 18, 2025

📝 Walkthrough

Summary by CodeRabbit

  • Tests
    • Added integration tests covering regex extraction, script-based pattern matching with aggregation, and pattern matching with sorting.
    • Added test data and expected-output resources to validate regex extraction, script-driven queries, aggregations, and result ordering.
    • Updated expected results for span-based aggregations and struct-field null-checks to reflect expanded result sets.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds three Big5 integration tests and corresponding PPL query resources, updates Big5 test data and expected Calcite YAML for one test, and adjusts expected rows in two existing aggregation/condition integration tests. One new test asserts an explained YAML plan.

Changes

Cohort / File(s) Summary
New integration tests
integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
Adds three test methods: rex_regex_transformation() (loads PPL, times execution, asserts explained YAML via assertYamlEqualsIgnoreId), script_engine_like_pattern_with_aggregation() (loads and times PPL), and script_engine_like_pattern_with_sort() (loads and times PPL).
PPL query resources
integ-test/src/test/resources/big5/queries/rex_regex_transformation.ppl, integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_aggregation.ppl, integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_sort.ppl
Adds three PPL files: a rex extraction + eval + sort pipeline, a script-based like query with composite aggregation, and a script-based like query with sort + head.
Test data
integ-test/src/test/resources/big5/data/big5.json
Appends two additional log events (sshd and kernel), increasing dataset size for big5 tests.
Expected plan YAML
integ-test/src/test/resources/expectedOutput/calcite/big5/rex_regex_transformation.yaml
Adds expected logical and physical Calcite plan for the rex regex transformation (includes REX_EXTRACT, projection, sort, pushdown details, and limit).
Updated test expectations
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java, integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java
Adjusts asserted result rows: span-by-implicit-timestamp test now expects three rows; isNotNull-with-struct test now expects three rows (was one).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • Swiddis
  • joshuali925
  • yuancu
  • kavithacm
  • ps48
  • anirudha
  • dai-chen
  • GumpacG
  • penghuo

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title clearly and specifically describes the main change: adding frequently used Big5 PPL queries to the workload.
Description check ✅ Passed The pull request description is directly related to the changeset, explaining the work to add frequently used PPL queries to the big5 workload based on gap analysis.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@aalva500-prog aalva500-prog mentioned this pull request Dec 18, 2025
8 tasks
@aalva500-prog aalva500-prog changed the title Add Frequently Use Big5 PPL Queries Add Frequently Used Big5 PPL Queries Dec 18, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_aggregation.ppl (1)

5-5: Index name inconsistency across test resources.

This query uses custom-big5 as the default index, while other queries in this PR use big5 (e.g., script_engine_like_pattern_with_sort.ppl, rex_regex_transformation.ppl). Consider standardizing the index name across all test resources unless there's a specific reason for the difference.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7dfabce and 1f006db.

📒 Files selected for processing (8)
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java (2 hunks)
  • integ-test/src/test/resources/big5/data/big5.json (1 hunks)
  • integ-test/src/test/resources/big5/queries/dedup_metrics_size_field.ppl (1 hunks)
  • integ-test/src/test/resources/big5/queries/rex_regex_transformation.ppl (1 hunks)
  • integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_aggregation.ppl (1 hunks)
  • integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_sort.ppl (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/dedup_metrics_size_field.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/parse_regex_with_cast_transformation.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test new grammar rules with positive and negative cases for PPL parser changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Include edge cases and boundary conditions in PPL parser tests
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/parse_regex_with_cast_transformation.yaml
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
  • integ-test/src/test/resources/expectedOutput/calcite/dedup_metrics_size_field.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/parse_regex_with_cast_transformation.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
🪛 Biome (2.1.2)
integ-test/src/test/resources/big5/data/big5.json

[error] 3-4: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 4-5: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 5-6: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (6)
integ-test/src/test/resources/big5/data/big5.json (1)

1-6: LGTM! Static analysis warnings are false positives.

The file uses NDJSON (newline-delimited JSON) format, which is the standard format for OpenSearch bulk indexing operations. The Biome linter doesn't recognize this format and reports false positives. The new log entries (sshd and kernel) provide appropriate test data for the like pattern queries.

integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_sort.ppl (1)

35-38: LGTM! Query correctly demonstrates LIKE pattern with sorting.

The PPL query properly filters messages containing "sshd" and sorts results by timestamp, which aligns with the new test data added to big5.json.

integ-test/src/test/resources/big5/queries/rex_regex_transformation.ppl (1)

25-29: LGTM! Rex query correctly demonstrates regex extraction.

The query properly extracts logType and filename from log.file.path using a regex pattern and computes the filename length. The transformation logic is sound and aligns with the test objectives.

integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_aggregation.ppl (1)

45-47: LGTM! Query correctly demonstrates LIKE pattern with aggregation.

The PPL query properly combines pattern matching with aggregation to count occurrences grouped by metrics.size.

integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java (1)

8-9: LGTM! Appropriate static import added.

The import of assertYamlEqualsIgnoreId is correctly added to support YAML plan assertions in the new tests.

integ-test/src/test/resources/expectedOutput/calcite/dedup_metrics_size_field.yaml (1)

1-16: LGTM! Expected Calcite plan correctly models deduplication.

The plan properly shows:

  • Logical plan with ROW_NUMBER window function for deduplication by metrics.size
  • Physical plan with filter condition (<=($45, 1)) to keep only the first occurrence
  • Pushdown of EXISTS filter and field projections to OpenSearch

This validates the correct translation of PPL dedup operation to Calcite execution plan.

aalva500-prog and others added 7 commits December 18, 2025 14:42
…tePPLBig5IT.java

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
…tePPLBig5IT.java

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
…tePPLBig5IT.java

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
…tePPLBig5IT.java

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08477f7 and 237478a.

📒 Files selected for processing (1)
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java (2 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (4)
integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java (4)

8-9: LGTM!

The static import for assertYamlEqualsIgnoreId is properly used in the new test methods for plan validation.


60-70: Test resource files exist and are properly configured.

The test correctly references rex_regex_transformation.ppl and rex_regex_transformation.yaml from the integ-test resources directory. Both files are present in the repository, and the test follows guidelines for using valid test data from resources.


48-58: Test resource files verified and present.

The test correctly follows the pattern with proper JavaDoc, timing measurement, and Calcite plan assertion. Both required resource files exist: dedup_metrics_size_field.ppl and dedup_metrics_size_field.yaml.


72-80: The proposed fix cannot be applied as written - the yaml plan file doesn't exist.

This test correctly only performs timing without assertions, which differs from dedup_metrics_size_field and rex_regex_transformation. However, those tests have corresponding yaml plan files (dedup_metrics_size_field.yaml and rex_regex_transformation.yaml in expectedOutput/calcite/), whereas script_engine_like_pattern_with_aggregation.yaml is missing.

Before adding plan validation assertions, create the expected plan YAML file. Alternatively, if timing-only is intentional (e.g., for performance benchmarking without full plan validation), document that decision in a comment.

Likely an incorrect or invalid review comment.

@aalva500-prog aalva500-prog marked this pull request as draft December 19, 2025 23:21
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
@aalva500-prog aalva500-prog marked this pull request as ready for review December 23, 2025 20:44
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java (2)

60-68: Consider adding plan assertions for consistency.

For consistency with rex_regex_transformation() and to properly test Calcite's SQL generation and optimization paths, consider adding plan validation assertions. This would verify that the generated Calcite plan matches expected behavior for LIKE pattern matching with aggregation.

🔎 Example refactor to add assertions
  @Test
  public void script_engine_like_pattern_with_aggregation() throws IOException {
    String ppl = sanitize(loadExpectedQuery("script_engine_like_pattern_with_aggregation.ppl"));
    timing(summary, "script_engine_like_pattern_with_aggregation", ppl);
+   String expected = loadExpectedPlan("big5/script_engine_like_pattern_with_aggregation.yaml");
+   assertYamlEqualsIgnoreId(expected, explainQueryYaml(ppl));
  }

Note: You would need to create the corresponding expected output YAML file at integ-test/src/test/resources/expectedOutput/calcite/big5/script_engine_like_pattern_with_aggregation.yaml.

Based on learnings, Calcite integration changes should test SQL generation and optimization paths.


70-78: Consider adding plan assertions for consistency.

Similar to the comment above, consider adding plan validation assertions to verify Calcite's handling of LIKE pattern matching with sorting. This would provide better test coverage and consistency with the rex_regex_transformation() test.

🔎 Example refactor to add assertions
  @Test
  public void script_engine_like_pattern_with_sort() throws IOException {
    String ppl = sanitize(loadExpectedQuery("script_engine_like_pattern_with_sort.ppl"));
    timing(summary, "script_engine_like_pattern_with_sort", ppl);
+   String expected = loadExpectedPlan("big5/script_engine_like_pattern_with_sort.yaml");
+   assertYamlEqualsIgnoreId(expected, explainQueryYaml(ppl));
  }

Note: You would need to create the corresponding expected output YAML file at integ-test/src/test/resources/expectedOutput/calcite/big5/script_engine_like_pattern_with_sort.yaml.

Based on learnings, Calcite integration changes should test SQL generation and optimization paths.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 237478a and edef39c.

📒 Files selected for processing (4)
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/big5/rex_regex_transformation.yaml
🧰 Additional context used
📓 Path-based instructions (6)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/big5/rex_regex_transformation.yaml
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test new grammar rules with positive and negative cases for PPL parser changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Include edge cases and boundary conditions in PPL parser tests
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/big5/rex_regex_transformation.yaml
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/big5/rex_regex_transformation.yaml
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/big5/rex_regex_transformation.yaml
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/big5/rex_regex_transformation.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (4)
integ-test/src/test/resources/expectedOutput/calcite/big5/rex_regex_transformation.yaml (1)

1-14: LGTM! Expected plan structure is correct.

The YAML properly defines both logical and physical Calcite plans for the rex regex transformation query. The regex pattern correctly extracts logType and filename from the log path, and the physical plan includes appropriate pushdown optimizations (PROJECT, SORT, LIMIT).

integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java (1)

48-58: LGTM! Test follows best practices.

The test properly validates Calcite's handling of rex regex transformations by loading the expected plan and asserting the actual generated plan matches. The JavaDoc clearly documents the test's purpose.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java (1)

738-745: The test expectations are correct and match the big5 test data. The big5.json file contains exactly three documents with @timestamp values (2023-01-02, 2023-03-01, 2023-05-01) that normalize to the expected 1-day spans. No action required.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLConditionBuiltinFunctionIT.java (1)

96-100: Row count of 3 is correct for the isnotnull(aws) filter.

The big5.json test data contains exactly 3 documents with a non-null aws object field (which contains nested aws.cloudwatch properties). All 3 will be returned by the query, confirming the expected verifyNumOfRows(actual, 3) assertion is accurate.

@aalva500-prog
Copy link
Contributor Author

@Swiddis the SQL CLI Integration tests are failing, any recommendations on how to fix this?

@LantaoJin LantaoJin added the testing Related to improving software testing label Dec 24, 2025
Copy link
Member

@LantaoJin LantaoJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you resolve the conflicts

aalva500-prog and others added 3 commits December 24, 2025 09:54
Signed-off-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java (1)

70-78: Add plan assertions as previously requested.

The JavaDoc has been added, but the main issue from the previous review remains unaddressed: this test still only performs timing without validating the generated Calcite plan. As noted in the previous review comment (lines 70-78), add plan validation:

  @Test
  public void script_engine_like_pattern_with_sort() throws IOException {
    String ppl = sanitize(loadExpectedQuery("script_engine_like_pattern_with_sort.ppl"));
    timing(summary, "script_engine_like_pattern_with_sort", ppl);
+   String expected = loadExpectedPlan("big5/script_engine_like_pattern_with_sort.yaml");
+   assertYamlEqualsIgnoreId(expected, explainQueryYaml(ppl));
  }

Create integ-test/src/test/resources/expectedOutput/calcite/big5/script_engine_like_pattern_with_sort.yaml with the expected Calcite query plan.

Based on learnings, test SQL generation and optimization paths for Calcite integration changes.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8dbb518 and 0bf2fb8.

📒 Files selected for processing (4)
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
  • integ-test/src/test/resources/big5/queries/rex_regex_transformation.ppl
  • integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_aggregation.ppl
  • integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_sort.ppl
🚧 Files skipped from review as they are similar to previous changes (3)
  • integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_sort.ppl
  • integ-test/src/test/resources/big5/queries/rex_regex_transformation.ppl
  • integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_aggregation.ppl
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Include edge cases and boundary conditions in PPL parser tests
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
🔇 Additional comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalcitePPLBig5IT.java (1)

48-58: Good implementation - JavaDoc and assertions added.

The JavaDoc has been added per previous review feedback, and the test now includes proper plan validation via assertYamlEqualsIgnoreId, which tests Calcite's SQL generation and optimization paths as expected for Calcite integration tests.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_sort.ppl (1)

1-35: Consider clarifying the comment block's purpose.

The comment block describes an OpenSearch DSL query structure with script engine details, while the PPL query below (lines 36-39) is much simpler. Consider adding a note at the beginning indicating this represents the expected compiled/translated OpenSearch query format for documentation/testing purposes, to avoid confusion about why both representations exist.

💡 Example clarification
-/* Filter messages containing 'sshd', sort by timestamp, and return top 10 results */
+/* Filter messages containing 'sshd', sort by timestamp, and return top 10 results
+ * 
+ * Expected compiled OpenSearch DSL query structure (for documentation/validation):
+ */
 /*
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bf2fb8 and d6079b8.

📒 Files selected for processing (3)
  • integ-test/src/test/resources/big5/queries/rex_regex_transformation.ppl
  • integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_aggregation.ppl
  • integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_sort.ppl
🚧 Files skipped from review as they are similar to previous changes (2)
  • integ-test/src/test/resources/big5/queries/rex_regex_transformation.ppl
  • integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_aggregation.ppl
🧰 Additional context used
📓 Path-based instructions (1)
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_sort.ppl
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (1)
integ-test/src/test/resources/big5/queries/script_engine_like_pattern_with_sort.ppl (1)

36-39: LGTM! The PPL query syntax and logic are correct:

  • Filters messages containing 'sshd' using the LIKE predicate
  • Sorts by timestamp in descending order
  • Returns top 10 results

Test data verification confirms the big5 dataset contains a message record with "sshd" in both the message text and process.name field. The data format is valid NDJSON with proper Elasticsearch bulk indexing structure, and includes realistic log metadata (timestamps, agent info, cloud region). The query will execute successfully and return meaningful test results.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Related to improving software testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants