[CALCITE-5406] Support the SELECT DISTINCT ON statement for PostgreSQL dialect#4933

Open
xuzifu666 wants to merge 10 commits into
apache:main from
xuzifu666:distinct_on_support

Conversation

@xuzifu666
Member

@xuzifu666 xuzifu666 commented May 11, 2026

xuzifu666 force-pushed the distinct_on_support branch from a3d151d to 99a667c on May 11, 2026 13:02
xuzifu666 force-pushed the distinct_on_support branch from 99a667c to a5629e3 on May 12, 2026 02:39
xuzifu666 changed the title from "[CALCITE-7517] Support DISTINCT ON clause in SELECT statements" to "[CALCITE-5406] Support the SELECT DISTINCT ON statement for PostgreSQL dialect" on May 12, 2026
@cjj2010
Contributor

cjj2010 commented May 12, 2026

"DISTINCT ON" is syntax unique to PostgreSQL. Would it be more appropriate to implement parsing support for it in the Babel parser?

@xuzifu666
Member Author

xuzifu666 commented May 12, 2026

> "DISTINCT ON" is syntax unique to PostgreSQL. Would it be more appropriate to implement parsing support for it in the Babel parser?

  1. Babel's only extension points are:

- Configuring additional keywords, join types, and binary operators in config.fmpp;
- Adding custom parser methods (such as PostgreSQL's BEGIN/COMMIT) to includes/*.ftl.

Babel itself does not have a separate grammar file; all SELECT-related syntax must be defined in core's Parser.jj.

  2. Consistent with the QUALIFY implementation

QUALIFY is a non-standard clause (supported by Snowflake, for example) that Calcite implements in core:

- The <QUALIFY> token is defined in core's Parser.jj.
- The SqlSelect class in core has a qualify field.
- The Validator/SqlToRelConverter logic is in core.

DISTINCT ON uses exactly the same pattern.

  3. Conformance controls semantics, not syntax

The current parser unconditionally parses DISTINCT ON, but semantic validation is controlled by SqlConformance. For example, the tests use LENIENT conformance:

`fixture().withConformance(SqlConformanceEnum.LENIENT)`

This is consistent with the handling of other extensions such as QUALIFY, LATERAL, and TABLESAMPLE: the parser is as lenient as possible, and the validator decides whether to allow the construct based on the dialect.

So in my view, the current implementation conforms to Calcite's architectural conventions and does not need to be moved to Babel. If adjustments are needed, a finer-grained switch can be added to SqlConformance to control the availability of DISTINCT ON.

By the way, besides PostgreSQL, ClickHouse also supports DISTINCT ON.
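
The "lenient parser, conformance-gated validator" split argued for in this comment can be illustrated with a minimal, self-contained Java sketch. Everything in it is hypothetical and for illustration only: `Conformance`, `allowsDistinctOn`, `ParsedSelect`, and the crude string-based `parse` are stand-ins, not Calcite's `SqlConformance`, `SqlSelect`, or parser, and the PR may wire the check differently.

```java
import java.util.Locale;

/** Illustration only: hypothetical types, not Calcite's actual API. */
final class ConformanceGateSketch {

  /** Hypothetical stand-in for a conformance setting. */
  interface Conformance {
    boolean allowsDistinctOn(); // hypothetical switch, named for this sketch
  }

  /** Hypothetical parse result: records that DISTINCT ON was seen, without judging it. */
  static final class ParsedSelect {
    final boolean hasDistinctOn;

    ParsedSelect(boolean hasDistinctOn) {
      this.hasDistinctOn = hasDistinctOn;
    }
  }

  /** "Parser": always accepts the syntax; a crude prefix check replaces a real grammar. */
  static ParsedSelect parse(String sql) {
    String normalized = sql.trim().toUpperCase(Locale.ROOT).replaceAll("\\s+", " ");
    boolean distinctOn = normalized.startsWith("SELECT DISTINCT ON ")
        || normalized.startsWith("SELECT DISTINCT ON(");
    return new ParsedSelect(distinctOn);
  }

  /** "Validator": rejects the construct unless the conformance allows it. */
  static void validate(ParsedSelect select, Conformance conformance) {
    if (select.hasDistinctOn && !conformance.allowsDistinctOn()) {
      throw new IllegalStateException("DISTINCT ON is not allowed under this conformance");
    }
  }

  public static void main(String[] args) {
    Conformance lenient = () -> true;
    Conformance strict = () -> false;
    ParsedSelect select = parse("SELECT DISTINCT ON (deptno) empno FROM emp");

    validate(select, lenient); // accepted
    try {
      validate(select, strict);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The point of the design is that the same parsed tree is produced in both cases; only the validation step consults the conformance setting.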

@xiedeyantu
Member

I attempted to verify this across several databases (though my testing may not have been exhaustive) and indeed found that only PostgreSQL supports this syntax; furthermore, DISTINCT ON does not appear to be part of the standard SQL specification. I would suggest implementing this within Babel (specifically, by adding a configuration parameter to control it); we can also wait to see if anyone else has any better suggestions.

@xuzifu666
Member Author

> I attempted to verify this across several databases (though my testing may not have been exhaustive) and indeed found that only PostgreSQL supports this syntax; furthermore, DISTINCT ON does not appear to be part of the standard SQL specification. I would suggest implementing this within Babel (specifically, by adding a configuration parameter to control it); we can also wait to see if anyone else has any better suggestions.

Good suggestion. I have added a config option to control it in Babel.

Comment thread core/src/main/codegen/config.fmpp Outdated
"parserImpls.ftl"
]

includeDistinctOn: true
Member

It seems like there’s a better place for this parameter.

Contributor

Would it be better to add a test case similar to "SELECT DISTINCT ON (deptno)" here? I tried executing "SELECT DISTINCT ON (deptno)" and it did not produce an error.

Member Author

OK, I have added related test cases in SqlParserTest.java.

Contributor

@cjj2010 cjj2010 May 12, 2026

Perhaps it was a problem with my input method. I meant to suggest adding "SELECT DISTINCTON (deptno) empno" and "SELECT DISTINCT ON(deptno) empno". Cases like these, where the text is written without separating spaces, should normally result in an error. However, when I ran both, neither outcome met my expectations.
https://www.postgresql.org/docs/current/queries-select-lists.html#QUERIES-DISTINCT
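
One possible explanation for the two cases above can be seen with a toy tokenizer. This is a minimal sketch, not Calcite's generated lexer: it assumes a conventional lexer in which whitespace only separates tokens and is not required before a parenthesis. Under that assumption, "ON(deptno)" and "ON (deptno)" produce the same token stream, while "DISTINCTON" becomes a single identifier, so "DISTINCTON (deptno) empno" has the shape of a function call followed by an alias. Whether Calcite then rejects it at parse time or only at validation time is not established here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: a toy tokenizer, not Calcite's lexer. */
final class TokenizeSketch {

  // A token is either an identifier/keyword or a single punctuation character.
  // Whitespace is skipped and only serves to separate tokens.
  private static final Pattern TOKEN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[(),]");

  static List<String> tokenize(String sql) {
    List<String> tokens = new ArrayList<>();
    Matcher m = TOKEN.matcher(sql);
    while (m.find()) {
      tokens.add(m.group());
    }
    return tokens;
  }

  public static void main(String[] args) {
    // prints [SELECT, DISTINCT, ON, (, deptno, ), empno]
    System.out.println(tokenize("SELECT DISTINCT ON (deptno) empno"));
    // prints [SELECT, DISTINCT, ON, (, deptno, ), empno]
    System.out.println(tokenize("SELECT DISTINCT ON(deptno) empno"));
    // prints [SELECT, DISTINCTON, (, deptno, ), empno]
    System.out.println(tokenize("SELECT DISTINCTON (deptno) empno"));
  }
}
```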

Member Author

As far as I know, DISTINCT ON needs to be used together with an ORDER BY clause.

Contributor

> As far as I know, DISTINCT ON needs to be used together with an ORDER BY clause.

Perhaps I should elaborate: I am suggesting test cases similar to "SELECT DISTINCTON (deptno) empno, ename FROM emp ORDER BY deptno, empno".
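
For context on the ORDER BY point in this thread: per the PostgreSQL documentation, DISTINCT ON keeps the first row of each group of rows that agree on the ON expressions, and which row counts as "first" is only predictable when ORDER BY fixes the order. The sketch below emulates "SELECT DISTINCT ON (deptno) empno, ename FROM emp ORDER BY deptno, empno" in plain Java. The `Emp` class and the sample rows are made up for illustration, and this is not how Calcite plans or executes the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: emulates DISTINCT ON (deptno) ... ORDER BY deptno, empno. */
final class DistinctOnSketch {

  /** Hypothetical row type for the emp table mentioned in the thread. */
  static final class Emp {
    final int deptno;
    final int empno;
    final String ename;

    Emp(int deptno, int empno, String ename) {
      this.deptno = deptno;
      this.empno = empno;
      this.ename = ename;
    }
  }

  /** Sorts by (deptno, empno), then keeps the first row seen for each deptno. */
  static List<Emp> distinctOnDeptno(List<Emp> rows) {
    List<Emp> sorted = new ArrayList<>(rows);
    sorted.sort(Comparator.comparingInt((Emp e) -> e.deptno)
        .thenComparingInt(e -> e.empno));

    // LinkedHashMap preserves the sorted encounter order of the groups.
    Map<Integer, Emp> firstPerDept = new LinkedHashMap<>();
    for (Emp e : sorted) {
      firstPerDept.putIfAbsent(e.deptno, e);
    }
    return new ArrayList<>(firstPerDept.values());
  }

  public static void main(String[] args) {
    // Made-up sample rows.
    List<Emp> emp = Arrays.asList(
        new Emp(20, 7902, "FORD"),
        new Emp(10, 7934, "MILLER"),
        new Emp(20, 7369, "SMITH"),
        new Emp(10, 7782, "CLARK"));

    // prints:
    // 10 7782 CLARK
    // 20 7369 SMITH
    for (Emp e : distinctOnDeptno(emp)) {
      System.out.println(e.deptno + " " + e.empno + " " + e.ename);
    }
  }
}
```

Without the sort step, the row kept for each deptno would depend on input order, which is why DISTINCT ON is normally paired with ORDER BY.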
