feat(Clickhouse)!: Add support for multiple-suffix combined aggregate functions in Clickhouse dialect #7109
Conversation
```python
].assert_is(exp.ParameterizedAgg)


def test_agg_functions_multiple_suffixes(self):
```
I recommend starting from here as you review -- this outlines the new behaviors
```python
# Regression test: single-suffix
self.validate_identity("SELECT uniqExactIf(x, y) FROM t").selects[0].assert_is(
    exp.CombinedAggFunc
)
```
I decided against adding any 0-suffix tests because those are already covered by, e.g., the approx_top_sum tests above.
```python
# Until we are able to identify a 1- or 0-suffix aggregate function by name,
# repeatedly strip and stack suffixes (longer suffixes first, see comment on
# AGG_FUNCTIONS_SUFFIXES_SORTED). This loop only runs for 2 or more suffixes,
# as AGG_FUNC_MAPPING memoizes all 0- and 1-suffix names.
accumulated_suffix_stack: t.List[str] = []
while (parts := cls.AGG_FUNC_MAPPING.get(name)) is None:
```
Note that this "repeated stripping" only kicks in on the previously-unhandled cases. All 0- or 1-suffix aggregators (which were already supported via the memoized AGG_FUNC_MAPPING) continue to be handled the same way. There's a tradeoff judgment here between "how big to make AGG_FUNC_MAPPING" and "how fast to make _resolve_clickhouse_agg" -- exponential space usage scares me, so I decided not to further extend the memoization introduced in #2734 (comment)
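For a sense of the tradeoff, the strip-and-stack loop can be sketched in isolation. The suffix list, base-function set, and `resolve` helper below are illustrative stand-ins, not sqlglot's actual tables or internals:

```python
# Standalone sketch of the "repeatedly strip and stack suffixes" idea.
# AGG_FUNCTIONS and SUFFIXES_SORTED are tiny illustrative stand-ins for
# sqlglot's real tables, not the actual ones.
from typing import List, Optional, Tuple

AGG_FUNCTIONS = {"sum", "min", "uniqExact", "count"}
# Longer suffixes first, so a longer match like "OrNull" wins over a shorter one.
SUFFIXES_SORTED = sorted(["If", "Merge", "State", "OrNull"], key=len, reverse=True)

def resolve(name: str) -> Optional[Tuple[str, List[str]]]:
    stack: List[str] = []
    while name not in AGG_FUNCTIONS:
        for suffix in SUFFIXES_SORTED:
            if name.endswith(suffix):
                stack.append(suffix)  # the outermost suffix is stripped first
                name = name[: -len(suffix)]
                break
        else:
            return None  # no known suffix left: not a combined aggregate
    return name, stack[::-1]  # restore source order, e.g. If before Merge

print(resolve("sumIfMerge"))  # -> ('sum', ['If', 'Merge'])
print(resolve("foobar"))      # -> None
```

Space stays constant (no new mapping entries), at the cost of a few `endswith` calls per stacked suffix.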
I see the point. The current approach looks very costly (we should optimize). Can you benchmark for combinators that need a lot of stripping?
Is benchmarking merited for this? The suffix stacks we're talking about will almost never be more than 2 deep -- you can contrive a 3- or 4-suffix aggregator (as I did in a unit test), but this is O(n*k), where n is "probably 2, probably not >4" and k is 15.
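For what it's worth, a throwaway micro-benchmark along these lines would settle it; the resolver here is a simplified stand-in for the real `_resolve_clickhouse_agg`, with made-up suffix and base tables:

```python
# Throwaway micro-benchmark sketch; SUFFIXES and BASES are illustrative
# stand-ins, not sqlglot's real tables.
import timeit

SUFFIXES = sorted(["If", "Merge", "State", "OrNull"], key=len, reverse=True)
BASES = {"min", "sum"}

def strip_all(name):
    while name not in BASES:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
        else:
            return None
    return name

# A contrived 4-suffix name: the worst case the thread considers realistic.
elapsed = timeit.timeit(lambda: strip_all("minIfStateOrNullMerge"), number=10_000)
print(f"10k resolutions of a 4-suffix name: {elapsed:.4f}s")
```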
```diff
         "expressions": anon_func.expressions,
     }
-    if parts[1]:
+    if len(parts[1]) > 0:
```
Honestly, it's kind of a shame that these suffixes aren't injected into the AST anywhere, but again, that seemed out of scope for this PR. Ref: #4814
```python
last_seen_suffix = None
for suffix in reversed(accumulated_suffix_stack):
    if suffix == last_seen_suffix:
        return None  # reject: there are adjacent duplicate combinators
    suffixes_list.append(suffix)
    last_seen_suffix = suffix
```
If we assume that the input is always correct (so we won't have something like `sumMergeMerge`), we just need to reverse here and make a list, right? SQLGlot always assumes that the input is correct.
Correct. We could even use a deque and avoid the reverse, but I didn't want to overengineer too aggressively for an initial PR. I'm not familiar with SQLGlot's "assume the input is always correct" philosophy, but I modelled this behavior after the `parse_one("foobar(x)").assert_is(exp.Anonymous)` assertion. As verified by one of the new unit tests, this doesn't fail parsing; it just causes the expression to parse as an `exp.Anonymous`. Would it be preferable to parse this as some variant of aggregate function? If so, should the `foobar(x)` case also be handled similarly?
Notwithstanding the above (I just want to ensure all the information I based this implementation on is accounted for in a final decision), I'm happy to follow whatever semantic guidance existing contributors find appropriate.
Add support for additional aggregate functions (motivated by `countIfMerge` -- happy to open a bug issue if necessary). Note: while implementing this I discovered that not all "standard" clickhouse aggregators are named in this parser. I'm not sure how it'd be possible to do so given that the list changes as clickhouse evolves, but ¯\_(ツ)_/¯ I decided to omit handling those from the scope of this PR (as far as I'm aware, the only way to know for sure which aggregators are available is by querying system tables at runtime). I've sort of self-justified this as "it looks like all the canonical names are supported, and few if any of the aliases".
There is still one deviation from clickhouse `24.12.1.1332` (the version I happen to have installed): when a function name collides with a duplicated combinator (eg, `minMergeMerge`), the dialect in this PR accepts it (as `minMerge` + `[Merge]`), but the real clickhouse rejects it. However, I think this is actually a clickhouse bug, and it's not blocking my usecase, so I've left it as-is.
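The deviation falls out of the memoization boundary: 1-suffix names like `minMerge` are already in the mapping, so stripping a single `Merge` finds a hit before any duplicate is visible. A toy reproduction (the tables and helper here are made up for illustration, not sqlglot's code):

```python
# Toy reproduction of the minMergeMerge acceptance described above.
# KNOWN stands in for the memoized AGG_FUNC_MAPPING; all names illustrative.
KNOWN = {"min": ("min", []), "minMerge": ("min", ["Merge"])}
SUFFIXES = ["OrNull", "Merge", "State", "If"]  # longest first

def resolve(name):
    stacked = []
    while name not in KNOWN:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                stacked.append(suffix)
                name = name[: -len(suffix)]
                break
        else:
            return None
    base, memoized = KNOWN[name]
    return base, memoized + stacked[::-1]

# One "Merge" is stripped, then "minMerge" hits the memoized mapping, so the
# duplicated combinator is accepted (minMerge + [Merge]) rather than rejected.
print(resolve("minMergeMerge"))  # -> ('min', ['Merge', 'Merge'])
```

Because only one suffix ever lands on the stripping stack here, the adjacent-duplicate check never sees the two `Merge`s next to each other.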