Skip to content

Comments

fix(exasol)!: fix parsing error in json_extract for exasol#7098

Open
nnamdi16 wants to merge 3 commits intotobymao:mainfrom
exasol:2-exasol-dialect-add-support-for-json_extract
Open

fix(exasol)!: fix parsing error in json_extract for exasol#7098
nnamdi16 wants to merge 3 commits intotobymao:mainfrom
exasol:2-exasol-dialect-add-support-for-json_extract

Conversation

@nnamdi16
Copy link
Contributor

What motivated this PR?

When parsing Exasol SQL that uses the JSON_EXTRACT function, it results in parsing errors.
See #6695


What was incorrect in the existing logic?

The Exasol dialect parser did not fully support the parsing of the JSON_EXTRACT function. As a result, valid Exasol SQL such as:

select json_extract('{"d":"a"}','$.d') emits(x  varchar(100))

fails during parsing.


How does this PR resolve the issue?

This PR fixes the Exasol JSON_EXTRACT parsing error by correctly handling the required EMITS clause and multiple JSON path arguments. It also updates the Exasol generator to render JSON_EXTRACT(... ) EMITS (...) properly by emitting all parsed JSON paths and printing the EMITS column definitions.


Documentation and semantic notes

  • Only affects the JSON_EXTRACT function in the Exasol dialect.
  • No breaking changes to existing functionality.
  • Improves correctness and compatibility when working with Exasol JSON queries in SQLGlot.

* Generate static documentation pages & update the CI

* Change CI so that changes are committed & pushed

* Make it so that only python v3.10 triggers doc generation

* Fix yaml syntax

* PR feedback, fix push action
# Conflicts:
#	.github/workflows/python-package.yml
),
# https://docs.exasol.com/db/latest/sql_references/functions/alphabeticallistfunctions/json_extract.htm
"JSON_EXTRACT": _build_extract_json(exp.JSONExtract),
"NOW": exp.CurrentTimestamp.from_arg_list,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't do it like this, please add an entry in FUNCTION_PARSERS and get rid of the _parse_function override.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then how do we handle EMITS if we are not allowed to override _parse_function , which is required in using JSONExtract in exasol https://docs.exasol.com/db/latest/sql_references/functions/alphabeticallistfunctions/json_extract.htm

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will parse the JSON_EXTRACT using the default parser (i.e., leveraging the superclass' logic) and then consume the tokens needed to handle the EMIT syntax.

exp.CurrentSchema: lambda *_: "CURRENT_SCHEMA",
exp.DateDiff: _date_diff_sql,
exp.DateAdd: _add_date_sql,
exp.JSONExtract: _json_extract_sql,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be a module-level function for this, add a method jsonextract_sql under Generator here.

column.set("table", None)
return column

def _parse_field(self, *args, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nnamdi16 this is still not correct, did you try adding a function parser as per my suggestion, earlier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did for JSON_EXTRACT but I had issue parsing the EMITS that was why I resulted in overriding _parse_field

Copy link
Collaborator

@georgesittas georgesittas Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replied here: #7098 (comment). Is it still unclear? Can you please explain what you tried and didn't work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first added a custom FUNCTION_PARSER for JSON_EXTRACT and tried to match the EMITS clause, but it never matched because the parser was still positioned at the closing)when my method ran. I then tried consuming the closing parenthesis myself using self._match_r_paren(), which let me see EMITS, but this broke SQLGlot’s normal parsing flow since the parenthesis was consumed twice. This resulted in errors like:

sqlglot.errors.ParseError: Expecting ). Line 1, Col: 177.

I also tried calling super()._parse_function_call to reuse the default logic, but that caused parser state issues. @georgesittas

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I see. I think we need a mechanism to make _match_r_paren() optional, at least for when there's a registered FUNCTION_PARSER that handles the consumption of ) to support cases like this one.

One idea is to check whether parser is truthy (i.e., we detected a function parser) and whether the _prev token is ). If both these conditions hold, you could do _match(R_PAREN) instead of _match_r_paren(), to make the match non-failing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would look into it thanks for your suggestion @georgesittas

Comment on lines +1051 to +1052
path = [e.expression, *e.args.get("expressions", [])]
sql = self.func("JSON_EXTRACT", e.this, *path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
path = [e.expression, *e.args.get("expressions", [])]
sql = self.func("JSON_EXTRACT", e.this, *path)
sql = self.func("JSON_EXTRACT", e.this, e.expression, *e.expressions)

expression = super()._parse_field(*args, **kwargs)
if isinstance(expression, exp.JSONExtract) and self._match_texts("EMITS"):
schema = self._parse_schema()
expression.set("columns", schema.expressions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you setting columns? This doesn't look like the right way to represent EMITS in the AST, columns is used for a different purpose today, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants