Skip to content

[SPARK-56961][SQL] Pass all options while loading changelog#56044

Open
aokolnychyi wants to merge 1 commit into
apache:masterfrom
aokolnychyi:spark-56961
Open

[SPARK-56961][SQL] Pass all options while loading changelog#56044
aokolnychyi wants to merge 1 commit into
apache:masterfrom
aokolnychyi:spark-56961

Conversation

@aokolnychyi
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This PR passes all specified options while loading changelogs.

Why are the changes needed?

These changes are needed to make the API usable in connectors like Iceberg and Delta.

Does this PR introduce any user-facing change?

The functionality hasn't been released yet.

How was this patch tested?

This PR comes with tests.

Was this patch authored or co-authored using generative AI tooling?

Claude Code v2.1.145.

*
* @since 4.2.0
*/
@Evolving
public class ChangelogInfo {
Copy link
Copy Markdown
Contributor Author

@aokolnychyi aokolnychyi May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rename this given that we will have to pass options to loadTable as well. Right now, we already have TableInfo that we use for CREATE table cases. We will NOT be able to use TableInfo for loading tables. So ChangelogInfo conflicts with TableInfo as it is used for loading, not creation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative name to consider is ChangelogParameters but context seems a better fit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

@aokolnychyi aokolnychyi May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My proposal would be:

loadTable(ident, tableContext (timeTravelSpec, writePrivileges), options)
 - by default delegates to existing loadTable() methods
loadChangelog(ident, changelogContext (range, deduplicationMode, etc), options)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We concluded that passing all options to load methods for tables and changelog is a requirement for external connectors like Delta and Iceberg.

@aokolnychyi
Copy link
Copy Markdown
Contributor Author

cc @huaxingao, this is one of the items we discussed on the dev list for 4.2

* @param ident a table identifier
* @param changelogInfo the CDC query parameters (range, deduplication mode, etc.)
* @param context the CDC query context (range, deduplication mode, etc.)
* @param options all options passed to the changelog query
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting the overlap with context: CDC-recognized keys (range, deduplication mode, etc.) appear in options as well as in the parsed context. Connectors that validate against an allowlist of unknown keys may otherwise be surprised.

Suggested change
* @param options all options passed to the changelog query
* @param options all options passed to the changelog query, including the CDC-recognized
* keys (range, deduplication mode, etc.) that are also parsed into {@code context}

*
* @param relation the table relation (typically an [[UnresolvedRelation]])
* @param changelogInfo the CDC query parameters (range, deduplication mode, etc.)
* @param changelogContext the CDC query parameters (range, deduplication mode, etc.)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: the type is now ChangelogContext, so "context" matches better than "parameters".

Suggested change
* @param changelogContext the CDC query parameters (range, deduplication mode, etc.)
* @param changelogContext the CDC query context (range, deduplication mode, etc.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinion on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants