[SPARK-56961][SQL] Pass all options while loading changelog#56044
[SPARK-56961][SQL] Pass all options while loading changelog#56044aokolnychyi wants to merge 1 commit into
Conversation
| * | ||
| * @since 4.2.0 | ||
| */ | ||
| @Evolving | ||
| public class ChangelogInfo { |
There was a problem hiding this comment.
I think we should rename this given that we will have to pass options to loadTable as well. Right now, we already have TableInfo that we use for CREATE table cases. We will NOT be able to use TableInfo for loading tables. So ChangelogInfo conflicts with TableInfo as it is used for loading, not creation.
There was a problem hiding this comment.
Another alternative name to consider is ChangelogParameters but context seems a better fit.
There was a problem hiding this comment.
Thoughts, @gengliangwang @cloud-fan @johanl-db?
There was a problem hiding this comment.
My proposal would be:
loadTable(ident, tableContext (timeTravelSpec, writePrivileges), options)
- by default delegates to existing loadTable() methods
loadChangelog(ident, changelogContext (range, deduplicationMode, etc), options)
There was a problem hiding this comment.
We concluded that passing all options to load methods for tables and changelog is a requirement for external connectors like Delta and Iceberg.
|
cc @huaxingao, this is one of the items we discussed on the dev list for 4.2 |
| * @param ident a table identifier | ||
| * @param changelogInfo the CDC query parameters (range, deduplication mode, etc.) | ||
| * @param context the CDC query context (range, deduplication mode, etc.) | ||
| * @param options all options passed to the changelog query |
There was a problem hiding this comment.
Worth noting the overlap with context: CDC-recognized keys (range, deduplication mode, etc.) appear in options as well as in the parsed context. Connectors that validate against an allowlist of unknown keys may otherwise be surprised.
| * @param options all options passed to the changelog query | |
| * @param options all options passed to the changelog query, including the CDC-recognized | |
| * keys (range, deduplication mode, etc.) that are also parsed into {@code context} |
| * | ||
| * @param relation the table relation (typically an [[UnresolvedRelation]]) | ||
| * @param changelogInfo the CDC query parameters (range, deduplication mode, etc.) | ||
| * @param changelogContext the CDC query parameters (range, deduplication mode, etc.) |
There was a problem hiding this comment.
Minor: the type is now ChangelogContext, so "context" matches better than "parameters".
| * @param changelogContext the CDC query parameters (range, deduplication mode, etc.) | |
| * @param changelogContext the CDC query context (range, deduplication mode, etc.) |
There was a problem hiding this comment.
I don't have strong opinion on this one.
What changes were proposed in this pull request?
This PR passes all specified options while loading changelogs.
Why are the changes needed?
These changes are needed to make the API usable in connectors like Iceberg and Delta.
Does this PR introduce any user-facing change?
The functionality hasn't been released yet.
How was this patch tested?
This PR comes with tests.
Was this patch authored or co-authored using generative AI tooling?
Claude Code v2.1.145.