Flink: Allow custom writer factory in DynamicIcebergSink #16339
Open
sqd wants to merge 1 commit into
Introduce DynamicTaskWriterFactoryProvider so callers can supply a
custom TaskWriterFactory<RowData> in place of the default
RowDataTaskWriterFactory, while reusing the surrounding table,
schema, partition spec, and write-property resolution already done
in DynamicWriter.
The primary motivation is throughput. Our pipelines have a data
pattern, closely tied to our business logic, that a hand-rolled
TaskWriter can exploit to produce files much faster than the
writers created by the generic RowDataTaskWriterFactory.
Making the factory pluggable also enables other use cases without
forking the sink:
- Row-level or file-level audit and metrics: sampling, lineage
stamps, metric counters layered around the writer.
- Custom file naming and layout: custom prefixes, alternative
partition paths, custom filesystem properties such as storage
class and permissions.
The default provider preserves existing behavior, so callers that
do not supply one are unaffected.
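To make the shape of the extension point concrete, here is a minimal, self-contained sketch of the pattern described above: a provider that receives context the sink has already resolved, a default that keeps the stock writer, and a custom provider that layers a metric counter around it. Every type in it (`SketchWriter`, `SketchWriterFactory`, `SketchWriterFactoryProvider`, `SketchSink`, and the `factoryFor` signature) is a hypothetical stand-in chosen for illustration. None of them is the Iceberg API or the actual `DynamicTaskWriterFactoryProvider` signature from this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for a task writer. NOT the Iceberg TaskWriter interface.
interface SketchWriter<T> {
  void write(T row);
}

// Stand-in for a writer factory. NOT the Iceberg TaskWriterFactory interface.
interface SketchWriterFactory<T> {
  SketchWriter<T> create();
}

// Hypothetical provider: gets context the sink already resolved and returns a factory.
interface SketchWriterFactoryProvider<T> {
  SketchWriterFactory<T> factoryFor(String tableName, Map<String, String> writeProperties);
}

// Plays the role of the stock writer: it simply buffers rows in memory.
final class BufferingWriter<T> implements SketchWriter<T> {
  final List<T> rows = new ArrayList<>();

  @Override
  public void write(T row) {
    rows.add(row);
  }
}

// Decorator that counts rows before delegating, as in the audit and metrics use case.
final class CountingWriter<T> implements SketchWriter<T> {
  private final SketchWriter<T> delegate;
  private final AtomicLong counter;

  CountingWriter(SketchWriter<T> delegate, AtomicLong counter) {
    this.delegate = delegate;
    this.counter = counter;
  }

  @Override
  public void write(T row) {
    counter.incrementAndGet();
    delegate.write(row);
  }
}

final class SketchProviders {
  private SketchProviders() {}

  // Default provider: always hands back the stock writer.
  static <T> SketchWriterFactoryProvider<T> defaultProvider() {
    return (tableName, writeProperties) -> () -> new BufferingWriter<T>();
  }

  // Custom provider: wraps whatever the inner provider creates with a row counter.
  static <T> SketchWriterFactoryProvider<T> counting(
      SketchWriterFactoryProvider<T> inner, AtomicLong counter) {
    return (tableName, writeProperties) -> {
      SketchWriterFactory<T> factory = inner.factoryFor(tableName, writeProperties);
      return () -> new CountingWriter<T>(factory.create(), counter);
    };
  }
}

// Stand-in for the sink: falls back to the default provider when none is supplied.
final class SketchSink<T> {
  private final SketchWriterFactoryProvider<T> provider;

  SketchSink(SketchWriterFactoryProvider<T> provider) {
    this.provider = provider != null ? provider : SketchProviders.<T>defaultProvider();
  }

  SketchWriter<T> openWriter(String tableName, Map<String, String> writeProperties) {
    return provider.factoryFor(tableName, writeProperties).create();
  }
}

final class SketchDemo {
  public static void main(String[] args) {
    AtomicLong rowsSeen = new AtomicLong();
    SketchSink<String> sink =
        new SketchSink<>(SketchProviders.counting(SketchProviders.<String>defaultProvider(), rowsSeen));
    SketchWriter<String> writer = sink.openWriter("db.events", Collections.<String, String>emptyMap());
    writer.write("row-1");
    writer.write("row-2");
    System.out.println("rows seen: " + rowsSeen.get());
  }
}
```

In this sketch the custom provider composes with the default one rather than replacing it, so the counter is added without touching the stock write path. A caller that passes no provider gets the default writer unchanged.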