feat: add processor plugin system #299
Draft
andreatgretel wants to merge 2 commits into andreatgretel/feat/processor-plugins from
Replace the stage parameter with callback methods (preprocess, process_after_batch, postprocess). The builder now invokes these callbacks at the appropriate stages: PRE_GENERATION, POST_BATCH, and POST_GENERATION.

- Remove build_stage from ProcessorConfig
- Add callback methods to Processor base class
- Update DropColumns and SchemaTransform to use process_after_batch
- Simplify ColumnWiseBuilder processor invocation
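A minimal sketch of the callback pattern described in this commit, assuming each hook takes and returns a batch represented as a list of row dicts (the real code may use a different data type). `run_generation`, `generate_batch`, and `DropColumnsSketch` are hypothetical names used only to show where each callback fires; they are not the PR's actual API.

```python
from typing import Any, Callable

Row = dict[str, Any]  # assumption: data flows between hooks as a list of row dicts


class Processor:
    """Base class: every callback defaults to a pass-through."""

    def preprocess(self, data: list[Row]) -> list[Row]:
        return data

    def process_after_batch(self, batch: list[Row]) -> list[Row]:
        return batch

    def postprocess(self, data: list[Row]) -> list[Row]:
        return data


class DropColumnsSketch(Processor):
    """Hypothetical stand-in for DropColumns: strips columns after every batch."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = set(columns)

    def process_after_batch(self, batch: list[Row]) -> list[Row]:
        return [{k: v for k, v in row.items() if k not in self.columns} for row in batch]


def run_generation(
    seed_rows: list[Row],
    generate_batch: Callable[[list[Row], int], list[Row]],
    num_batches: int,
    processors: list[Processor],
) -> list[Row]:
    """Hypothetical builder loop showing the three invocation points."""
    for p in processors:  # PRE_GENERATION
        seed_rows = p.preprocess(seed_rows)
    dataset: list[Row] = []
    for i in range(num_batches):
        batch = generate_batch(seed_rows, i)
        for p in processors:  # POST_BATCH
            batch = p.process_after_batch(batch)
        dataset.extend(batch)
    for p in processors:  # POST_GENERATION
        dataset = p.postprocess(dataset)
    return dataset
```

Because the base class supplies no-op defaults, a processor only overrides the hooks it cares about, which is what lets the builder drop the per-config stage parameter.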
Adds support for third-party processor plugins via plugin discovery:

- PluginType.PROCESSOR for external processor plugins
- ProcessorRegistry discovers and loads processor plugins
- processor_types.py with plugin-injected type union
- PluginRegistry uses RLock for nested imports

Demo processors:

- RegexFilterProcessor (preprocess stage)
- SemanticDedupProcessor (postprocess stage)
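A hedged sketch of why a re-entrant lock matters for nested imports. It assumes, hypothetically, that resolving a plugin runs a loader which, while the registry lock is held, imports code that calls back into `register()` on the same registry from the same thread. `ProcessorRegistrySketch`, `add_loader`, and `get` are illustrative names, not the PR's actual classes.

```python
import threading
from typing import Callable


class ProcessorRegistrySketch:
    """Hypothetical registry mapping plugin names to processor classes."""

    def __init__(self) -> None:
        # RLock: the thread that holds the lock may acquire it again.
        self._lock = threading.RLock()
        self._processors: dict[str, type] = {}
        self._loaders: dict[str, Callable[[], None]] = {}

    def add_loader(self, name: str, loader: Callable[[], None]) -> None:
        """Record a deferred loader (e.g. a plugin module import) for `name`."""
        with self._lock:
            self._loaders[name] = loader

    def register(self, name: str, cls: type) -> None:
        with self._lock:
            self._processors[name] = cls

    def get(self, name: str) -> type:
        with self._lock:
            if name not in self._processors and name in self._loaders:
                # The loader calls register() while we still hold the lock.
                # A plain threading.Lock would deadlock here; RLock re-enters.
                self._loaders.pop(name)()
            return self._processors[name]
```

Usage: `reg.add_loader("regex_filter", lambda: reg.register("regex_filter", SomeClass))` followed by `reg.get("regex_filter")` returns `SomeClass` without blocking, whereas swapping in `threading.Lock` would hang on the nested `register()` call.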
a1323f9 to 46880e7
Summary
Adds support for third-party processor plugins via the existing plugin discovery mechanism. This PR builds on top of #294 (callback-based processors).
Changes
Plugin Infrastructure
- PluginType.PROCESSOR for external processor plugins
- ProcessorRegistry discovers and loads processor plugins
- processor_types.py with plugin-injected type union
- PluginRegistry uses RLock instead of Lock for nested imports
Demo Processors
- RegexFilterProcessor: filters rows based on regex patterns (preprocess stage)
- SemanticDedupProcessor: removes duplicate content via embeddings (postprocess stage)
Documentation
Depends On
Test Plan