feat: add processor plugin system #299

Draft
andreatgretel wants to merge 2 commits into andreatgretel/feat/processor-plugins from andreatgretel/feat/processor-plugins-registry

@andreatgretel
Contributor

Summary

Adds support for third-party processor plugins via the existing plugin discovery mechanism. This PR builds on top of #294 (callback-based processors).

Changes

Plugin Infrastructure

  • PluginType.PROCESSOR for external processor plugins
  • ProcessorRegistry discovers and loads processor plugins
  • processor_types.py with plugin-injected type union
  • PluginRegistry uses RLock instead of Lock for nested imports
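
The switch from Lock to RLock can be illustrated with a minimal sketch. `TinyRegistry`, `load_plugin`, and `can_reenter` are hypothetical names invented for this example and are not the project's `PluginRegistry` API; the sketch only assumes that loading a plugin can call back into the registry while the discovery lock is still held, which is what a nested import would do.

```python
import threading


def can_reenter(lock):
    """Return True if the thread holding `lock` can acquire it again."""
    with lock:
        # Non-blocking attempt, so a plain Lock reports False instead of hanging.
        acquired = lock.acquire(blocking=False)
        if acquired:
            lock.release()
        return acquired


class TinyRegistry:
    """Hypothetical stand-in for a plugin registry guarded by one lock."""

    def __init__(self, lock):
        self._lock = lock
        self._plugins = {}

    def discover(self, loaders):
        # The lock is held for the whole discovery pass.
        with self._lock:
            for name, loader in loaders.items():
                self._plugins[name] = loader(self)

    def get(self, name):
        # May be re-entered from inside discover() by a plugin's own import.
        with self._lock:
            return self._plugins.get(name)


def load_plugin(registry):
    # Simulates a plugin module whose import touches the registry again.
    registry.get("some-other-plugin")
    return "loaded"


# With RLock the nested get() succeeds. With threading.Lock() the same
# discover() call would block forever, so it is not run here.
registry = TinyRegistry(threading.RLock())
registry.discover({"regex-filter": load_plugin})
```

A re-entrant lock keeps the registry thread-safe across threads while letting the thread that is already discovering plugins re-acquire it.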

Demo Processors

  • RegexFilterProcessor - filters rows based on regex patterns (preprocess stage)
  • SemanticDedupProcessor - removes duplicate content via embeddings (postprocess stage)
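
As a rough illustration of the row filtering that RegexFilterProcessor performs, here is a sketch in plain Python. The function name `regex_filter`, its parameters, and the list-of-dicts row format are assumptions made for this example; the actual processor's config fields and data types are not shown in this PR description.

```python
import re


def regex_filter(rows, column, pattern, keep_matches=True):
    """Keep rows whose `column` value matches `pattern` (or drop them).

    Hypothetical helper: rows are plain dicts and missing values are
    treated as empty strings.
    """
    compiled = re.compile(pattern)
    return [
        row
        for row in rows
        if bool(compiled.search(str(row.get(column, "")))) == keep_matches
    ]


rows = [{"text": "hello world"}, {"text": "spam spam"}, {"text": "HELLO"}]
kept = regex_filter(rows, "text", r"hello")
dropped = regex_filter(rows, "text", r"hello", keep_matches=False)
```

Running the filter in the preprocess stage means non-matching rows never reach generation.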

Documentation

  • Plugin overview and development guide

Depends On

  • #294 (callback-based processors)

Test Plan

  • Demo processors work when installed as plugins
  • Plugin discovery correctly registers external processors
  • Type hints include plugin processor configs

Replace stage parameter with callback methods (preprocess, process_after_batch,
postprocess). The builder now invokes these callbacks at appropriate stages:
PRE_GENERATION, POST_BATCH, and POST_GENERATION.
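
The callback flow described above can be sketched as follows. The method signatures, the `build` function, and the list-of-dicts row format are assumptions for illustration; only the callback names and the three stage names come from this commit message.

```python
class Processor:
    """Base class: each callback is a no-op unless a subclass overrides it."""

    def preprocess(self, rows):
        return rows

    def process_after_batch(self, rows):
        return rows

    def postprocess(self, rows):
        return rows


class DropColumns(Processor):
    """Drops the named columns after every batch (simplified stand-in)."""

    def __init__(self, columns):
        self.columns = set(columns)

    def process_after_batch(self, rows):
        return [
            {k: v for k, v in row.items() if k not in self.columns}
            for row in rows
        ]


def build(seed_rows, generate_batch, processors, batch_size=2):
    """Hypothetical builder loop that invokes the callbacks at each stage."""
    rows = seed_rows
    for processor in processors:  # PRE_GENERATION
        rows = processor.preprocess(rows)

    output = []
    for start in range(0, len(rows), batch_size):
        batch = generate_batch(rows[start:start + batch_size])
        for processor in processors:  # POST_BATCH
            batch = processor.process_after_batch(batch)
        output.extend(batch)

    for processor in processors:  # POST_GENERATION
        output = processor.postprocess(output)
    return output


def fake_generate(batch):
    # Stand-in for generation: adds a text column and a scratch column.
    return [{**row, "text": f"row {row['id']}", "scratch": "x"} for row in batch]


result = build([{"id": i} for i in range(3)], fake_generate, [DropColumns(["scratch"])])
```

Because every callback defaults to a no-op, a processor overrides only the stage it cares about, and the builder no longer needs a per-processor stage setting.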

- Remove build_stage from ProcessorConfig
- Add callback methods to Processor base class
- Update DropColumns and SchemaTransform to use process_after_batch
- Simplify ColumnWiseBuilder processor invocation

Adds support for third-party processor plugins via plugin discovery:

- PluginType.PROCESSOR for external processor plugins
- ProcessorRegistry discovers and loads processor plugins
- processor_types.py with plugin-injected type union
- PluginRegistry uses RLock for nested imports

Demo processors:
- RegexFilterProcessor (preprocess stage)
- SemanticDedupProcessor (postprocess stage)
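
For the embedding-based deduplication, one possible approach is a greedy pass that keeps a row only if it is not too similar to any row already kept. This is a sketch under stated assumptions: the embeddings are precomputed lists of floats, and `cosine`, `dedup_indices`, and the `threshold` value are hypothetical. The PR description does not say which embedding model or similarity rule SemanticDedupProcessor uses.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_indices(embeddings, threshold=0.9):
    """Return indices of rows to keep, comparing each row to those already kept."""
    kept = []
    for i, embedding in enumerate(embeddings):
        if all(cosine(embedding, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


embeddings = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
keep = dedup_indices(embeddings)
```

The greedy pass is quadratic in the number of kept rows, which is acceptable for a demo but would likely need an approximate nearest-neighbour index at scale.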
@andreatgretel force-pushed the andreatgretel/feat/processor-plugins branch 3 times, most recently from a1323f9 to 46880e7 (February 5, 2026 22:36)