feat: add allow_resize for 1:N and N:1 generation patterns #286
Draft
andreatgretel wants to merge 3 commits into main
3c9fa49 to 8ba264c
bd474e3 to 0cedd56
Adds support for generators that produce a different number of records than the input (expansion or retraction). This addresses GitHub issue #265.

Changes:
- Add `allow_resize` parameter to `update_records()` in DatasetBatchManager
- Add `allow_resize` field to CustomColumnConfig
- Add validation requiring FULL_COLUMN strategy when allow_resize=True
- Track and report actual_num_records in metadata (may differ from target)
- Add logging when batch size changes
- Add example_allow_resize.py demonstrating the feature
- Add comprehensive tests
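To make the `allow_resize` parameter on `update_records()` concrete, here is a minimal sketch of how a size check with an opt-out might behave. `SketchBatchManager` is a hypothetical stand-in written for illustration; it is not the real `DatasetBatchManager`, whose buffer handling is not shown in this conversation.

```python
from dataclasses import dataclass, field


@dataclass
class SketchBatchManager:
    """Hypothetical stand-in for a batch manager; not the project's real class."""

    records: list[dict] = field(default_factory=list)

    def update_records(self, new_records: list[dict], allow_resize: bool = False) -> None:
        # Assumption: without allow_resize, the batch must keep its size.
        if not allow_resize and len(new_records) != len(self.records):
            raise ValueError(
                f"expected {len(self.records)} records, got {len(new_records)}; "
                "pass allow_resize=True to permit a size change"
            )
        # Copy so later mutation of the caller's list does not affect the buffer.
        self.records = list(new_records)

    @property
    def actual_num_records(self) -> int:
        # The count actually held, which may differ from the original target.
        return len(self.records)
```

With this sketch, a generator that drops or adds rows only succeeds when the caller opts in, and the reported record count always reflects what is in the buffer.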
0cedd56 to 8c9a33e
- Fix example_allow_resize.py: remove non-existent CustomColumnContext, simplify to 1-arg signatures
- Add allow_resize logging in CustomColumnGenerator.log_pre_generation
johnnygreco reviewed Feb 5, 2026
        default=None,
        description="Optional typed configuration object passed as second argument to generator function",
    )
    allow_resize: bool = Field(
Contributor
I'm wondering if we should elevate this to a property on the base column config (default is False), which you can override in custom columns and plugins.
Contributor
Author
Yeah that was the initial solution, then I ended up doing a mixin instead.
I thought it was a bit opaque for plugins specifically, since the developer has to find out about a specific attribute/property 🤔 But it makes things simpler, I suppose?
Contributor
It's a pattern we already use for custom emojis. Also the required_columns and side_effect_columns (these ones have to be set, though).
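The suggestion in this thread, a base-config property that defaults to `False` and that custom columns or plugins override, could look roughly like the sketch below. Both class names are hypothetical and plain Python is used rather than the project's actual config base class, so treat it as an illustration of the pattern only.

```python
class BaseColumnConfigSketch:
    """Hypothetical base config; stands in for the project's real base class."""

    @property
    def allow_resize(self) -> bool:
        # Default: a column keeps the number of records unchanged.
        return False


class ExpandingPluginConfigSketch(BaseColumnConfigSketch):
    """Hypothetical plugin config that opts in to resizing."""

    @property
    def allow_resize(self) -> bool:
        # Override: this column may emit more or fewer records than it receives.
        return True


def may_resize(config: BaseColumnConfigSketch) -> bool:
    # Engine-side code can query any config uniformly, without isinstance checks.
    return config.allow_resize
```

The trade-off raised above still applies: a plugin author has to know the property exists in order to override it, but the engine gets a single place to ask the question.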
Summary

Adds `allow_resize` parameter to `CustomColumnConfig`, enabling custom column generators to produce a different number of records than the input. This supports 1:N expansion (e.g., generating multiple variations per input) and N:1 retraction (e.g., filtering or aggregating records) patterns. Addresses #265.

Changes

Added
- `allow_resize` field on `CustomColumnConfig` with validation requiring `full_column` strategy
- `allow_resize` parameter to `update_records()` in `DatasetBatchManager`
- `actual_num_records` tracking in dataset metadata (may differ from `target_num_records` when resizing)
- `example_allow_resize.py` demonstrating expansion (1:N) and retraction (N:1) patterns

Changed
- `column_wise_builder.py` - logs resize operations, passes `allow_resize` to batch manager
- `CustomColumnGenerator.log_pre_generation()` - logs `allow_resize` when enabled

Attention Areas
- `dataset_batch_manager.py` - Core change to buffer handling with new `allow_resize` parameter
- `column_configs.py` - New field and validation logic on `CustomColumnConfig`
- `column_wise_builder.py` - Engine-level handling for resize operations

Description updated with AI
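The 1:N expansion and N:1 retraction patterns named in the summary can be illustrated with two small one-argument generator functions. This is a sketch under assumptions: the function names, the `variant` and `text` keys, and the use of a list of dicts as the batch are all hypothetical, and the real `example_allow_resize.py` may use a different batch type and signature.

```python
def expand_variations(rows: list[dict]) -> list[dict]:
    """1:N expansion: emit one output row per (input row, variant) pair."""
    out = []
    for row in rows:
        for variant in ("formal", "casual"):  # hypothetical variation labels
            out.append({**row, "variant": variant})
    return out


def keep_long_texts(rows: list[dict], min_len: int = 5) -> list[dict]:
    """N:1 retraction: drop rows whose 'text' is shorter than min_len."""
    return [row for row in rows if len(row["text"]) >= min_len]
```

Both functions return a batch whose length differs from the input, which is the case a resize-aware pipeline has to accept rather than reject.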