feat: add allow_resize for 1:N and N:1 generation patterns #286

Draft
andreatgretel wants to merge 3 commits into main from andreatgretel/feat/allow-resize

@andreatgretel (Contributor) commented Feb 3, 2026

Summary

Adds an allow_resize parameter to CustomColumnConfig, enabling custom column generators to produce a different number of records than they receive. This supports 1:N expansion (e.g., generating multiple variations per input record) and N:1 retraction (e.g., filtering or aggregating records). Addresses #265.
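To make the two patterns concrete, here is a minimal sketch that uses plain Python lists of dicts as a stand-in for a batch of records. The function names `expand_variations` and `keep_long_texts`, the `text` and `variation` columns, and the list-of-dicts batch type are all hypothetical; the real generator in this PR operates on a full column batch and its exact signature may differ.

```python
from typing import Any

# Hypothetical stand-in for one record in a batch.
Record = dict[str, Any]


def expand_variations(batch: list[Record], n: int = 3) -> list[Record]:
    """1:N expansion: emit n variation records for every input record."""
    out: list[Record] = []
    for record in batch:
        for i in range(n):
            # Copy the input fields and tag each copy with a variation index.
            out.append({**record, "variation": i})
    return out


def keep_long_texts(batch: list[Record], min_len: int = 5) -> list[Record]:
    """N:1 retraction: drop records whose text is shorter than min_len."""
    return [r for r in batch if len(r["text"]) >= min_len]


batch = [{"text": "hello world"}, {"text": "hi"}]
print(len(expand_variations(batch)))  # 6
print(len(keep_long_texts(batch)))    # 1
```

Both functions return a batch whose length differs from the input, which is exactly the case that would be rejected unless allow_resize is enabled.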

Changes

Added

  • allow_resize field on CustomColumnConfig, with validation that requires the full_column generation strategy
  • allow_resize parameter to update_records() in DatasetBatchManager
  • actual_num_records tracking in dataset metadata (may differ from target_num_records when resizing)
  • Informative logging when batch size changes during generation
  • example_allow_resize.py demonstrating expansion (1:N) and retraction (N:1) patterns
  • Documentation for the feature with examples
  • Comprehensive tests for config validation, expansion, retraction, and metadata tracking
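The validation rule in the first bullet can be sketched as follows. This is an illustration only: the real CustomColumnConfig appears to be a pydantic model (it uses Field), so the actual check is likely a validator. The class `CustomColumnConfigSketch`, the `generation_strategy` field name, the `CELL_BY_CELL` member, and the choice of `CELL_BY_CELL` as the default are assumptions; only the full_column strategy is named in the PR.

```python
from dataclasses import dataclass
from enum import Enum


class GenerationStrategy(str, Enum):
    # Hypothetical enum; only FULL_COLUMN is named in the PR description.
    CELL_BY_CELL = "cell_by_cell"
    FULL_COLUMN = "full_column"


@dataclass
class CustomColumnConfigSketch:
    """Hypothetical stand-in for CustomColumnConfig's resize validation."""

    name: str
    generation_strategy: GenerationStrategy = GenerationStrategy.CELL_BY_CELL
    allow_resize: bool = False

    def __post_init__(self) -> None:
        # A per-cell generator maps one input row to one output value, so it
        # cannot change the row count; resizing only makes sense when the
        # generator sees the whole batch at once.
        if self.allow_resize and self.generation_strategy != GenerationStrategy.FULL_COLUMN:
            raise ValueError("allow_resize=True requires the full_column generation strategy")


ok = CustomColumnConfigSketch("variants", GenerationStrategy.FULL_COLUMN, allow_resize=True)
```

Constructing `CustomColumnConfigSketch("variants", allow_resize=True)` with the assumed cell-by-cell default raises `ValueError`, so misconfigured columns fail at config time rather than mid-generation.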

Changed

  • column_wise_builder.py: logs resize operations and passes allow_resize to the batch manager
  • CustomColumnGenerator.log_pre_generation(): logs the allow_resize setting when it is enabled
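A rough sketch of how update_records() might treat a size change, with and without the new flag. `BatchManagerSketch`, its `records` attribute, the error message, and the log message are all hypothetical; the PR only states that DatasetBatchManager.update_records() gains an allow_resize parameter and that the builder logs resize operations.

```python
import logging

logger = logging.getLogger("batch_sketch")


class BatchManagerSketch:
    """Hypothetical stand-in for DatasetBatchManager's record-update step."""

    def __init__(self, records: list[dict]) -> None:
        self.records = records

    def update_records(self, new_records: list[dict], allow_resize: bool = False) -> None:
        old_size, new_size = len(self.records), len(new_records)
        if new_size != old_size:
            if not allow_resize:
                # Without the opt-in, a size change is treated as a generator bug.
                raise ValueError(
                    f"Generator returned {new_size} records, expected {old_size}; "
                    "set allow_resize=True to permit resizing"
                )
            # With the opt-in, record the change so users can see it happened.
            logger.info("Batch size changed from %d to %d records", old_size, new_size)
        self.records = new_records
```

Keeping the default at `allow_resize=False` preserves the old strict behavior for every existing column type; only generators that explicitly opt in can change the batch length.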

Attention Areas

Reviewers: Please pay special attention to the following:


Description updated with AI

@andreatgretel force-pushed the andreatgretel/feat/custom-column branch 3 times, most recently from 3c9fa49 to 8ba264c (February 3, 2026 20:05)
@andreatgretel force-pushed the andreatgretel/feat/allow-resize branch from bd474e3 to 0cedd56 (February 3, 2026 22:29)
Adds support for generators that produce a different number of records
than the input (expansion or retraction). This addresses GitHub issue #265.

Changes:
- Add `allow_resize` parameter to `update_records()` in DatasetBatchManager
- Add `allow_resize` field to CustomColumnConfig
- Add validation requiring FULL_COLUMN strategy when allow_resize=True
- Track and report actual_num_records in metadata (may differ from target)
- Add logging when batch size changes
- Add example_allow_resize.py demonstrating the feature
- Add comprehensive tests
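The metadata change can be illustrated with a small worked example: when generators resize batches, the number of records actually produced is the sum of the final batch sizes, not the requested target. The helper `summarize_run` is hypothetical; only the `target_num_records` and `actual_num_records` keys come from the PR.

```python
def summarize_run(target_num_records: int, batch_sizes: list[int]) -> dict[str, int]:
    """Hypothetical helper: build run metadata after all batches finish."""
    return {
        "target_num_records": target_num_records,
        # Sum of final batch sizes after any expansion or retraction.
        "actual_num_records": sum(batch_sizes),
    }


# Two input batches of 50 records each, expanded 3x by a 1:N generator.
print(summarize_run(100, [150, 150]))
# {'target_num_records': 100, 'actual_num_records': 300}
```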
@andreatgretel force-pushed the andreatgretel/feat/allow-resize branch from 0cedd56 to 8c9a33e (February 3, 2026 22:33)
@andreatgretel changed the base branch from andreatgretel/feat/custom-column to main (February 3, 2026 22:35)
- Fix example_allow_resize.py: remove non-existent CustomColumnContext,
  simplify to 1-arg signatures
- Add allow_resize logging in CustomColumnGenerator.log_pre_generation
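The commit above simplifies the example generators to 1-argument signatures, while the config field's description mentions an optional typed configuration object passed as a second argument. One way a caller could support both forms is to inspect the function's arity. This is a hypothetical sketch (`call_generator` is not from the PR), and it deliberately ignores `*args`/`**kwargs` signatures.

```python
import inspect
from typing import Any, Callable


def call_generator(fn: Callable[..., Any], data: Any, config: Any = None) -> Any:
    """Hypothetical dispatcher: call fn(data) or fn(data, config) by arity."""
    n_params = len(inspect.signature(fn).parameters)
    if n_params == 1:
        # 1-arg generators, like the simplified examples, only see the data.
        return fn(data)
    # Otherwise pass the typed config object as the second argument.
    return fn(data, config)
```

For example, `call_generator(lambda d: d * 2, 3)` returns `6`, and `call_generator(lambda d, c: d + c, 3, 4)` returns `7`.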
    default=None,
    description="Optional typed configuration object passed as second argument to generator function",
)
allow_resize: bool = Field(
Contributor

I'm wondering if we should elevate this to a property on the base column config (default is False), which you can override in custom columns and plugins.

Contributor Author

Yeah, that was the initial solution; then I ended up doing a mixin instead.
I thought it was a bit opaque for plugins specifically, since it requires the developer to find out about a specific attribute/property 🤔 But it makes things simpler, I suppose?

Contributor

It's a pattern we already use for custom emojis, and also for required_columns and side_effect_columns (though those have to be set).
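The two designs discussed in this thread can be compared in a short sketch. All class names here are hypothetical: option A puts an `allow_resize` property on a base config with a `False` default that subclasses override; option B keeps the base untouched and opts in through a mixin. The real configs are pydantic models, so the actual mechanics may differ.

```python
class BaseColumnConfigSketch:
    """Option A: base config exposes allow_resize, defaulting to False."""

    @property
    def allow_resize(self) -> bool:
        return False


class FilterColumnConfigSketch(BaseColumnConfigSketch):
    # A plugin opts in by overriding the property; no mixin needed.
    @property
    def allow_resize(self) -> bool:
        return True


class ResizableMixin:
    """Option B: opt in by mixing this into a config class."""

    @property
    def allow_resize(self) -> bool:
        return True


class MixinColumnConfigSketch(ResizableMixin, BaseColumnConfigSketch):
    # The mixin comes first in the MRO, so its property wins.
    pass
```

With option A, every config answers `allow_resize` and the builder can read it unconditionally; with option B, plugin authors must know the mixin exists, which is the discoverability concern raised above.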
