Closed
Description
Context
When DATA_DESIGNER_ASYNC_ENGINE=1 is set, build() correctly routes through _build_async() and the AsyncTaskScheduler. However, build_preview() still goes through the sequential _run_batch() path: columns are processed one at a time, and each column waits for all of its records to complete before the next column starts.
For a recipe pipeline with 7 columns and 3 records, preview took ~52s sequentially. Independent columns (e.g., two recipe_idea columns on different providers) could run concurrently, and downstream columns could start as soon as their per-row dependencies are met.
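To illustrate the difference, here is a minimal asyncio sketch, not the project's scheduler. The column names, `DEPENDENCIES`, `generate_cell`, `run_sequential`, and `run_per_row` are all hypothetical, and a short sleep stands in for a model call. It contrasts column-at-a-time execution with scheduling each cell as soon as its per-row inputs are ready.

```python
import asyncio
import time

# Hypothetical column graph: column -> columns it needs from the same row.
DEPENDENCIES = {
    "idea_a": [],
    "idea_b": [],
    "recipe": ["idea_a", "idea_b"],
}
NUM_ROWS = 3
CALL_SECONDS = 0.05  # stand-in for the latency of one model call


async def generate_cell(column: str, row: int, inputs: dict) -> str:
    """Pretend to generate one cell from its per-row inputs."""
    await asyncio.sleep(CALL_SECONDS)
    label = f"{column}[{row}]"
    if inputs:
        label += " <- " + ", ".join(inputs[name] for name in sorted(inputs))
    return label


async def run_sequential() -> dict:
    """One column at a time; a column finishes every row before the next starts."""
    cells = {}
    for column, deps in DEPENDENCIES.items():
        values = await asyncio.gather(*(
            generate_cell(column, row, {d: cells[(d, row)] for d in deps})
            for row in range(NUM_ROWS)
        ))
        for row, value in enumerate(values):
            cells[(column, row)] = value
    return cells


async def run_per_row() -> dict:
    """Each cell starts as soon as its own row's dependencies are done."""
    tasks = {}

    async def cell(column: str, row: int) -> str:
        deps = DEPENDENCIES[column]
        values = await asyncio.gather(*(tasks[(d, row)] for d in deps))
        return await generate_cell(column, row, dict(zip(deps, values)))

    # Dict order is already topological in this toy graph.
    for column in DEPENDENCIES:
        for row in range(NUM_ROWS):
            tasks[(column, row)] = asyncio.create_task(cell(column, row))
    results = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), results))


if __name__ == "__main__":
    for runner in (run_sequential, run_per_row):
        start = time.perf_counter()
        asyncio.run(runner())
        print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")
```

In this toy graph the two independent `idea_*` columns overlap under `run_per_row`, so the run takes about two call latencies instead of three. The actual gain for a real pipeline depends on its dependency graph and provider limits.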
Proposed approach
Reuse _build_async() with preview-specific behavior:
- Single row group, no disk checkpoints
- Return in-memory DataFrame instead of writing to disk
- Skip metadata writes
Estimated change: ~50-100 lines, mostly conditional logic around checkpointing. The async scheduler itself needs no changes.
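A rough sketch of how a preview flag might gate the checkpoint and metadata writes is below. Everything in it is an assumption made for illustration: `SketchBuilder`, the `preview` parameter, `_generate_row_group`, and the file names are hypothetical, and the real `_build_async()` signature may differ. A list of row dicts stands in for the in-memory DataFrame to keep the example dependency-free.

```python
import asyncio
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SketchBuilder:
    """Hypothetical stand-in for the real builder; not the project's API."""

    output_dir: Path
    writes: list = field(default_factory=list)  # names of files written

    async def _generate_row_group(self, start: int, size: int) -> list:
        # Placeholder for the async scheduler producing one row group.
        await asyncio.sleep(0)
        return [{"row": i, "value": f"cell-{i}"} for i in range(start, start + size)]

    def _write_json(self, name: str, payload) -> None:
        (self.output_dir / name).write_text(json.dumps(payload))
        self.writes.append(name)

    async def _build_async(self, num_records: int, group_size: int,
                           preview: bool = False) -> list:
        if preview:
            group_size = max(num_records, 1)  # single row group
        rows = []
        for start in range(0, num_records, group_size):
            size = min(group_size, num_records - start)
            group = await self._generate_row_group(start, size)
            rows.extend(group)
            if not preview:
                self._write_json(f"checkpoint_{start}.json", group)  # disk checkpoint
        if not preview:
            self._write_json("metadata.json", {"num_records": len(rows)})
        return rows  # preview callers use this in-memory result directly

    def build(self, num_records: int, group_size: int = 2) -> list:
        return asyncio.run(self._build_async(num_records, group_size))

    def build_preview(self, num_records: int) -> list:
        return asyncio.run(self._build_async(num_records, num_records, preview=True))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        builder = SketchBuilder(Path(tmp))
        print(builder.build_preview(3))
        print(builder.writes)
```

Keeping both entry points on one code path means the preview exercises the same scheduling logic as the full build, with only the persistence steps branching on the flag.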
Related
- PR #429, "feat: wire async task-queue scheduler into ColumnWiseDatasetBuilder" (async builder integration)
- Reported by @nabinchha in PR #429 comment