feat: wire build_preview() into async scheduler when DATA_DESIGNER_ASYNC_ENGINE=1 #442

@andreatgretel

Context

When DATA_DESIGNER_ASYNC_ENGINE=1 is set, build() correctly routes through _build_async() and the AsyncTaskScheduler. However, build_preview() still goes through the sequential _run_batch() path: columns are processed one at a time, and each column waits for all of its records to complete before the next column starts.

For a recipe pipeline with 7 columns and 3 records, preview took ~52s sequentially. Independent columns (e.g., two recipe_idea columns on different providers) could run concurrently, and downstream columns could start as soon as their per-row dependencies are met.
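To illustrate the per-row dependency idea, here is a minimal asyncio sketch. It is not the AsyncTaskScheduler implementation; the column graph `DEPS`, `generate_cell()`, and `run_row_wise()` are hypothetical names made up for this example, and the sleep stands in for a model call.

```python
import asyncio

# Hypothetical column graph: column name -> columns it depends on.
DEPS = {
    "idea_a": [],
    "idea_b": [],
    "recipe": ["idea_a", "idea_b"],
}


async def generate_cell(column: str, row: int, inputs: dict) -> str:
    """Stand-in for a model call; sleeps briefly to simulate latency."""
    await asyncio.sleep(0.01)
    return f"{column}[{row}]"


async def run_row_wise(deps: dict, num_rows: int) -> dict:
    """Run each (column, row) cell once that row's upstream cells are done."""
    tasks: dict = {}

    async def cell(column: str, row: int) -> str:
        # Wait only for this row's upstream cells, not for whole columns.
        upstream = {dep: await tasks[(dep, row)] for dep in deps[column]}
        return await generate_cell(column, row, upstream)

    # All tasks are created before any of them runs, so every lookup in
    # `tasks` succeeds regardless of the order columns are listed in.
    for column in deps:
        for row in range(num_rows):
            tasks[(column, row)] = asyncio.create_task(cell(column, row))

    results = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), results))


if __name__ == "__main__":
    cells = asyncio.run(run_row_wise(DEPS, num_rows=3))
    print(len(cells), "cells generated")
```

In this sketch the two independent columns run concurrently for every row, and the `recipe` cell for a given row starts as soon as that row's two inputs exist, without waiting for the other rows.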

Proposed approach

Reuse _build_async() with preview-specific behavior:

  • Single row group, no disk checkpoints
  • Return in-memory DataFrame instead of writing to disk
  • Skip metadata writes

Estimated size: ~50-100 lines, mostly conditional logic around checkpointing. The async scheduler itself needs no changes.
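A rough sketch of what the preview-specific branching could look like, under stated assumptions: `build_async_sketch()`, `generate_row_group()`, and the `preview` flag are hypothetical and do not reflect the real _build_async() signature. A list of dicts stands in for the in-memory DataFrame, and JSON files stand in for disk checkpoints.

```python
import asyncio
import json
from pathlib import Path
from typing import Optional


async def generate_row_group(start: int, size: int) -> list[dict]:
    """Stand-in for the scheduler producing one row group of records."""
    await asyncio.sleep(0)
    return [{"row": start + i} for i in range(size)]


async def build_async_sketch(
    num_records: int,
    group_size: int,
    out_dir: Optional[Path] = None,
    preview: bool = False,
):
    """Hypothetical shared path; preview=True keeps everything in memory."""
    if not preview and out_dir is None:
        raise ValueError("a full build needs an output directory")
    if preview:
        # Preview: a single row group covering all requested records.
        group_size = max(num_records, 1)

    records: list[dict] = []
    for start in range(0, num_records, group_size):
        size = min(group_size, num_records - start)
        group = await generate_row_group(start, size)
        records.extend(group)
        if not preview:
            # Full build: checkpoint each row group to disk. Metadata
            # writes would also live behind this same condition.
            path = out_dir / f"group_{start}.json"
            path.write_text(json.dumps(group))

    # Preview returns the records; a full build returns where it wrote them.
    return records if preview else out_dir
```

With this shape, a preview call such as `asyncio.run(build_async_sketch(3, group_size=2, preview=True))` touches no files, while the same call with `out_dir` set and `preview` left off writes one checkpoint per row group.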
