Open
Description
Context
The sync _run_batch() path produces high-fidelity per-column progress logs (worker count, records/sec, ETA, emoji progression). The async AsyncTaskScheduler path only logs start/end and salvage rounds - no per-column progress during the main generation phase.
Example sync output:
⚡️ Processing llm-text column 'recipe_idea' with 4 concurrent workers
|-- 🐴 llm-text column 'recipe_idea' progress: 1/3 (33%) complete, 1 ok, 0 failed, 1.54 rec/s, eta 1.3s
|-- 🚗 llm-text column 'recipe_idea' progress: 2/3 (67%) complete, 2 ok, 0 failed, 3.05 rec/s, eta 0.3s
Async output during the main generation phase is silent.
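For reference when matching the sync format: the ETA values in the example lines above are consistent with "remaining records divided by current rate". The helpers below are a small sketch of that arithmetic only; `eta_seconds` and `percent_complete` are hypothetical names, and the existing tracker may compute these values differently.

```python
def eta_seconds(completed: int, total: int, rate: float) -> float:
    """Estimate the remaining time as remaining records / throughput (rec/s)."""
    remaining = total - completed
    # Without a positive rate there is no meaningful estimate yet.
    return remaining / rate if rate > 0 else float("inf")


def percent_complete(completed: int, total: int) -> int:
    """Completion percentage rounded to the nearest integer."""
    return round(100 * completed / total)


# Values taken from the two sync log lines above.
first_eta = round(eta_seconds(1, 3, 1.54), 1)
second_eta = round(eta_seconds(2, 3, 3.05), 1)
```

Plugging in the logged rates reproduces the logged ETAs and percentages, so an async implementation that follows the same formula should produce comparable output.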
Proposed approach
Reuse the existing ProgressTracker class:
- Initialize `dict[str, ProgressTracker]` in `AsyncTaskScheduler.__init__()`
- Wire `record_success()`/`record_failure()` in `_execute_task_inner()` after task completion
- Emit `log_final()` when columns complete
- ~100 lines, can be done incrementally (basic counts first, emoji/ETA polish later)
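A rough sketch of the wiring described above, to make the shape of the change concrete. Everything here is an assumption for illustration: the `ProgressTracker` below is a stand-in (the real class's constructor and log format may differ), the `AsyncTaskScheduler` is reduced to the minimum needed to show the hooks, and `column_totals`, `run()`, and the task callables are hypothetical. Only the method names `record_success()`, `record_failure()`, `log_final()`, and `_execute_task_inner()` come from this issue.

```python
import asyncio
import logging
import time
from typing import Awaitable, Callable

logger = logging.getLogger("async_progress_sketch")


class ProgressTracker:
    """Stand-in for the existing tracker; constructor and fields are assumed."""

    def __init__(self, total: int, label: str) -> None:
        self.total = total
        self.label = label
        self.ok = 0
        self.failed = 0
        self._start = time.perf_counter()

    @property
    def completed(self) -> int:
        return self.ok + self.failed

    def record_success(self) -> None:
        self.ok += 1
        self._log_progress()

    def record_failure(self) -> None:
        self.failed += 1
        self._log_progress()

    def _log_progress(self) -> None:
        # Rate is completed / elapsed; ETA is remaining / rate.
        elapsed = max(time.perf_counter() - self._start, 1e-9)
        rate = self.completed / elapsed
        eta = (self.total - self.completed) / rate
        logger.info(
            "%s progress: %d/%d (%.0f%%) complete, %d ok, %d failed, %.2f rec/s, eta %.1fs",
            self.label, self.completed, self.total,
            100 * self.completed / self.total, self.ok, self.failed, rate, eta,
        )

    def log_final(self) -> None:
        logger.info("%s done: %d ok, %d failed", self.label, self.ok, self.failed)


class AsyncTaskScheduler:
    """Simplified, hypothetical scheduler showing only the progress hooks."""

    def __init__(self, column_totals: dict[str, int]) -> None:
        # One tracker per column, created up front.
        self._trackers: dict[str, ProgressTracker] = {
            column: ProgressTracker(total, f"column '{column}'")
            for column, total in column_totals.items()
        }

    async def _execute_task_inner(
        self, column: str, task: Callable[[], Awaitable[None]]
    ) -> None:
        tracker = self._trackers[column]
        try:
            await task()
            tracker.record_success()
        except Exception:
            # The real scheduler presumably retries or salvages; this sketch only counts.
            tracker.record_failure()
        # No await between the record call and this check, so it fires once per column.
        if tracker.completed == tracker.total:
            tracker.log_final()

    async def run(self, tasks: dict[str, list[Callable[[], Awaitable[None]]]]) -> None:
        await asyncio.gather(
            *(self._execute_task_inner(column, task)
              for column, column_tasks in tasks.items()
              for task in column_tasks)
        )
```

Because asyncio runs callbacks on a single thread, the counters need no locking as long as the record call and the completion check are not separated by an `await`. If the real scheduler dispatches work to threads, the tracker would need its own synchronization.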
Related
- PR #429, "feat: wire async task-queue scheduler into ColumnWiseDatasetBuilder" (async builder integration)
- Reported by @nabinchha in PR #429 comment