
Draft: Data loading optimizations #44

Closed
PatrickRMiles wants to merge 29 commits into LBANN:main from
PatrickRMiles:miles30/dataset_format_v2_loader_perf


Collaborator

PatrickRMiles commented Apr 2, 2026

  • Shift dataloader preprocessing work into dataset generation to speed up loading,
    while maintaining support for old datasets
  • Restore the .contiguous() and dtype cast calls, but reorder them to avoid
    redundant copies
  • Make the dataloader's num_workers user-configurable
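
To illustrate the last bullet, here is a minimal sketch of passing a user-supplied worker count through to a PyTorch `DataLoader`. It is not taken from this PR: the `--num-workers` flag and the helper names `build_loader_kwargs` and `parse_args` are hypothetical, and the project may read the value from a config file instead. The sketch only builds the keyword arguments, so it runs without PyTorch installed.

```python
import argparse


def build_loader_kwargs(num_workers: int, batch_size: int = 1) -> dict:
    """Build keyword arguments intended for torch.utils.data.DataLoader.

    Hypothetical helper for illustration; not part of the PR.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    kwargs = {"batch_size": batch_size, "num_workers": num_workers}
    if num_workers > 0:
        # DataLoader raises ValueError if either option is set while
        # num_workers == 0, so only add them for multi-process loading.
        kwargs["persistent_workers"] = True
        kwargs["prefetch_factor"] = 2
    return kwargs


def parse_args(argv=None) -> argparse.Namespace:
    """Parse a hypothetical --num-workers command-line flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-workers", type=int, default=0)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(build_loader_kwargs(args.num_workers))
```

The resulting dictionary would be passed as `DataLoader(dataset, **kwargs)`. Defaulting to `0` keeps loading in the main process unless the user opts in to worker processes.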

Based on #43

PatrickRMiles changed the title from "Data loading optimizations" to "Draft: Data loading optimizations" on Apr 2, 2026
PatrickRMiles and others added 18 commits April 2, 2026 13:37
…e clipping (LBANN#40)

* apply optimizer every batch, not every epoch; unscale gradients before clipping

* trainer tweaks

---------

Co-authored-by: Patrick Miles <miles30@tioga.llnl.gov>
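
The commit message above mentions unscaling gradients before clipping. The plain-Python sketch below illustrates why that order matters under mixed-precision loss scaling; it is an arithmetic illustration, not the trainer's code. The loss scale, the clipping threshold, and the helpers `l2` and `clip_by_norm` are assumptions made for the example. In PyTorch AMP the same ordering is usually written as `scaler.unscale_(optimizer)` followed by `torch.nn.utils.clip_grad_norm_(...)`, though the source does not confirm that the trainer uses those exact calls.

```python
import math


def l2(grads):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_norm(grads, max_norm):
    """Rescale grads so that their L2 norm is at most max_norm."""
    norm = l2(grads)
    if norm <= max_norm:
        return list(grads)
    return [g * (max_norm / norm) for g in grads]


true_grads = [3.0, 4.0]   # true gradient, L2 norm 5.0
loss_scale = 1024.0       # hypothetical mixed-precision loss scale
max_norm = 1.0            # hypothetical clipping threshold

# Backward on a scaled loss yields gradients multiplied by loss_scale.
scaled = [g * loss_scale for g in true_grads]

# Wrong order: clip the still-scaled gradients, then unscale.
# The threshold is applied to inflated values, so the final
# gradient ends up loss_scale times smaller than intended.
wrong = [g / loss_scale for g in clip_by_norm(scaled, max_norm)]

# Order named in the commit message: unscale first, then clip.
right = clip_by_norm([g / loss_scale for g in scaled], max_norm)

print(l2(wrong), l2(right))
```

With these numbers, clipping first leaves a gradient norm of about `1 / 1024`, while unscaling first leaves a norm of about `1.0`, which is the intended threshold.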
…n trainer class (LBANN#43)

* apply optimizer every batch, not every epoch; unscale gradients before clipping

* trainer tweaks

* apply optimizer every batch, not every epoch; unscale gradients before clipping

* extract warmup to separate method; switch to warming up set number of batches (user configurable)

* whitespace; num_workers revert

* ruff

* make parallelstrategy, spatial_mesh, ddp_placements attrs of trainer; other small tweaks

* remove deprecated config attrs

* ruff

* get device mesh from ps class attr

* ruff

* missing self. on some ps accesses

* Fix imports and missing self.ps

* rm legacy warmup_epochs

* Move attributes to base class for clarity

* remove warmup_epochs -- not useful to keep support for this

* call cleanup_or_resume trainer method directly

* rm unused vars

---------

Co-authored-by: Patrick Miles <miles30@tioga.llnl.gov>
Co-authored-by: Michael McKinsey <michaelmckinsey1@gmail.com>
…trickRMiles/ScaFFold into miles30/dataset_format_v2_loader_perf