Cooldown masking rate #2351

Draft
yperugachidiaz wants to merge 4 commits into ecmwf:develop-ssl from yperugachidiaz:ypd/dev/2328-develop-ssl-cooldown-masking-rate
@yperugachidiaz yperugachidiaz commented May 12, 2026

Description

Branched off from develop-ssl.

  • Added train_step and total_train_steps to masking.py to support a step-dependent cooldown of the masking schedule.
  • Computed the required training-step information and propagated it through:
    -- multi_stream_data_sampler.py
    -- tokenizer_masking.py
    -- trainer.py
  • Note: the masking cooldown is not applied when rate_sampling is enabled.
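
For reviewers, a minimal sketch of what a step-dependent cooldown could look like. This is not the code in masking.py: the function name, the linear ramp, and the cooldown_fraction and final_rate parameters are hypothetical, chosen only for illustration. Only train_step, total_train_steps, and the rate_sampling bypass come from the description above.

```python
def cooled_masking_rate(
    base_rate: float,
    train_step: int,
    total_train_steps: int,
    cooldown_fraction: float = 0.2,  # hypothetical: share of training spent cooling down
    final_rate: float = 0.0,         # hypothetical: masking rate reached at the last step
    rate_sampling: bool = False,
) -> float:
    """Return the masking rate to use at train_step (illustrative linear cooldown)."""
    # As noted in the description, the cooldown is skipped when rate_sampling is on.
    if rate_sampling or total_train_steps <= 0:
        return base_rate

    # Number of steps at the end of training over which the rate is ramped.
    cooldown_steps = round(cooldown_fraction * total_train_steps)
    cooldown_start = total_train_steps - cooldown_steps

    # Before the cooldown window, or with an empty window, keep the base rate.
    if cooldown_steps <= 0 or train_step < cooldown_start:
        return base_rate

    # Linear interpolation from base_rate to final_rate, clamped at the end.
    progress = min((train_step - cooldown_start) / cooldown_steps, 1.0)
    return base_rate + (final_rate - base_rate) * progress


if __name__ == "__main__":
    for step in (0, 800, 900, 1000):
        print(step, cooled_masking_rate(0.6, step, 1000))
```

With these assumed defaults the rate stays at base_rate for the first 80% of training and then decreases linearly, which is why both the current step and the total step count have to reach the masking code.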

The HedgeDoc describes an experimental setup for testing the cooldown of the masking rate. Work in progress.

Issue Number

Closes #2328

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@github-actions github-actions Bot added the eval (anything related to the model evaluation pipeline), infra (Issues related to infrastructure), and model (Related to model training or definition (not generic infra)) labels May 12, 2026
@yperugachidiaz yperugachidiaz changed the title from "Ypd/dev/2328 develop ssl cooldown masking rate" to "Cooldown masking rate" May 12, 2026
@yperugachidiaz yperugachidiaz changed the base branch from develop to develop-ssl May 12, 2026 11:12