
bugfix: non-existing latent state when fo=1 #2368

Open
kctezcan wants to merge 1 commit into ecmwf:jk/develop/diffusion-full-pipeline from MeteoSwiss:ktezcan/diff_bugfix_fo1

@kctezcan (Contributor)

Description

When using forecast offset=1, we get the following error:

5:   File "/iopsstor/scratch/cscs/thunter/slurm/slurm_weathergen_d3pbludw_dir/WeatherGenerator/src/weathergen/train/trainer.py", line 410, in run
5:     self.validate_before_training()
5:   File "/iopsstor/scratch/cscs/thunter/slurm/slurm_weathergen_d3pbludw_dir/WeatherGenerator/src/weathergen/train/trainer.py", line 443, in validate_before_training
5:     self.validate(-1, self.validation_cfg, batch_size)
5:   File "/iopsstor/scratch/cscs/thunter/slurm/slurm_weathergen_d3pbludw_dir/WeatherGenerator/src/weathergen/train/trainer.py", line 675, in validate
5:     _ = self.loss_calculator_val.compute_loss(
5:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
5:   File "/iopsstor/scratch/cscs/thunter/slurm/slurm_weathergen_d3pbludw_dir/WeatherGenerator/src/weathergen/train/loss_calculator.py", line 93, in compute_loss
5:     loss_values = calculator.compute_loss(
5:                   ^^^^^^^^^^^^^^^^^^^^^^^^
5:   File "/iopsstor/scratch/cscs/thunter/slurm/slurm_weathergen_d3pbludw_dir/WeatherGenerator/src/weathergen/train/loss_modules/loss_module_latent_diffusion.py", line 97, in compute_loss
5:     pred_tokens_all = [pl["latent_state"].z_pre_norm for pl in preds.latent if pl]
5:                        ~~^^^^^^^^^^^^^^^^
5: KeyError: 'latent_state'

This happens because, with forecast offset=1, the first entry in preds.latent comes from the source. That entry is not encoded, so it has no latent_state field; it only contains a posterior field. Instead of checking whether an entry of preds.latent is non-empty (if pl), we should check whether it contains the field we actually read, i.e. latent_state.
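The check described above can be sketched as follows. This is a minimal illustration, not the commit from this PR: it assumes each entry of preds.latent behaves like a dict (a missing key raises KeyError, as in the traceback), and LatentState and collect_pred_tokens are hypothetical stand-ins that only model the z_pre_norm access from loss_module_latent_diffusion.py.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence


@dataclass
class LatentState:
    """Hypothetical stand-in for the real latent-state object; only z_pre_norm is modeled."""
    z_pre_norm: Any


def collect_pred_tokens(latent_preds: Sequence[Optional[Mapping[str, Any]]]) -> list:
    """Gather z_pre_norm from every prediction entry that carries a latent_state.

    Filtering on `if pl` alone keeps any non-empty entry, including the source entry
    at forecast offset=1 that only holds a "posterior" key. Checking for the key
    itself skips that entry instead of raising KeyError.
    """
    return [
        pl["latent_state"].z_pre_norm
        for pl in latent_preds
        # keep skipping None/empty entries, and also skip entries without latent_state
        if pl and "latent_state" in pl
    ]


# Example mimicking forecast offset=1 (entry structure assumed, not taken from the repository).
latent = [
    {"posterior": object()},                               # source entry: not encoded, no latent_state
    {"latent_state": LatentState(z_pre_norm=[1.0, 2.0])},  # encoded entry
    {},                                                     # empty entry, skipped as before
]

try:
    # Original filter: the non-empty source entry passes `if pl` and then fails the lookup.
    _ = [pl["latent_state"].z_pre_norm for pl in latent if pl]
except KeyError as err:
    print("old filter:", repr(err))  # old filter: KeyError('latent_state')

print("new filter:", collect_pred_tokens(latent))  # new filter: [[1.0, 2.0]]
```

Keeping the `pl and` part preserves the original behavior of skipping None or empty entries, while the membership test removes entries that simply lack the field being read.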

Issue Number

No issue

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a hedgedoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel
