Collaborator

@JMP-MO JMP-MO commented Sep 15, 2025

Changes:

  • I have added an ML tutorial notebook for the Met Office (MO) UKV dataset. The data is accessed through the PyEarthTools Met Office site archive package, which can pull data directly from disk and process it for use within pyearthtools pipelines. The notebook demonstrates how to create iterable pyearthtools dataset pipelines and connect them to a custom PyTorch ML project.
  • I also moved the Met Office-specific notebooks to the Met Office site archive folder, as they are specific to the site archive and won't work without it.
  • I updated the accessor code for the Met Office datasets after running into some early issues, and added helper functions for processing these datasets, related to interpolating the grids.
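
As a rough illustration of the "iterable pipeline feeding a training loop" pattern mentioned in the first bullet, here is a minimal pure-Python sketch. It is not the notebook's code and uses neither the pyearthtools nor the PyTorch API: `ForecastPairs`, `batched`, and the `lead` parameter are hypothetical names, and pairing each sample with a later one as its target is an assumption about the task. In a PyTorch project a class like this would typically subclass `torch.utils.data.IterableDataset`, which only requires `__iter__` to be implemented.

```python
from typing import Iterator, List, Sequence, Tuple


class ForecastPairs:
    """Yield (input, target) pairs from a time-ordered sequence of samples.

    Hypothetical stand-in for a dataset pipeline; the samples could be
    arrays loaded from disk, but any indexable sequence works here.
    """

    def __init__(self, samples: Sequence, lead: int = 1):
        if lead < 1:
            raise ValueError("lead must be at least 1")
        self.samples = samples
        self.lead = lead  # assumed offset between input and target

    def __iter__(self) -> Iterator[Tuple]:
        # Pair each sample with the one `lead` steps later.
        for i in range(len(self.samples) - self.lead):
            yield self.samples[i], self.samples[i + self.lead]


def batched(pairs, batch_size: int) -> Iterator[List[Tuple]]:
    """Group an iterable of pairs into lists of at most `batch_size`."""
    batch = []
    for pair in pairs:
        batch.append(pair)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


if __name__ == "__main__":
    # Integers stand in for real samples.
    for batch in batched(ForecastPairs(list(range(6)), lead=1), batch_size=2):
        print(batch)
```

Keeping the dataset as a plain iterable means the training loop never needs to know how the samples were loaded or preprocessed.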

Environment:

I used the PET Tutorials env.

Testing:

Find the ML notebook in the new notebooks folder of the Met Office site archive package. You should be able to run the notebook with the PET Tutorials env. You can choose to deactivate training and use the included weights if you would like.

@JMP-MO JMP-MO self-assigned this Sep 15, 2025
@JMP-MO JMP-MO added the documentation Improvements or additions to documentation label Sep 15, 2025
coveralls commented Sep 15, 2025

Pull Request Test Coverage Report for Build 17853138883

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 378 unchanged lines in 14 files lost coverage.
  • Overall coverage increased (+0.2%) to 61.168%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| packages/data/src/pyearthtools/data/transforms/optimisation.py | 1 | 95.45% |
| packages/utils/src/pyearthtools/utils/decorators.py | 8 | 74.55% |
| packages/data/src/pyearthtools/data/transforms/values.py | 13 | 41.94% |
| packages/data/src/pyearthtools/data/transforms/dimensions.py | 16 | 19.4% |
| packages/data/src/pyearthtools/data/transforms/variables.py | 21 | 28.89% |
| packages/data/src/pyearthtools/data/transforms/interpolation.py | 26 | 50.82% |
| packages/pipeline/src/pyearthtools/pipeline/operations/dask/normalisation.py | 26 | 48.81% |
| packages/pipeline/src/pyearthtools/pipeline/operations/numpy/normalisation.py | 26 | 47.56% |
| packages/data/src/pyearthtools/data/transforms/derive.py | 28 | 63.64% |
| packages/data/src/pyearthtools/data/transforms/attributes.py | 33 | 58.4% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 17692860548: | 0.2% |
| Covered Lines: | 9506 |
| Relevant Lines: | 15123 |

💛 - Coveralls

Collaborator

I would suggest pulling this file out of the PR and getting users to train the model themselves, or else put the model weights in a local repository. It's a bit awkward putting model weights directly into the repo, mostly for reasons of size.

Collaborator Author

Hi Tennessee, I can pull them out, but I mainly included them because training can take quite a lot of time, in the region of hours. Including them lets users skip training if they don't have the time or compute to run it. The file is around 11 MB. Let me know if you want it removed and I will store it somewhere else.

Collaborator

Is it possible to put the trained weights in a known location (like an s3 bucket or known place on disk) and then reference it? If it's too awkward we can merge it and hope for the best, but generally it's preferred to keep large artefacts out of the repo. But it's more important to have something that works, so if those other things aren't possible we can just do it.

Collaborator Author

That's fine. The easiest solution is probably to add them to my GitHub and provide a download link.

Collaborator Author

Weights removed and added to a separate repo, with a link provided in the notebook.
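
For readers wondering what "reference the weights from a known location" could look like in practice, here is a minimal sketch using only the Python standard library. It is not the code in the notebook: `fetch_weights`, `cache_dir`, and `expected_sha256` are hypothetical names, and the URL in the usage comment is a placeholder rather than the real download link.

```python
import hashlib
import urllib.parse
import urllib.request
from pathlib import Path


def fetch_weights(url, cache_dir, expected_sha256=None):
    """Download a weights file once and reuse the cached copy afterwards.

    Hypothetical helper: the names and the checksum step are assumptions,
    not part of PyEarthTools.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Name the cached file after the last path component of the URL.
    target = cache_dir / Path(urllib.parse.urlparse(url).path).name

    if not target.exists():
        urllib.request.urlretrieve(url, str(target))

    if expected_sha256 is not None:
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != expected_sha256:
            target.unlink()  # discard the bad file so a retry downloads again
            raise ValueError(f"checksum mismatch for {target.name}")

    return target


# Usage (placeholder URL, not the real link):
# weights_path = fetch_weights(
#     "https://example.com/path/to/weights.pt", "~/.cache/mo-ml-tutorial"
# )
```

Caching on disk means the download happens once per machine, and the optional checksum guards against a truncated or replaced file.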

Collaborator

This notebook looks great

Collaborator

tennlee commented Sep 18, 2025

Looks good, the code makes sense, and I think this is all doing the right thing. It would be best to pull the model weights out, however, unless they really need to be there. I'm happy to include them if needed, but it would be better to keep them out of the main repo.

Please run the 'black' code formatter over the code.

@JMP-MO JMP-MO requested a review from tennlee September 19, 2025 09:14
@tennlee tennlee merged commit bf3f5bc into ACCESS-Community-Hub:develop Sep 20, 2025
6 checks passed
gemmaellen pushed a commit to gemmaellen/PyEarthTools that referenced this pull request Oct 1, 2025
* Add MO-ML tutorial notebooks and update code to support
* rm MO notebook which is now in the MO site archive package
* Add norm stats
Labels

documentation Improvements or additions to documentation

3 participants