Skip to content

Conversation

@MarkMelotto
Copy link
Contributor

Hi @BSchilperoort

Fixing issue #491 .

Since I am not that familiar with coding style of ewatercycle, I require your help.
The problem we are facing is that caravan on 4tu is not working reliably at the moment, so I made this fallback version myself.

The crop_ds change is now needed, because we now have date instead of time, need to find another fix I think, let me know what you think!

And please help me for the overall picture as to what i can approve.

Thanks!

@codecov
Copy link

codecov bot commented Dec 8, 2025

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.18%. Comparing base (897b5b5) to head (e4b0dc7).

Files with missing lines Patch % Lines
src/ewatercycle/_forcings/caravan.py 84.61% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #492      +/-   ##
==========================================
+ Coverage   79.15%   79.18%   +0.02%     
==========================================
  Files          28       28              
  Lines        1871     1883      +12     
  Branches      165      168       +3     
==========================================
+ Hits         1481     1491      +10     
  Misses        330      330              
- Partials       60       62       +2     
Files with missing lines Coverage Δ
src/ewatercycle/_forcings/caravan.py 83.17% <84.61%> (+0.01%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MarkMelotto
Copy link
Contributor Author

before writing tests, I would like the feedback of Bart

@BSchilperoort
Copy link
Member

BSchilperoort commented Dec 8, 2025

before writing tests, I would like the feedback of Bart

It might be simpler to define an environmental variable CARAVAN_CACHE (containing the directory of the caravan file), and have xarray open that if it's available;

import os

if os.environ.get("CARAVAN_CACHE"):
    # load from cache
else:
    # load from 4TU

This would be a minimal change in the code, and would remove the need for users to pass things.

Just make sure you download the caravan file from the HTTP link; https://opendap.4tu.nl/thredds/fileServer/data2/djht/ca13056c-c347-4a27-b320-930c2a4dd207/3/camels.nc so the file is identical to the OPeNDAP version.

@MarkMelotto
Copy link
Contributor Author

thanks for the insights!
The HTTP should still be the same right?

For now when I am working on Snellius I have caravan data mounted to something other then...
Oh wait I get it, I need to do some more reading into environmental variables!

Also, 1.3 GB that is not much :)

@MarkMelotto
Copy link
Contributor Author

I am also noticing how you use /3/ in your opendap url and not /1/ as it is in the code.
I put all the /1/ datasets on dcache now, so I just have to figure out how to set the environmental variable stuff.

@Daafip
Copy link
Collaborator

Daafip commented Dec 9, 2025

@MarkMelotto the /3 indicates v3, we did some finetuning. Nice to see you're using the data!

@MarkMelotto
Copy link
Contributor Author

hey @Daafip !

I noticed that v3 only contains the camels.nc, not the other ones, is that correct?
So I pulled them from v1.

Please let me know!

@MarkMelotto
Copy link
Contributor Author

@BSchilperoort so now I would need to add:
export CARAVAN_CACHE=~/data/shared/climate-data/caravan/

to a notebook in SRC to have this work if I understand correctly?

cache_dir = os.environ.get("CARAVAN_CACHE")
# Check if we want to load from 4TU or dCache
if cache_dir:
cache_dir = cache_dir.rstrip("/") # ensure no trailing slash issues
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or do

xr.open_dataset(Path(cache_dir) / "{dataset}.nc")

@BSchilperoort
Copy link
Member

export CARAVAN_CACHE=~/data/shared/climate-data/caravan/

or add that to the end of your ~/.bashrc file. That will make it always available for you.

@MarkMelotto
Copy link
Contributor Author

@BSchilperoort alright thanks for the help.
Then I guess I should write a test and then we can push to main?

Then for new SRC machines the correct version will be installed already, but for the other we need to update them.

@BSchilperoort
Copy link
Member

Then I guess I should write a test and then we can push to main?

Then for new SRC machines the correct version will be installed already, but for the other we need to update them.

I think you also need to make a new release, and update infra for it to be on new SRC machines.

Note that for SRC you should be able to define environment vars for all users as well. For this you will have to modify the infra repo.

See;

i.e. seems like modifying the jupyterhub_spawner_environment var of the to jupyter ansible task, by adding the new environment variable should do the trick. I think the default values for these are defined here

Should be a good first issue to fix in the ansible setup so you can learn how that works 😉 You should follow Stefan's guide on how to test this locally; https://github.com/eWaterCycle/infra/blob/c02600be0b135577da9682d5b4986b81c8162ac5/VAGRANT.md

@MarkMelotto
Copy link
Contributor Author

phew thanks @BSchilperoort that is a lot of new stuff :D

I think the idea was to have this as a fail safe and still use 4tu as default.
But when needed we provide an opportunity to override the 4tu bit.

So I guess that just leaves the release?

@BSchilperoort
Copy link
Member

I think the idea was to have this as a fail safe and still use 4tu as default.

OK then this is fine, yes. You can now override the 4TU default if required. Remember to also update the docstrings to reflect this new behavior.

@MarkMelotto
Copy link
Contributor Author

Hopefully getting the shapefiles will not cause issues...

@BSchilperoort
Copy link
Member

Hopefully getting the shapefiles will not cause issues...

You could also use the cache for that. Should a straightforward addition

@Daafip
Copy link
Collaborator

Daafip commented Dec 9, 2025

I noticed that v3 only contains the camels.nc, not the other ones, is that correct?

v3 is updated 2025-09-16T06:19:23.356Z:
https://opendap.4tu.nl/thredds/catalog/data2/djht/ca13056c-c347-4a27-b320-930c2a4dd207/3/catalog.html. We updated the structure of the netcdf I believe.

I think I had a miscommunication with 4TU, the other files from v2 for the other netcdf files are the ones to use.
https://opendap.4tu.nl/thredds/catalog/data2/djht/ca13056c-c347-4a27-b320-930c2a4dd207/2/catalog.html

@MarkMelotto
Copy link
Contributor Author

I think I had a miscommunication with 4TU, the other files from v2 for the other netcdf files are the ones to use. https://opendap.4tu.nl/thredds/catalog/data2/djht/ca13056c-c347-4a27-b320-930c2a4dd207/2/catalog.html

Hmm that might be the issue we are facing with 4TU then?
I will also change that!

@Daafip
Copy link
Collaborator

Daafip commented Dec 9, 2025

might be the issue we are facing with 4TU then?

I don't thinks so, I cant quickly find the issue why we did the update but there was a reason for it..

Copy link
Member

@BSchilperoort BSchilperoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some suggestions

@MarkMelotto
Copy link
Contributor Author

some suggestions

thank you!
I am struggling with the shapefile pathing...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants