-
-
Notifications
You must be signed in to change notification settings - Fork 46
Add optional t0 embedding features #385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| # Add datetime features | ||
| datetimes = pd.DatetimeIndex(da_generation.time_utc.values) | ||
| datetime_features = encode_datetimes(datetimes=datetimes) | ||
| # Add datetime features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly unrelated to the main PR here. I wanted to decouple the datetime features from the generation dataarray the same way the solar features are decoupled
This is really just a bit of a clean up
| # Only add solar position if explicitly configured | ||
| if self.config.input_data.solar_position is not None: | ||
| solar_config = self.config.input_data.solar_position | ||
| # Only add solar position if explicitly configured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another semi-related clean up I spotted along the way. Adding the solar features should not depend on the generation data being there
Pull Request
Context
Previously, I did an experiment which showed that giving the model a feature encoding whether t0 is HH:00 or HH:30 increased the model accuracy. The experiment was motivated by the fact the the NWP data is hourly and the model may want to learn to use the NWP differently at HH:00 and HH:00 t0 times. The experiment write up is here
PR description
This PR adds the option to include embeddings of the t0 time. I've tried to generalised the functionality so that we can choose the periodicity of the embedding features. i.e. we can encode how far through each hour t0 is. For 3-hourly NWP steps we can also generate hour far through each 3-hour period t0 is. We can also use it to make time-of-day (by adding "24h" in
periods) and day-of-year features (by adding "1y" inperiods).I've made it configurable so that we can choose a representation for each periodicity. For example, for time-of-day we likely want to sin-cos embed the times ("cyclic" in the code) since 11:59pm is close to 00:00am. But for time-past-the-hour features we likely likely want linear encodings where
HH:00 -> 0andHH:59 -> ~1. This time-past-the-hour feature is supposed to inform the network about how to interpolate the NWP data.We shift the NWP inputs by a full hour between HH:59 (HH+1):00. So in this way HH:00 is maximally dissimilar HH:59 - so we would want our time feature to reflect this.Extra notes
I did a couple of extra semi-related clean-ups in this PR that I couldn't resist. I've noted them
Future usage
I think it's likely that this function could replace the usage of
encode_datetimes()entirely. Our PVNet models are trained to predict for fixed set of horizons, so if we already have the time-of-day and day-of-year features of t0, there is no additional information in knowing the other forecast datetimes. Once we (or the model) know t0 we already know all forecast datetimes.I think we should try to move to sunset the
encode_datetimes()function but I've left it in for now. We'll need to do a little testing and I don't want to break backwards compatibility.How Has This Been Tested?
I've added tests for the exact numeric values which
get_t0_embedding()produced and checked that it works inside thePVNetDatasetclass.Checklist: