Conversation
|
attached the issue #218 |
There was a problem hiding this comment.
Pull request overview
Adds initial backend support for generating wind rose data from hourly timeseries, while refactoring height validation/config to distinguish windspeed vs winddirection availability per model.
Changes:
- Added
/{model}/windroseAPI endpoint with binned and raw output modes. - Implemented wind rose computation core using hourly timeseries (sectoring + calm handling + optional speed binning).
- Updated model configuration and height validation to support separate windspeed and winddirection height lists.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
windwatts-api/app/utils/wind_data_core.py |
Adds wind rose computation core and updates windspeed/production height validation calls. |
windwatts-api/app/utils/validation.py |
Changes validate_height to require a height_type and read type-specific heights from config. |
windwatts-api/app/schemas.py |
Introduces new response schemas for binned and raw wind rose outputs. |
windwatts-api/app/controllers/wind_data_controller.py |
Adds a new GET endpoint for wind rose with a union response model. |
windwatts-api/app/config/model_config.py |
Splits configured heights into windspeed_heights and winddirection_heights per model. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "schema": "quantile_yearly", | ||
| "years": {"full": list(range(2013, 2024)), "sample": [2020, 2021, 2022, 2023]}, | ||
| "heights": [30, 40, 50, 60, 80, 100], | ||
| "windspeed_heights": [30, 40, 50, 60, 80, 100], |
There was a problem hiding this comment.
MODEL_CONFIG entries were changed to replace the heights key with windspeed_heights / winddirection_heights, but other code still reads config.get("heights") (e.g., info endpoint) and MODEL_CONFIG[model].get("heights") (e.g., timeseries-energy core). This will cause empty available_heights and can break energy/timeseries code paths. Consider either keeping a backward-compatible heights alias (likely to windspeed_heights) or updating all consumers to the new keys in the same PR.
| "windspeed_heights": [30, 40, 50, 60, 80, 100], | |
| "windspeed_heights": [30, 40, 50, 60, 80, 100], | |
| "heights": [30, 40, 50, 60, 80, 100], |
There was a problem hiding this comment.
- let's group the heights like other fields, like years, grid_info, so it would be, something like, "heights": {"windspeed": [], "wind_direction": []}
- The copilot suggestion is correct, you would need to check and update the instances with MODEL_CONFIG[model].get("heights") e.g. the /info.
| valid_heights = MODEL_CONFIG[model].get(f"{height_type}_heights", []) | ||
| if valid_heights and height not in valid_heights: | ||
| raise HTTPException( | ||
| status_code=400, |
There was a problem hiding this comment.
validate_height currently treats an empty list of valid heights the same as “no validation” because the guard is if valid_heights and .... With the new config, some models set winddirection_heights: [] to indicate wind direction is unsupported; this function will still accept any height and only fail later (or not at all). Consider distinguishing between None/missing (skip validation) vs an explicit empty list (reject all heights with a clear 400).
| valid_heights = MODEL_CONFIG[model].get(f"{height_type}_heights", []) | |
| if valid_heights and height not in valid_heights: | |
| raise HTTPException( | |
| status_code=400, | |
| model_config = MODEL_CONFIG[model] | |
| heights_key = f"{height_type}_heights" | |
| # Distinguish between: | |
| # - missing key (no validation, accept any height) | |
| # - explicit empty list (height type unsupported for this model) | |
| # - non-empty list (validate membership) | |
| if heights_key in model_config: | |
| valid_heights = model_config[heights_key] | |
| else: | |
| valid_heights = None | |
| if valid_heights is None: | |
| # No configured constraint for this height type; skip validation. | |
| return height | |
| if not valid_heights: | |
| # Explicitly configured as unsupported for this model. | |
| raise HTTPException( | |
| status_code=400, | |
| detail=f"Height type '{height_type}' is not supported for model '{model}'.", | |
| ) | |
| if height not in valid_heights: | |
| raise HTTPException( | |
| status_code=400, |
There was a problem hiding this comment.
The copilot suggestion is valid, it is better to differentiate "unsupported" or "invalid".
| return csv_io.getvalue() | ||
|
|
||
| def _make_sector_angles(n: int): | ||
| "Return list of centre bearings (degress) and sector width for n equal sectors." |
There was a problem hiding this comment.
Typo in the docstring: "degress" should be "degrees".
| "Return list of centre bearings (degress) and sector width for n equal sectors." | |
| "Return list of centre bearings (degrees) and sector width for n equal sectors." |
| if not (0 <= calm_threshold < 3): | ||
| raise HTTPException(status_code=400, detail="calm_threshold must be >= 0 and < 3") |
There was a problem hiding this comment.
In the binned windrose path, speed bins start at 0.5 m/s, but calm_threshold is allowed down to 0.0. If a client sets calm_threshold < 0.5, speeds in [calm_threshold, 0.5) will be treated as “active” (not calm) but won’t fall into any speed bin, so per-sector bin fractions won’t add up to frequency. Consider either enforcing calm_threshold >= 0.5 when binned=True, or dynamically defining the first bin edge based on calm_threshold (and updating labels accordingly).
| if not (0 <= calm_threshold < 3): | |
| raise HTTPException(status_code=400, detail="calm_threshold must be >= 0 and < 3") | |
| # For binned windroses, speed bins start at 0.5 m/s. To avoid having non-calm | |
| # speeds that do not belong to any bin, enforce a minimum calm_threshold of 0.5. | |
| if binned: | |
| if not (0.5 <= calm_threshold < 3): | |
| raise HTTPException( | |
| status_code=400, | |
| detail="For binned windrose, calm_threshold must be >= 0.5 and < 3", | |
| ) | |
| else: | |
| if not (0 <= calm_threshold < 3): | |
| raise HTTPException( | |
| status_code=400, | |
| detail="For unbinned windrose, calm_threshold must be >= 0 and < 3", | |
| ) |
There was a problem hiding this comment.
related to the default value of calm_threshold comment: is no-calm okay, i.e. threshold = 0? if it is okay, we can select one of the two approaches i suggested in the comment:
- default to 0 and user can specify
- make it optional, we prepare default values for windspeed/ energy/ power, etc. or a percentage, for example, 10% of the bin size - if bin size is 1, threshold is 0.1
| frequency: float = Field(..., description="Fraction of all hours in this sector (excl. calm)") | ||
| calm: float = Field(..., description="calm_fraction repeated per bin for stacked chart") |
There was a problem hiding this comment.
WindRoseBin.frequency is documented as excluding calm hours, but the current implementation computes frequency as a fraction of total_hours (which includes calm). Either adjust the implementation to divide by non-calm hours, or update this description (and potentially the other bin field descriptions) to match the actual denominator.
| frequency: float = Field(..., description="Fraction of all hours in this sector (excl. calm)") | |
| calm: float = Field(..., description="calm_fraction repeated per bin for stacked chart") | |
| frequency: float = Field(..., description="Fraction of total_hours in this sector (sector_hours / total_hours)") | |
| calm: float = Field(..., description="Calm fraction (calm_hours / total_hours) repeated per bin for stacked chart") |
| if not binned: | ||
| # Raw - return the actual windspeed values per sector instead of frequency bins | ||
| sectors_data = [] | ||
| for i, deg in enumerate(sector_centers): | ||
| sector_ws = active_ws[sector_idx == i].round(3).tolist() | ||
| sectors_data.append({ | ||
| "direction_deg": deg, | ||
| "windspeeds": sector_ws | ||
| }) | ||
| return { | ||
| **common, | ||
| "sectors_data": sectors_data, | ||
| "calm_windspeeds": ws[calm_mask].round(3).tolist(), | ||
| "binned": False | ||
| } |
There was a problem hiding this comment.
When binned=False, the endpoint returns all per-hour windspeed values (plus all calm windspeeds) as JSON arrays. For year_set=full or wide year_range, this can become a very large response (tens of thousands+ floats) and may cause high latency/memory usage or exceed gateway limits. Consider adding a max-hours limit/guard for raw responses, requiring year_set=sample for binned=False, or providing an alternative aggregated/raw-download format (e.g., CSV/stream).
| "schema": "quantile_yearly", | ||
| "years": {"full": list(range(2013, 2024)), "sample": [2020, 2021, 2022, 2023]}, | ||
| "heights": [30, 40, 50, 60, 80, 100], | ||
| "windspeed_heights": [30, 40, 50, 60, 80, 100], |
There was a problem hiding this comment.
- let's group the heights like other fields, like years, grid_info, so it would be, something like, "heights": {"windspeed": [], "wind_direction": []}
- The copilot suggestion is correct, you would need to check and update the instances with MODEL_CONFIG[model].get("heights") e.g. the /info.
| valid_heights = MODEL_CONFIG[model].get(f"{height_type}_heights", []) | ||
| if valid_heights and height not in valid_heights: | ||
| raise HTTPException( | ||
| status_code=400, |
There was a problem hiding this comment.
The copilot suggestion is valid, it is better to differentiate "unsupported" or "invalid".
| model: str = Path(..., description="Data model: era5-timeseries"), | ||
| gridIndex: str = Query(..., description="Grid index identifier"), | ||
| height: int = Query(..., description="Height in meters"), | ||
| binned: bool = Query(True, description="True: pre-binned rose; False: raw per-sector speed arrays"), |
There was a problem hiding this comment.
please use int bin instead of bool binned as discussed #218 :
bin = use a default bin, special case: bin=0 means full resolution or no averaging/ binning
We could use a default bin number, like 5 or something, or use user specified value. If user provides bin = 0, that means the user wants to have the full resolution.
| height: int = Query(..., description="Height in meters"), | ||
| binned: bool = Query(True, description="True: pre-binned rose; False: raw per-sector speed arrays"), | ||
| sectors: int = Query(16, description="Directional sectors: 4, 8 or 16"), | ||
| calm_threshold: float = Query(0.5, description="Speed (m/s) below which a row is calm"), |
There was a problem hiding this comment.
later we are going to allow rose for energy, would it be better if we default to 0 - no cutoff, and let user specify a value? or should we make it optional and we provide default values in the logic?
| return { | ||
| **common, | ||
| "bins": bins, | ||
| "binned": True |
| for j, slabel in enumerate(SPEED_LABELS): | ||
| lo = SPEED_BINS[j] | ||
| hi = SPEED_BINS[j + 1] if j + 1 < len(SPEED_BINS) else np.inf | ||
| count = int(((sector_ws >= lo) & (sector_ws < hi)).sum()) |
There was a problem hiding this comment.
instead of counting, let's use sort to improve the performance:
sort --> find the bin end index of the center in sector array using bisect
|
|
||
| class WindRoseSectorRaw(BaseModel): | ||
| direction_deg: float = Field(..., description="Centre bearing in degrees (0 = N, clockwise)") | ||
| windspeeds: List[float] = Field(..., description="Raw wind speed values (m/s) whose direction fell in this sector") |
There was a problem hiding this comment.
this could be simplified to a dict [dir: array] so we can apply to wind speed, energy, power, etc.
| direction_deg: float = Field(..., description="Centre bearing in degrees (0 = N, clockwise)") | ||
| windspeeds: List[float] = Field(..., description="Raw wind speed values (m/s) whose direction fell in this sector") | ||
|
|
||
| class WindRoseRawResponse(BaseModel): |
There was a problem hiding this comment.
check my comments, the response can be consolidated to 1 for windrose (binned or unbinned), only difference is unbinned has empty bins [] list.
| direction_deg: float = Field(..., description="Center bearing in degrees (0 = N, clockwise)") | ||
| frequency: float = Field(..., description="Fraction of all hours in this sector (excl. calm)") | ||
| calm: float = Field(..., description="calm_fraction repeated per bin for stacked chart") | ||
| speed_0_3: float = Field(..., alias="0-3") |
There was a problem hiding this comment.
we will allow variable bins, and automatically calculate the labels and values, so it could support different ranges for not only speed, but also energy, power, etc.
RLiNREL
left a comment
There was a problem hiding this comment.
Please fix the merge conflict also.
| ..., description="Geometry of each sector" | ||
| ) | ||
| bin_info: List[RoseBinInfo] = Field( | ||
| ..., description="Value range of each bin. Empty in raw mode (no_of_bins=1)" |
There was a problem hiding this comment.
It is not empty when bins= 1, it will be just List of one RoseBinInfo.
| height: int = Query(..., description="Height in meters"), | ||
| bin: int = Query( | ||
| 5, | ||
| description="Speed band count per sector. 1 = raw mode: returns the individual wind speed values for each sector. 2–10 = binned mode: divides the wind speed range (0 to site max) into this many equal-width bands and returns the fraction of hours in each band per sector. Default: 5.", |
There was a problem hiding this comment.
There is no difference between bin=1 and bin=2 or others, bin=1 is a list of one bin, bin=2 is a list of two bins, ... bin=1 is also a binned mode, with just one bin. We should say return sorted wind values divided to the number of bins.
|
|
||
| def validate_calm_threshold(calm_threshold: float) -> float: | ||
| "Validate calm threshold for WindRose" | ||
| if not (0 <= calm_threshold < 3): |
There was a problem hiding this comment.
OPTIONAL: 3 is fine for speed, but would be quite low if we do windrose for energy/ power. Please add a comment: TODO: adjust the upper threshold based on windrose type.
There was a problem hiding this comment.
same will apply for validate_bin too
There was a problem hiding this comment.
bin number range may be fine for all data types, but okay, we will revisit later
|
|
||
| def validate_bin(bin: int) -> int: | ||
| """Validate bin parameter for WindRose""" | ||
| if bin < 0: |
There was a problem hiding this comment.
should be bin <= 0. Valid bin in [1, 10]. Again bin=1 is not raw mode, it is just one bin.
| "terser": "^5.43.1", | ||
| "typescript": "~5.8.3", | ||
| "typescript-eslint": "^8.35.1", | ||
| "vite": "6.4.1", |
There was a problem hiding this comment.
please don't override this, the vite and vitest version were downgraded so no conflict with yarn in the docker build process. We will migrate to npm
| sector_index: int = Field(..., description="Sector this cell belongs to") | ||
| bin_index: int = Field(..., description="Bin this cell belongs to") | ||
| frequency: float = Field( | ||
| ..., description="Fraction of total hours in this (sector, bin) cell" |
There was a problem hiding this comment.
try to be more general, instead of hours, let's say data points. Also check other descriptions
| frequency: float = Field( | ||
| ..., description="Fraction of total hours in this (sector, bin) cell" | ||
| ) | ||
| data: List[float] = Field(..., description="Raw values in this (sector, bin) cell") |
There was a problem hiding this comment.
Let's say "Sorted values in this .."
| no_of_sectors: int = Field(..., description="Number of compass sectors") | ||
| no_of_bins: int = Field(..., description="Number of bins") | ||
| calm_info: RoseCalmInfo = Field(..., description="Calm threshold and fraction") | ||
| calm_data: List[float] = Field(..., description="Raw values below calm_threshold") |
| calm_info: RoseCalmInfo = Field(..., description="Calm threshold and fraction") | ||
| calm_data: List[float] = Field(..., description="Raw values below calm_threshold") | ||
| sector_info: List[RoseSectorInfo] = Field( | ||
| ..., description="Geometry of each sector" |
There was a problem hiding this comment.
Suggest: "Statistics of each sector"
| ) | ||
| bin_info: List[RoseBinInfo] = Field(..., description="Value range of each bin.") | ||
| bin_data: List[RoseBinData] = Field( | ||
| ..., description="Frequency and raw values per (sector, bin) cell" |
|
|
||
| def validate_calm_threshold(calm_threshold: float) -> float: | ||
| "Validate calm threshold for WindRose" | ||
| if not (0 <= calm_threshold < 3): |
There was a problem hiding this comment.
bin number range may be fine for all data types, but okay, we will revisit later
| height: int = Query(..., description="Height in meters"), | ||
| bin: int = Query( | ||
| 5, | ||
| description="Number of equal-width speed bins to divide the wind speed range (0 to site max) into per sector. Sorted wind speed values and their frequency are returned for each bin. Default: 5.", |
There was a problem hiding this comment.
Either remove the "speed" making the description more general, or add a TODO comment that we are going to accept more data type, power ...
| float("inf") | ||
| ] | ||
| boundaries = [ | ||
| bisect.bisect_left(sector_ws, e) for e in bin_edges[:-1] + [float("inf")] |
There was a problem hiding this comment.
The handling of e in bin_edges[:-1] + [float("inf")] looks strange to me. What you had before was fine, I was just suggesting a possible way. If you like to keep this idea, you might want do this, which makes more sense to me, but make sure the "inf" in sector_ws doesn't get into the results accidentally:
bisect.bisect_right(sector_ws, e) for e in bin_edges[1:]
- skip 0 in bin edges, start looking for the index of right edge for each bin
- sector_ws has "inf" at the end, guarantee a index of right edge.
| bisect.bisect_left(sector_ws, e) for e in bin_edges[:-1] + [float("inf")] | ||
| ] | ||
| for j in range(bin): | ||
| data = sector_ws[boundaries[j] : boundaries[j + 1]] |
There was a problem hiding this comment.
would this accidentally include the "inf" in sector_ws? Make sure it is not included.
address #218
added windrose endpoint and windrose core for calculations for bin and unbinned cases.
added new schemas for windrose
changed validate_height to support windspeed and winddirection
changed model config to have windspeed heights and winddirection heights for data models seperately