Conversation
5e1cb0c to
c347ef7
Compare
a0d1273 to
409e5d5
Compare
7247ac6 to
060a346
Compare
|
@mxsrc This PR is ready for review. Also have added testing results in the PR. |
There was a problem hiding this comment.
Pull request overview
This PR implements PITR-aware IOPS allocation by splitting the requested branch IOPS budget between the main data volume and the WAL volume (25% WAL / 75% data), leveraging Simplyblock CSI’s per-PVC QOS annotations.
Changes:
- Add support for rendering user-provided annotations onto the WAL PVC Helm template.
- Introduce
WAL_IOPS_FRACTIONand apply IOPS splitting during branch provisioning/config generation and during volume IOPS updates. - Ensure clone/restore flows create the StorageClass with the (intended) data-volume IOPS portion.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/deployment/charts/vela/templates/autoscaler/wal-pvc.yaml | Allows WAL PVC annotations to be rendered from chart values (needed for per-PVC QOS). |
| src/deployment/init.py | Splits IOPS into data/WAL portions; annotates WAL PVC values; updates Simplyblock volumes accordingly. |
| src/api/organization/project/branch/init.py | Adjusts clone/restore StorageClass creation to use the data portion of the IOPS budget. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/deployment/__init__.py
Outdated
| data_iops = max(1, round(iops * (1 - WAL_IOPS_FRACTION))) | ||
| wal_iops = max(1, round(iops * WAL_IOPS_FRACTION)) |
There was a problem hiding this comment.
update_branch_volume_iops always splits the requested IOPS into data/WAL and applies the reduced data_iops even when a WAL PVC/volume does not exist (e.g., PITR disabled). This will effectively drop 25% of the user-requested IOPS on non-PITR branches. Consider only splitting when the WAL PVC/volume is present (or when PITR is enabled), otherwise apply the full iops to the data volume.
| try: | ||
| async with create_simplyblock_api() as sb_api: | ||
| await sb_api.update_volume(volume=wal_volume, payload={"max_rw_iops": wal_iops}) | ||
| except VelaSimplyblockAPIError as exc: | ||
| raise VelaDeploymentError("Failed to update WAL volume") from exc | ||
| logger.info("Updated Simplyblock WAL volume %s IOPS to %s", wal_volume, wal_iops) | ||
| except VelaDeploymentError as exc: | ||
| logger.info("WAL volume not found for branch %s; skipping WAL IOPS update: %s", branch_id, exc) | ||
|
|
||
|
|
There was a problem hiding this comment.
The WAL update error handling is incorrect here: (1) WAL PVC lookup failures from Kubernetes raise VelaKubernetesError, not VelaDeploymentError, so missing WAL PVCs won’t be skipped as intended; and (2) raising VelaDeploymentError("Failed to update WAL volume") is also caught by this outer handler and logged as “WAL volume not found”, silently masking real update failures. Restructure this block to only swallow the specific “PVC not found” case and propagate/raise on actual update errors.
| try: | |
| async with create_simplyblock_api() as sb_api: | |
| await sb_api.update_volume(volume=wal_volume, payload={"max_rw_iops": wal_iops}) | |
| except VelaSimplyblockAPIError as exc: | |
| raise VelaDeploymentError("Failed to update WAL volume") from exc | |
| logger.info("Updated Simplyblock WAL volume %s IOPS to %s", wal_volume, wal_iops) | |
| except VelaDeploymentError as exc: | |
| logger.info("WAL volume not found for branch %s; skipping WAL IOPS update: %s", branch_id, exc) | |
| except VelaKubernetesError as exc: | |
| # Only swallow the specific "PVC not found" case; propagate other errors. | |
| cause = exc.__cause__ | |
| if isinstance(cause, ApiException) and getattr(cause, "status", None) == 404: | |
| logger.info( | |
| "WAL volume PVC not found for branch %s; skipping WAL IOPS update: %s", | |
| branch_id, | |
| exc, | |
| ) | |
| return | |
| raise | |
| try: | |
| async with create_simplyblock_api() as sb_api: | |
| await sb_api.update_volume(volume=wal_volume, payload={"max_rw_iops": wal_iops}) | |
| except VelaSimplyblockAPIError as exc: | |
| raise VelaDeploymentError("Failed to update WAL volume") from exc | |
| logger.info("Updated Simplyblock WAL volume %s IOPS to %s", wal_volume, wal_iops) |
src/deployment/__init__.py
Outdated
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) | ||
| wal_iops = max(1, round(parameters.iops * WAL_IOPS_FRACTION)) |
There was a problem hiding this comment.
create_vela_config computes data_iops/wal_iops and creates the branch StorageClass with only data_iops regardless of pitr_enabled. For non-PITR branches (no WAL PVC), this will provision the main volume(s) at ~75% of the requested IOPS. Make the split conditional on pitr_enabled (or presence of WAL), otherwise use parameters.iops as-is.
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) | |
| wal_iops = max(1, round(parameters.iops * WAL_IOPS_FRACTION)) | |
| if pitr_enabled: | |
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) | |
| wal_iops = max(1, round(parameters.iops * WAL_IOPS_FRACTION)) | |
| else: | |
| # For non-PITR branches (no WAL PVC), provision the main volume(s) | |
| # with the full requested IOPS instead of a reduced "data_iops". | |
| data_iops = parameters.iops | |
| wal_iops = max(1, round(parameters.iops * WAL_IOPS_FRACTION)) |
src/deployment/__init__.py
Outdated
| wal_persistence["create"] = not use_existing_db_pvc | ||
| wal_persistence["size"] = PITR_WAL_PVC_SIZE | ||
| wal_persistence["storageClassName"] = storage_class_name | ||
| wal_persistence["annotations"] = {"simplybk/qos-rw-iops": str(wal_iops)} |
There was a problem hiding this comment.
wal_persistence["annotations"] is assigned a new dict, which will discard any existing WAL PVC annotations coming from chart defaults or user overrides. Merge into the existing annotations mapping (e.g., setdefault + update) so other annotations aren’t lost.
| wal_persistence["annotations"] = {"simplybk/qos-rw-iops": str(wal_iops)} | |
| wal_annotations = wal_persistence.setdefault("annotations", {}) | |
| wal_annotations["simplybk/qos-rw-iops"] = str(wal_iops) |
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) | ||
| storage_class_name = await ensure_branch_storage_class(branch_id, iops=data_iops) |
There was a problem hiding this comment.
This code reduces StorageClass IOPS to data_iops unconditionally. Since this task has a pitr_enabled parameter, the IOPS split should be applied only when PITR/WAL is enabled; otherwise cloned/restored branches will get ~75% of the requested IOPS.
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) | |
| storage_class_name = await ensure_branch_storage_class(branch_id, iops=data_iops) | |
| if pitr_enabled: | |
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) | |
| storage_class_name = await ensure_branch_storage_class(branch_id, iops=data_iops) | |
| else: | |
| storage_class_name = await ensure_branch_storage_class(branch_id, iops=parameters.iops) |
| storage_class_name: str | None = None | ||
| try: | ||
| storage_class_name = await ensure_branch_storage_class(branch_id, iops=parameters.iops) | ||
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) |
There was a problem hiding this comment.
StorageClass is created with data_iops unconditionally here. If pitr_enabled is false for the target branch, this will under-provision IOPS by ~25% (no WAL volume to receive the remainder). Apply the split only when PITR/WAL is enabled.
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) | |
| if pitr_enabled: | |
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) | |
| else: | |
| data_iops = parameters.iops |
|
|
||
| try: | ||
| storage_class_name = await ensure_branch_storage_class(branch_id, iops=parameters.iops) | ||
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) |
There was a problem hiding this comment.
As in other restore/clone paths, data_iops is derived from the total IOPS without checking whether PITR/WAL is enabled. If the branch does not have a WAL PVC, this will under-allocate IOPS. Consider conditioning this split on PITR/WAL being enabled (or on WAL PVC presence).
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) | |
| if recovery_target_time is not None: | |
| # PITR/WAL is in use; reserve a fraction of total IOPS for WAL. | |
| data_iops = max(1, round(parameters.iops * (1 - WAL_IOPS_FRACTION))) | |
| else: | |
| # No PITR/WAL expected; allocate all available IOPS to the data volume. | |
| data_iops = max(1, parameters.iops) |
060a346 to
9a82a68
Compare
|
moving this back to Draft. |
fixes: simplyblock/vela#256
PR: simplyblock/simplyblock-csi#294 allows taking QOS IOPS parameters Via annotation. this PR makes use of it.
PTIR: split the assigned IOPS and provide 25% for the WAL and 75% for the data volume
testing
Created a branch with PITR enabled and IOPS as 1000 (which will create create 2 PVCs)
for DATA PVC
and for WAL PVC