10 changes: 5 additions & 5 deletions modules/master-node-sizing.adoc
Original file line number Diff line number Diff line change
@@ -36,10 +36,10 @@ The following control plane node size recommendations are based on the results o

| 252
| 4000
-| 16, but 24 if using the OVN-Kubernetes network plug-in
-| 64, but 128 if using the OVN-Kubernetes network plug-in
+| 16, but 24 if using the OVN-Kubernetes network plugin
+| 64, but 128 if using the OVN-Kubernetes network plugin

-| 501, but untested with the OVN-Kubernetes network plug-in
+| 501, but untested with the OVN-Kubernetes network plugin
| 4000
| 16
| 96
@@ -48,7 +48,7 @@ The following control plane node size recommendations are based on the results o

The data from the table above is based on {product-title} running on top of AWS, using r5.4xlarge instances as control plane nodes and m5.2xlarge instances as compute nodes.

-On a large and dense cluster with three control plane nodes, the CPU and memory usage will spike up when one of the nodes is stopped, rebooted, or fails. The failures can be due to unexpected issues with power, network, underlying infrastructure, or intentional cases where the cluster is restarted after shutting it down to save costs. The remaining two control plane nodes must handle the load in order to be highly available, which leads to increase in the resource usage. This is also expected during upgrades because the control plane nodes are cordoned, drained, and rebooted serially to apply the operating system updates, as well as the control plane Operators update. To avoid cascading failures, keep the overall CPU and memory resource usage on the control plane nodes to at most 60% of all available capacity to handle the resource usage spikes. Increase the CPU and memory on the control plane nodes accordingly to avoid potential downtime due to lack of resources.
+On a large and dense cluster with three control plane nodes, CPU and memory usage spikes when one of the nodes is stopped, rebooted, or fails. Failures can be due to unexpected issues with power, network, or the underlying infrastructure, or to intentional cases where the cluster is restarted after shutting it down to save costs. The remaining two control plane nodes must handle the load to remain highly available, which leads to an increase in resource usage. This is also expected during upgrades, because the control plane nodes are cordoned, drained, and rebooted serially to apply operating system updates and control plane Operator updates. To avoid cascading failures, keep the overall CPU and memory resource usage on the control plane nodes to at most 60% of all available capacity so that they can absorb resource usage spikes. Increase the CPU and memory on the control plane nodes accordingly to avoid potential downtime due to a lack of resources.
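The 60% guidance above can be sketched as a simple headroom check. This is an illustrative snippet only, not part of the product: the function name and the capacity and usage figures are hypothetical examples.

```python
# Illustrative sketch: checking the "at most 60% utilization" guidance for a
# three-node control plane. All figures below are hypothetical examples.

THRESHOLD = 0.60  # keep control plane usage at or below 60% of capacity

def within_headroom(used_cores: float, capacity_cores: float) -> bool:
    """Return True if usage leaves enough headroom to absorb the spike
    caused by losing one control plane node to a reboot or failure."""
    return used_cores / capacity_cores <= THRESHOLD

# Example: three hypothetical 16-core control plane nodes, 48 cores total.
capacity = 3 * 16
print(within_headroom(used_cores=24, capacity_cores=capacity))  # True: 50% used
print(within_headroom(used_cores=32, capacity_cores=capacity))  # False: ~67% used
```

The same check applies to memory: if steady-state usage already exceeds 60% of total capacity, add CPU and memory before the spike, not after.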

[IMPORTANT]
====
@@ -125,5 +125,5 @@ For all other configurations, you must estimate your total node count and use th

[NOTE]
====
-In {product-title} {product-version}, half of a CPU core (500 millicore) is now reserved by the system by default compared to {product-title} 3.11 and previous versions. The sizes are determined taking that into consideration.
+In {product-title} {product-version}, the system reserves half of a CPU core (500m) by default, compared with {product-title} 3.11 and earlier versions. The sizing recommendations account for this reservation.
====
2 changes: 1 addition & 1 deletion modules/recommended-scale-practices.adoc
@@ -28,5 +28,5 @@ Enable machine health checks when scaling to large node counts. In case of failu

[NOTE]
====
-When scaling large and dense clusters to lower node counts, it might take large amounts of time because the process involves draining or evicting the objects running on the nodes being terminated in parallel. Also, the client might start to throttle the requests if there are too many objects to evict. The default client queries per second (QPS) and burst rates are currently set to `50` and `100` respectively. These values cannot be modified in {product-title}.
+When scaling large and dense clusters down to lower node counts, the process might take a long time because it involves draining or evicting, in parallel, the objects running on the nodes being terminated. Also, the client might start to throttle requests if there are too many objects to evict. The default client queries per second (QPS) rate is set to `50` and the default burst rate is set to `100`. You cannot modify these values in {product-title}.
====
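The QPS and burst semantics described in the note can be illustrated with a token bucket, which is the usual model for this kind of client-side rate limiting. This is a standalone sketch, not OpenShift code; the class and method names are ours.

```python
# Illustrative sketch (not product code): client-side rate limiting modeled as
# a token bucket, using the same default values the note describes
# (QPS 50, burst 100). Time is simulated rather than measured.

class TokenBucket:
    def __init__(self, qps: float, burst: int):
        self.qps = qps            # steady-state refill rate (tokens per second)
        self.burst = burst        # maximum bucket size
        self.tokens = float(burst)

    def advance(self, seconds: float) -> None:
        """Refill the bucket as simulated time passes, capped at the burst size."""
        self.tokens = min(self.burst, self.tokens + seconds * self.qps)

    def try_request(self) -> bool:
        """Consume one token if available; otherwise the request is throttled."""
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(qps=50, burst=100)

# An eviction storm: 150 requests arrive at once. Only the burst of 100 succeeds.
print(sum(bucket.try_request() for _ in range(150)))  # 100

# One simulated second later, the refill rate admits roughly 50 more.
bucket.advance(1.0)
print(sum(bucket.try_request() for _ in range(150)))  # 50
```

This is why a scale-down with many objects to evict settles into a steady trickle of about 50 requests per second after the initial burst is spent.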