@@ -10,9 +10,7 @@
 CPU partitioning separates sensitive workloads from general-purpose tasks, interrupts, and driver work queues to improve performance and reduce latency.
 
 New in this release::
-* Optional support for the `acpi_idle` CPUIdle driver.
-* The `systemReserved` field replaces the `autoSizingReserved` field to specify 11Gi memory for worker nodes and 30Gi for control plane nodes.
-* Enable triggering a kernel panic through a non-maskable interrupt for system recovery and diagnostic purposes when `x86_64` architecture nodes become unresponsive.
+* No reference design updates in this release
 
 Description::
 CPU partitioning improves performance and reduces latency by separating sensitive workloads from general-purpose tasks, interrupts, and driver work queues.
1 change: 0 additions & 1 deletion modules/telco-core-crs-cluster-infrastructure.adoc
@@ -25,5 +25,4 @@ Disconnected configuration,`idms.yaml`,Defines a list of mirrored repository dig
 Disconnected configuration,`operator-hub.yaml`,Defines an OperatorHub configuration which disables all default sources.,No
 Monitoring and observability,`monitoring-config-cm.yaml`,Configuring storage and retention for Prometheus and Alertmanager.,Yes
 Power management,`PerformanceProfile.yaml`,"Defines a performance profile resource, specifying CPU isolation, hugepages configuration, and workload hints for performance optimization on selected nodes.",No
-Power management,`TunedPerformancePatch.yaml`,"Applies performance tuning overrides for worker profiles and enables kernel panic on non-maskable interrupts (NMI) for system recovery on unresponsive nodes.",No
 |====
2 changes: 1 addition & 1 deletion modules/telco-core-monitoring.adoc
@@ -10,7 +10,7 @@
 The Cluster Monitoring Operator (CMO) is included by default in {product-title} and provides monitoring for the platform components and optionally user projects.
 
 New in this release::
-* Optionally use the `remoteWrite` field in Prometheus configurations for direct export of metrics
+* No reference design updates in this release.
 
 Description::
 The Cluster Monitoring Operator (CMO) is included by default in {product-title} and provides monitoring (metrics, dashboards, and alerting) for the platform components and optionally user projects.
9 changes: 7 additions & 2 deletions modules/telco-core-networking.adoc
@@ -14,7 +14,12 @@ image::openshift-telco-core-rds-networking.png[Overview of the telco core refere
 
 
 New in this release::
-* No reference design updates in this release
+* Per-interface IP forwarding
+* Support for alt-names on network interfaces
+* Host firewall rules for multi-node clusters
+* MetalLB `ConfigurationStatus` CRD
+* Extended BGP/L2 advertisements with service selectors
+* Support for Intel E830 Jasper Beach in telco core
 
 [NOTE]
 ====
@@ -41,7 +46,7 @@ EgressIP is further discussed in the "Cluster Network Operator" section.
 .. Configure VLAN interfaces and specific kernel IP routes on the nodes using `NodeNetworkConfigurationPolicy` CRs.
 .. Create a MetalLB `BGPPeer` CR for each VLAN to establish peering with the remote BGP router.
 .. Define a MetalLB `BGPAdvertisement` CR to specify which IP address pools should be advertised to a selected list of `BGPPeer` resources. The following diagram illustrates how specific service IP addresses are advertised externally through specific VLAN interfaces. Services routes are defined in `BGPAdvertisement` CRs and configured with values for `IPAddressPool1` and `BGPPeer1` fields.
-
+* Host-level per-interface IP forwarding can be enabled or disabled by using `net.ipv4.conf.<interface>.forwarding` through the install network configuration (Day 1) or the Kubernetes NMState Operator (Day 2).
 
 .Telco core reference design MetalLB service separation
 image::openshift-telco-core-rds-metallb-service-separation.png[Telco core reference design MetalLB service separation]
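
Reviewer note: the per-interface forwarding bullet added in the networking module names the `net.ipv4.conf.<interface>.forwarding` sysctl. A minimal Python sketch of how that setting could be inspected on a node is shown here. It assumes the standard Linux procfs layout, where sysctl keys map to files under `/proc/sys`; the helper names and the `ens1f0` interface are hypothetical and are not part of the reference configuration.

```python
from pathlib import Path

# Assumption: standard Linux procfs mount point for sysctl values.
PROC_SYS = Path("/proc/sys")


def forwarding_path(interface: str, root: Path = PROC_SYS) -> Path:
    """Return the procfs file backing net.ipv4.conf.<interface>.forwarding.

    Building the path directly avoids ambiguity for interface names that
    contain dots, such as VLAN interfaces named like eth0.100.
    """
    return root / "net" / "ipv4" / "conf" / interface / "forwarding"


def is_forwarding_enabled(interface: str, root: Path = PROC_SYS) -> bool:
    """Report whether IPv4 forwarding is enabled on one interface."""
    return forwarding_path(interface, root).read_text().strip() == "1"


if __name__ == "__main__":
    # "ens1f0" is a placeholder interface name for illustration only.
    target = forwarding_path("ens1f0")
    if target.exists():
        print(f"{target}: enabled={is_forwarding_enabled('ens1f0')}")
    else:
        print(f"{target} not found on this host")
```

The sketch only reads the value. Changing it on cluster nodes would still go through the install network configuration or the Kubernetes NMState Operator, as the added bullet states.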
2 changes: 1 addition & 1 deletion modules/telco-core-node-configuration.adoc
@@ -10,7 +10,7 @@
 Node configuration for telco core clusters includes additional kernel modules, container mount namespace settings, and kdump configuration.
 
 New in this release::
-* Consult the 4.21 release notes regarding the decrease in the default maximum open files soft limit for containers in this release.
+* No reference design updates in this release.
 
 Limits and requirements::
 * Analyze additional kernel modules to determine impact on CPU load, system performance, and ability to meet KPIs.
8 changes: 4 additions & 4 deletions modules/telco-core-red-hat-advanced-cluster-management.adoc
@@ -22,16 +22,16 @@ You apply policies with the {rh-rhacm} policy controller as managed by {cgu-oper
 Configuration, upgrades, and cluster status are managed through the policy controller.
 
 When installing managed clusters, {rh-rhacm} applies labels and initial ignition configuration to individual nodes in support of custom disk partitioning, allocation of roles, and allocation to machine config pools.
-You define these configurations with `ClusterInstance` CRs.
+You define these configurations with `SiteConfig` or `ClusterInstance` CRs.
 --
 
 Limits and requirements::
 
-* Hub cluster sizing is discussed in link:https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.13/html-single/install/index#sizing-your-cluster[Sizing your cluster].
+* Hub cluster sizing is discussed in link:https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.15/html-single/install/index#sizing-your-cluster[Sizing your cluster].
 
-* {rh-rhacm} scaling limits are described in link:https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.13/html-single/install/index#performance-and-scalability[Performance and Scalability].
+* {rh-rhacm} scaling limits are described in link:https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.15/html-single/install/index#performance-and-scalability[Performance and Scalability].
 
 Engineering considerations::
 * When managing multiple clusters with unique content per installation, site, or deployment, using {rh-rhacm} hub templating is strongly recommended.
-{rh-rhacm} hub templating allows you to apply a consistent set of policies to clusters while providing for unique values per installation.
+With {rh-rhacm} hub templating, you can apply a consistent set of policies to clusters while providing unique values per installation.
 
20 changes: 12 additions & 8 deletions modules/telco-core-security.adoc
@@ -10,7 +10,8 @@
 Telco core clusters require hardening against multiple attack vectors through various security features and configurations.
 
 New in this release::
-* No reference design updates in this release.
+* Starting with {product-title} {product-version}, administrators can use the `oc commatrix` plugin to automatically generate `MachineConfig` CRs with a default set of nftables firewall rules.
+The generated `MachineConfig` CRs can then be added to the cluster configuration to improve security.
 
 Description::
 +
@@ -27,28 +28,31 @@ Rootless DPDK pods create a tap device in a rootless pod that injects traffic fr
 * Storage: The storage network should be isolated and non-routable to other cluster networks.
 See the "Storage" section for additional details.
 
-See the Red Hat Knowledgebase solution article link:https://access.redhat.com/articles/7090422[Custom nftable firewall rules in {product-title}] for a supported method for implementing custom nftables firewall rules in {product-title} cluster nodes. This article is intended for cluster administrators who are responsible for managing network security policies in {product-title} environments.
+Security conscious cluster administrators can configure nftables rules to restrict inbound network flows.
+See link:https://docs.redhat.com/en/documentation/openshift_container_platform/{product-version}/html/installation_configuration/configuring-firewall.html#network-commatrix-plugin-intro_configuring-firewall[the {product-title} documentation] for a supported method for implementing custom nftables firewall rules in {product-title} cluster nodes with the `oc commatrix` plugin.
+You can add the generated `MachineConfig` CRs to the cluster configuration at install time as extra manifests.
+When the cluster is managed under a Hub RDS compliant {rh-rhacm} cluster the content of the `MachineConfig` resource is maintained across the life of the cluster through a `Policy` resource.
 
 It is crucial to carefully consider the operational implications before deploying this method, including:
 
 * Early application: The rules are applied at boot time, before the network is fully operational.
-Ensure the rules don't inadvertently block essential services required during the boot process.
+Ensure the rules do not inadvertently block essential services required during the boot process.
 * Risk of misconfiguration: Errors in your custom rules can lead to unintended consequences, potentially leading to performance impact or blocking legitimate traffic or isolating nodes.
 Thoroughly test your rules in a non-production environment before deploying them to your main cluster.
 * External endpoints: {product-title} requires access to external endpoints to function.
-For more information about the firewall allowlist, see "Configuring your firewall for {product-title}". Ensure that cluster nodes are permitted access to those endpoints. Ensure that cluster nodes are permitted access to those endpoints.
-
+For more information about the firewall allowlist, see "Configuring your firewall for {product-title}".
+Ensure that cluster nodes are permitted access to those endpoints.
 * Node reboot: Unless node disruption policies are configured, applying the `MachineConfig` CR with the required firewall settings causes a node reboot.
-Be aware of this impact and schedule a maintenance window accordingly. For more information, see "Using node disruption policies to minimize disruption from machine config changes".
+Be aware of this impact and schedule a maintenance window accordingly.
 +
 [NOTE]
 ====
 Node disruption policies are available in {product-title} 4.17 and later.
 ====
 
-* Network flow matrix: For more information about managing ingress traffic, see {product-title} network flow matrix.
+* Network flow matrix: For more information about managing ingress traffic, see "Network flow matrix".
 You can restrict ingress traffic to essential flows to improve network security.
-The matrix provides insights into base cluster services but excludes traffic generated by Day-2 Operators.
+The matrix provides insights into base cluster services but excludes traffic generated by Day 2 Operators.
 
 * Cluster version updates and upgrades: Exercise caution when updating or upgrading {product-title} clusters.
 Recent changes to the platform's firewall requirements might require adjustments to network port permissions.
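
Reviewer note: the security module describes `MachineConfig` CRs that carry nftables firewall rules. The following Python sketch shows the general shape such a CR might take, as an aid for reviewing the text. It is not the output of the `oc commatrix` plugin. The CR name, the rules content, the `/etc/sysconfig/nftables.conf` path, and the choice to enable `nftables.service` are all assumptions made for illustration.

```python
import base64
import json

# Hypothetical placeholder rules; real rules must come from a supported source.
EXAMPLE_RULES = """table inet example_filter {
  chain input {
    type filter hook input priority 0; policy accept;
    tcp dport 22 accept
  }
}
"""


def nftables_machine_config(name: str, role: str, rules: str,
                            path: str = "/etc/sysconfig/nftables.conf") -> dict:
    """Build a MachineConfig-shaped dict that writes `rules` to `path`.

    The file content is embedded as a base64 data URL, which is how
    Ignition file entries carry inline content.
    """
    encoded = base64.b64encode(rules.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                "ignition": {"version": "3.2.0"},
                "storage": {
                    "files": [{
                        "path": path,
                        "mode": 0o600,  # serialized as decimal 384
                        "overwrite": True,
                        "contents": {
                            "source": "data:text/plain;charset=utf-8;base64," + encoded
                        },
                    }]
                },
                # Assumption: the rules are loaded by the nftables systemd unit.
                "systemd": {"units": [{"name": "nftables.service", "enabled": True}]},
            }
        },
    }


if __name__ == "__main__":
    # "98-nftables-worker" is a hypothetical CR name.
    print(json.dumps(nftables_machine_config("98-nftables-worker", "worker", EXAMPLE_RULES), indent=2))
```

Because the rules live inside a `MachineConfig` CR, changing them triggers the node reboot behavior described in the "Node reboot" bullet unless node disruption policies are configured.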
37 changes: 16 additions & 21 deletions modules/telco-core-software-stack.adoc
@@ -3,12 +3,7 @@
 // * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc
 
 :_mod-docs-content-type: REFERENCE
-// Module included in the following assemblies:
-//
-// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc
-
-:_mod-docs-content-type: REFERENCE
-[id="telco-core-software-stack_{context}_{context}"]
+[id="telco-core-software-stack_{context}"]
 = Telco core reference configuration software specifications
 
 [role="_abstract"]
@@ -20,36 +15,36 @@ The Red{nbsp}Hat telco core {product-version} solution has been validated using
 |Component |Software version
 
 |{rh-rhacm-first}
-|2.15
+|2.17
 
 |{gitops-title}
-|1.19
+|1.20
 
 |cert-manager Operator
-|1.18
+|1.19
 
 |Cluster Logging Operator
-|6.2
+|6.5
 
 |{rh-storage}
-|4.20
+|4.22
 
 |SR-IOV Network Operator
-|4.21
+|4.22
 
 |MetalLB
-|4.21
+|4.22
 
 |NMState Operator
-|4.21
+|4.22
 
 |NUMA-aware scheduler
-|4.21
+|4.22
 |====
 
-* {rh-rhacm-first} will be updated to 2.16 when the aligned {rh-rhacm-first} version is released.
-* {rh-storage} will be updated to 4.21 when the aligned {rh-storage} version is released.
-* The cert-manager Operator and {gitops-title} Operator are platform agnostic operators.
-The support lifecycle for these operators is independent from the support lifecycle for {product-title}.
-You might need to update to a newer minor version of these operators at the end of an operator lifecycle, or when planning to update the {product-title} cluster to continue support.
-For support lifecycle details for platform agnostic operators, see link:https://access.redhat.com/support/policy/updates/openshift_operators[OpenShift Operator Life Cycles].
+* {rh-storage} is expected to be updated to 4.22 when the aligned {rh-storage} version is released.
+* The Cluster Logging Operator is expected to be updated to 6.6 when the aligned version is released.
+* The cert-manager Operator and {gitops-title} Operator are platform-agnostic operators.
+The support lifecycle for these operators is independent from the support lifecycle for {product-title}.
+You might need to update to a newer minor version of these operators at the end of an operator lifecycle, or when planning to update the {product-title} cluster to continue support.
+For support lifecycle details for platform-agnostic operators, see link:https://access.redhat.com/support/policy/updates/openshift_operators[OpenShift Operator Life Cycles].
5 changes: 3 additions & 2 deletions modules/telco-core-topology-aware-lifecycle-manager.adoc
@@ -36,6 +36,7 @@ When you unpause the `mcp` CR, all the configuration changes are applied with a
 During installation, custom `mcp` CRs can be paused along with setting `maxUnavailable` to 100% to improve installation times.
 ====
 
-* Orchestration of an upgrade, including {product-title}, day-2 OLM operators, and custom configuration can be done using a `ClusterGroupUpgrade` (CGU) CR containing policies describing these updates.
-** An EUS to EUS upgrade can be orchestrated using chained CGU CRs
+* Orchestration of an upgrade, including {product-title}, Day 2 OLM operators, and custom configuration can be done using a `ClusterGroupUpgrade` (CGU) CR containing policies describing these updates.
+** An EUS to EUS upgrade can be orchestrated using chained CGU CRs.
 ** Control of MCP pause can be managed through policy in the CGU CRs for a full control plane and worker node rollout of upgrades.
+** For more information, see "Performing an EUS-to-EUS update for telco core clusters".
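
Reviewer note: the bullets above mention chained CGU CRs for an EUS to EUS upgrade. One way the chaining could be expressed is sketched here in Python, assuming the `blockingCRs` field of the `ClusterGroupUpgrade` spec is used to order the steps. The CR names, policy names, cluster name, and timeout value are hypothetical, and the procedure in the referenced EUS-to-EUS section takes precedence over this sketch.

```python
import json


def cgu(name, namespace, clusters, policies, blocked_by=()):
    """Build a ClusterGroupUpgrade-shaped dict.

    `blocked_by` lists CGU names in the same namespace that must complete
    before this CR is allowed to start (assumed chaining mechanism).
    """
    spec = {
        "clusters": list(clusters),
        "managedPolicies": list(policies),
        "enable": False,  # left disabled until an operator starts the rollout
        "remediationStrategy": {"maxConcurrency": 1, "timeout": 240},
    }
    if blocked_by:
        spec["blockingCRs"] = [{"name": n, "namespace": namespace} for n in blocked_by]
    return {
        "apiVersion": "ran.openshift.io/v1alpha1",
        "kind": "ClusterGroupUpgrade",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    # Hypothetical two-step chain: the second CGU waits for the first.
    step1 = cgu("eus-step-1", "default", ["core-cluster-1"], ["upgrade-step-1-policy"])
    step2 = cgu("eus-step-2", "default", ["core-cluster-1"], ["upgrade-step-2-policy"],
                blocked_by=["eus-step-1"])
    print(json.dumps([step1, step2], indent=2))
```

Keeping each step in its own CGU CR lets the MCP pause and unpause policies be scoped to the step where they apply.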