OSDOCS-19006: Add Azure procedure for migrating x86 control plane to arm64 #112335
Open: bshaw7 wants to merge 1 commit into openshift:enterprise-4.20 from bshaw7:azure-x86-to-arm64-cp-migration-4.20 (+281 −0)
@@ -0,0 +1,269 @@
// Module included in the following assemblies:
//
// * updating/updating_a_cluster/migrating-to-multi-payload.adoc

:_mod-docs-content-type: PROCEDURE
[id="migrating-from-x86-to-arm64-cp-azure_{context}"]
= Migrating the x86 control plane to arm64 architecture on {azure-full}

[role="_abstract"]
You can migrate the control plane in your cluster from `x86` to `arm64` architecture on {azure-first}. Migrating to `arm64` control plane nodes can reduce cloud infrastructure costs and improve energy efficiency while maintaining the same cluster functionality.

{azure-short} requires you to manually create a gallery image from the `arm64` {op-system-first} VHD before updating the control plane machine set.

.Prerequisites

* You have installed the {oc-first}.
* You are logged in to `oc` as a user with `cluster-admin` privileges.
* You have installed the Azure CLI (`az`).
* You have installed the `jq` command-line JSON processor.
* You are logged in to the Azure CLI with an account that has permissions to create resources in the cluster's resource group.

.Procedure

. Check the architecture of the control plane nodes by running the following command:
+
[source,terminal]
----
$ oc get nodes -o wide
----
+
.Example output
[source,terminal]
----
NAME                     STATUS   ROLES                  AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                         KERNEL-VERSION                CONTAINER-RUNTIME
worker-001.example.com   Ready    worker                 100d   v1.30.7   10.x.x.x      <none>        Red Hat Enterprise Linux CoreOS 4xx.xx.xxxxx-0   5.x.x-xxx.x.x.el9_xx.x86_64   cri-o://1.30.x
worker-002.example.com   Ready    worker                 98d    v1.30.7   10.x.x.x      <none>        Red Hat Enterprise Linux CoreOS 4xx.xx.xxxxx-0   5.x.x-xxx.x.x.el9_xx.x86_64   cri-o://1.30.x
master-001.example.com   Ready    control-plane,master   120d   v1.30.7   10.x.x.x      <none>        Red Hat Enterprise Linux CoreOS 4xx.xx.xxxxx-0   5.x.x-xxx.x.x.el9_xx.x86_64   cri-o://1.30.x
master-002.example.com   Ready    control-plane,master   120d   v1.30.7   10.x.x.x      <none>        Red Hat Enterprise Linux CoreOS 4xx.xx.xxxxx-0   5.x.x-xxx.x.x.el9_xx.x86_64   cri-o://1.30.x
master-003.example.com   Ready    control-plane,master   120d   v1.30.7   10.x.x.x      <none>        Red Hat Enterprise Linux CoreOS 4xx.xx.xxxxx-0   5.x.x-xxx.x.x.el9_xx.x86_64   cri-o://1.30.x
----
+
The suffix of the `KERNEL-VERSION` value, such as `x86_64` or `aarch64`, indicates the architecture of each node.
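+
To check the architecture from a script instead of reading the table, you can take the suffix of each kernel version string. The following is a minimal sketch; `kernel_arch` is a hypothetical helper, not part of `oc`. The commented commands assume the standard `.status.nodeInfo.kernelVersion` node field and, as an alternative, the `kubernetes.io/arch` node label, which reports `amd64` or `arm64` rather than the kernel names `x86_64` or `aarch64`.
+
```shell
# Hypothetical helper: print the architecture suffix of a kernel version
# string, for example 5.14.0-427.el9_4.x86_64 -> x86_64.
kernel_arch() {
  # Remove everything up to and including the last dot.
  printf '%s\n' "${1##*.}"
}

# Possible usage against a live cluster (not executed here):
#
#   oc get nodes -l node-role.kubernetes.io/master \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}' |
#   while read -r node kernel; do
#     echo "${node} $(kernel_arch "${kernel}")"
#   done
#
# Alternative that reads the architecture label directly:
#
#   oc get nodes -l node-role.kubernetes.io/master -L kubernetes.io/arch
```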

. Check that your cluster is multi-architecture compatible by running the following command:
+
[source,terminal]
----
$ oc adm release info -o jsonpath="{ .metadata.metadata}"
----
+
If you see the following output, the cluster is multi-architecture compatible.
+
[source,terminal]
----
{
  "release.openshift.io/architecture": "multi",
  "url": "https://access.redhat.com/errata/<errata_version>"
}
----
+
If the value of the `release.openshift.io/architecture` field is not `multi`, migrate the cluster to a multi-architecture cluster. For more information, see "Migrating to a cluster with multi-architecture compute machines using the CLI".
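+
The same check can be scripted. In this sketch, `is_multi_arch_release` is a hypothetical helper that searches the JSON with `grep` instead of `jq`, assuming that the `release.openshift.io/architecture` key and its value appear on the same line, as they do in the example output.
+
```shell
# Hypothetical helper: succeed (exit status 0) when the release metadata JSON
# read from standard input declares the multi-architecture payload.
# Matches both compact and pretty-printed JSON, as long as the key and the
# value are on the same line.
is_multi_arch_release() {
  grep -q '"release.openshift.io/architecture"[[:space:]]*:[[:space:]]*"multi"'
}

# Possible usage (not executed here):
#
#   if oc adm release info -o jsonpath="{ .metadata.metadata}" | is_multi_arch_release; then
#     echo "multi-architecture payload"
#   else
#     echo "migrate to a multi-architecture payload first"
#   fi
```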

. Update your image stream from single-architecture to multi-architecture by running the following command:
+
--
include::snippets/update-image-stream-to-multi-arch.adoc[]
--

. Set the infrastructure ID environment variable by running the following command:
+
[source,terminal]
----
$ INFRA_ID=$(oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}')
----

. Set the region environment variable by running the following command:
+
[source,terminal]
----
$ REGION=$(oc get machines -n openshift-machine-api -o jsonpath='{.items[0].spec.providerSpec.value.location}')
----

. Set the resource group environment variable by running the following command:
+
[source,terminal]
----
$ RESOURCE_GROUP="${INFRA_ID}-rg"
----
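+
The `RESOURCE_GROUP` value assumes the default `<infra_id>-rg` name that the installation program creates. If you installed the cluster into an existing resource group, that assumption might not hold. The following sketch defines a hypothetical `require_vars` helper that stops when any variable set so far is empty, and shows in a comment how you might cross-check the resource group, assuming that your cluster populates `.status.platformStatus.azure.resourceGroupName` in the `Infrastructure` resource.
+
```shell
# Hypothetical helper: report every named variable that is empty or unset
# and return a non-zero status if there is at least one. Pass variable
# names, not values, for example:
#   require_vars INFRA_ID REGION RESOURCE_GROUP
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup that works in POSIX sh; names must be plain shell
    # identifiers such as INFRA_ID.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "error: ${name} is empty" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Possible cross-check (not executed here): compare the derived resource
# group with the one recorded in the Infrastructure status.
#
#   oc get infrastructure cluster \
#     -o jsonpath='{.status.platformStatus.azure.resourceGroupName}'
```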

. Set the storage account name environment variable by running the following command:
+
[source,terminal]
----
$ STORAGE_ACCOUNT_NAME=$(az storage account list --resource-group ${RESOURCE_GROUP} --query "[?ends_with(name, 'sa')].name" -o tsv)
----

. Set the gallery name environment variable by running the following command:
+
[source,terminal]
----
$ GALLERY_NAME=$(az sig list --resource-group ${RESOURCE_GROUP} --query "[].name" -o tsv)
----
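+
With `-o tsv`, both `az` queries print one name per line. If a filter matches more than one storage account or gallery in the resource group, the variable holds several names and later commands fail in confusing ways. A minimal sketch of a guard, using a hypothetical `is_single_name` helper:
+
```shell
# Hypothetical helper: succeed only if the value is a single, non-empty
# token with no spaces or newlines.
is_single_name() {
  # A newline character, used to detect multi-line query output.
  nl='
'
  case "$1" in
    ''|*' '*|*"${nl}"*) return 1 ;;
  esac
  return 0
}

# Possible usage (not executed here):
#
#   is_single_name "${STORAGE_ACCOUNT_NAME}" || echo "check STORAGE_ACCOUNT_NAME" >&2
#   is_single_name "${GALLERY_NAME}" || echo "check GALLERY_NAME" >&2
```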

. Set the `arm64` {op-system} VHD URL environment variable by running the following command:
+
[source,terminal]
----
$ VHD_URL=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages \
  -o jsonpath='{.data.stream}' | jq -r \
  '.architectures.aarch64."rhel-coreos-extensions"."azure-disk".url')
----

. Set the VHD release environment variable by running the following command:
+
[source,terminal]
----
$ VHD_RELEASE=$(oc -n openshift-machine-config-operator get configmap/coreos-bootimages \
  -o jsonpath='{.data.stream}' | jq -r \
  '.architectures.aarch64."rhel-coreos-extensions"."azure-disk".release')
----
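+
With `jq -r`, a path that does not exist in the stream metadata prints the literal string `null` instead of failing. The following sketch uses a hypothetical `check_stream_value` helper to catch an empty or `null` value before the blob name and copy commands use it.
+
```shell
# Hypothetical helper: treat an empty value or the literal string "null"
# (what jq -r prints for a missing path) as an error.
#   $1: variable name for the message, $2: value to check
check_stream_value() {
  case "$2" in
    ''|null)
      echo "error: $1 was not found in the coreos-bootimages stream" >&2
      return 1 ;;
  esac
  return 0
}

# Possible usage (not executed here):
#
#   check_stream_value VHD_URL "${VHD_URL}" &&
#     check_stream_value VHD_RELEASE "${VHD_RELEASE}"
```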

. Set the blob name environment variable by running the following command:
+
[source,terminal]
----
$ BLOB_NAME="rhcos-${VHD_RELEASE}-azure.aarch64.vhd"
----

. Set the storage account key environment variable by running the following command:
+
[source,terminal]
----
$ ACCOUNT_KEY=$(az storage account keys list \
  -g ${RESOURCE_GROUP} \
  --account-name ${STORAGE_ACCOUNT_NAME} \
  --query "[0].value" -o tsv)
----

. Copy the `arm64` {op-system} VHD to the cluster's storage account by running the following command:
+
[source,terminal]
----
$ az storage blob copy start \
  --account-name ${STORAGE_ACCOUNT_NAME} \
  --account-key "${ACCOUNT_KEY}" \
  --source-uri "${VHD_URL}" \
  --destination-blob "${BLOB_NAME}" \
  --destination-container vhd
----
+
Monitor the copy progress by running the following command:
+
[source,terminal]
----
$ az storage blob show \
  -c vhd -n "${BLOB_NAME}" \
  --account-name ${STORAGE_ACCOUNT_NAME} \
  --account-key "${ACCOUNT_KEY}" \
  --query "{status:properties.copy.status, progress:properties.copy.progress}" \
  -o table
----
+
Wait until the status shows `success`. The VHD is approximately 17 GB and the copy typically takes about 5 minutes.
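+
Instead of rerunning the monitoring command by hand, you can poll it. This is a sketch under stated assumptions: `wait_for_copy` and `blob_copy_status` are hypothetical helpers, and the loop treats `success` as done and `failed` or `aborted` as errors, which are the terminal states that Azure Storage reports for a blob copy. The status command is passed as arguments so that the loop can run without Azure access.
+
```shell
# Hypothetical helper: poll a copy-status command until it reports a
# terminal state.
#   $1: seconds to sleep between checks; remaining args: status command
wait_for_copy() {
  interval=$1
  shift
  while :; do
    status=$("$@") || return 1
    case "${status}" in
      success) echo "copy finished"; return 0 ;;
      failed|aborted) echo "copy ${status}" >&2; return 1 ;;
      *) echo "copy status: ${status:-unknown}"; sleep "${interval}" ;;
    esac
  done
}

# Possible usage (not executed here):
#
#   blob_copy_status() {
#     az storage blob show -c vhd -n "${BLOB_NAME}" \
#       --account-name "${STORAGE_ACCOUNT_NAME}" \
#       --account-key "${ACCOUNT_KEY}" \
#       --query properties.copy.status -o tsv
#   }
#   wait_for_copy 30 blob_copy_status
```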

. Set the image definition name environment variable by running the following command:
+
[source,terminal]
----
$ IMAGE_DEFINITION_NAME=$(az sig image-definition list \
  --resource-group ${RESOURCE_GROUP} \
  --gallery-name ${GALLERY_NAME} \
  --query "[?contains(name,'-gen2')].name" -o tsv \
  | sed 's/-gen2/-aarch64-gen2/')
----
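+
The `contains(name,'-gen2')` filter also matches an `-aarch64-gen2` definition if one already exists, for example when you repeat the procedure, which would leave two lines in `IMAGE_DEFINITION_NAME`. The following sketch shows one way to make the name derivation stricter; `arm64_definition_name` is a hypothetical helper that assumes the `x86_64` definition name ends in `-gen2`.
+
```shell
# Hypothetical helper: read image definition names (one per line) and print
# the arm64 name derived from the x86_64 Gen2 definition, for example
# mycluster-abc12-gen2 -> mycluster-abc12-aarch64-gen2.
# Names that already contain "aarch64" are skipped.
arm64_definition_name() {
  grep -e '-gen2$' | grep -v -e 'aarch64' | sed 's/-gen2$/-aarch64-gen2/'
}

# Possible usage (not executed here):
#
#   IMAGE_DEFINITION_NAME=$(az sig image-definition list \
#     --resource-group "${RESOURCE_GROUP}" \
#     --gallery-name "${GALLERY_NAME}" \
#     --query "[].name" -o tsv | arm64_definition_name)
```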

. Create an `arm64` image definition in the cluster's shared image gallery by running the following command:
+
[source,terminal]
----
$ az sig image-definition create \
  --resource-group ${RESOURCE_GROUP} \
  --gallery-name ${GALLERY_NAME} \
  --gallery-image-definition ${IMAGE_DEFINITION_NAME} \
  --publisher RedHat-gen2 \
  --offer rhcos-aarch64-gen2 \
  --sku gen2 \
  --os-type Linux \
  --architecture Arm64 \
  --hyper-v-generation V2 \
  -l ${REGION}
----

. Set the {op-system} VHD URL environment variable for the copied blob by running the following command:
+
[source,terminal]
----
$ RHCOS_VHD_URL=$(az storage blob url \
  --account-name ${STORAGE_ACCOUNT_NAME} \
  -c vhd -n "${BLOB_NAME}" -o tsv)
----

. Create a gallery image version from the copied VHD by running the following command:
+
[source,terminal]
----
$ az sig image-version create \
  --resource-group ${RESOURCE_GROUP} \
  --gallery-name ${GALLERY_NAME} \
  --gallery-image-definition ${IMAGE_DEFINITION_NAME} \
  --gallery-image-version 1.0.0 \
  --os-vhd-storage-account ${STORAGE_ACCOUNT_NAME} \
  --os-vhd-uri ${RHCOS_VHD_URL} \
  -l ${REGION}
----

. Set the resource ID environment variable for the newly created image version by running the following command:
+
[source,terminal]
----
$ RESOURCE_ID="/$(az sig image-version show \
  --resource-group ${RESOURCE_GROUP} \
  --gallery-name ${GALLERY_NAME} \
  --gallery-image-definition ${IMAGE_DEFINITION_NAME} \
  --gallery-image-version 1.0.0 \
  --query id -o tsv | cut -d'/' -f4-)"
----
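+
The `cut -d'/' -f4-` command drops the leading `/subscriptions/<subscription_id>/` segments of the full Azure resource ID, and the command then restores a leading `/`. The result is the subscription-relative form that starts with `/resourceGroups/`. The following sketch performs the same transformation with shell parameter expansion; `strip_subscription` is a hypothetical helper.
+
```shell
# Hypothetical helper: convert a full Azure resource ID that starts with
# /subscriptions/<subscription_id>/ into the form that starts with
# /resourceGroups/, matching what the cut command produces.
strip_subscription() {
  id=$1
  # Remove the shortest leading match of "/subscriptions/<anything>/",
  # then add back the leading slash.
  printf '/%s\n' "${id#/subscriptions/*/}"
}
```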

. Update the control plane machine set to use the `arm64` image and an ARM-compatible VM size by running the following command:
+
[source,terminal]
----
$ oc patch controlplanemachineset.machine.openshift.io cluster \
  --type=json \
  -p "[
    {\"op\": \"replace\", \"path\": \"/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value/image/resourceID\", \"value\": \"${RESOURCE_ID}\"},
    {\"op\": \"replace\", \"path\": \"/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value/vmSize\", \"value\": \"<arm64_vm_size>\"}
  ]" \
  -n openshift-machine-api
----
+
Replace `<arm64_vm_size>` with an ARM-compatible VM size, such as `Standard_D8ps_v6`. In {azure-short} VM size names, the `p` feature letter denotes an Arm-based processor and the `s` letter denotes premium storage support. For information about supported instance types, see "Tested instance types for {azure-short} on 64-bit ARM infrastructures".
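+
Escaping every double quote inside the `-p` argument is error-prone. As a sketch, a hypothetical `cpms_patch` helper can build the same JSON patch with `printf`. The commented usage adds `--dry-run=server` so that the API server validates the patch without persisting it; remove that option to apply the change.
+
```shell
# Hypothetical helper: print the JSON patch that replaces the image
# resource ID and the VM size in the control plane machine set template.
#   $1: image resource ID, $2: arm64 VM size
cpms_patch() {
  base=/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value
  printf '[{"op": "replace", "path": "%s/image/resourceID", "value": "%s"}, {"op": "replace", "path": "%s/vmSize", "value": "%s"}]\n' \
    "${base}" "$1" "${base}" "$2"
}

# Possible usage (not executed here):
#
#   oc patch controlplanemachineset.machine.openshift.io cluster \
#     -n openshift-machine-api --type=json \
#     -p "$(cpms_patch "${RESOURCE_ID}" Standard_D8ps_v6)" \
#     --dry-run=server
```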
+
For clusters that use the default `RollingUpdate` update strategy, the control plane machine set propagates changes to your control plane configuration automatically. The full rollout typically takes about 55 minutes for a 3-node control plane. During the rollout, the `etcd` cluster Operator might report transient `Degraded` or `Progressing` conditions, which resolve after all control plane nodes are replaced.
+
For clusters that are configured to use the `OnDelete` update strategy, you must replace your control plane machines manually.
+
[NOTE]
====
If a replacement machine fails with an "instance missing" error, delete the failed machine to allow the control plane machine set to retry the replacement.
====
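+
To follow the rollout from a script, you can compare the replica counts in the control plane machine set status. This sketch assumes the `status.replicas`, `status.updatedReplicas`, and `status.readyReplicas` fields; `cpms_rollout_done` is a hypothetical helper, and empty values, such as fields that are not yet populated, count as 0.
+
```shell
# Hypothetical helper: succeed when every replica of the control plane
# machine set is both updated and ready.
#   $1: replicas, $2: updatedReplicas, $3: readyReplicas
cpms_rollout_done() {
  replicas=${1:-0}
  updated=${2:-0}
  ready=${3:-0}
  [ "${replicas}" -gt 0 ] &&
    [ "${updated}" -eq "${replicas}" ] &&
    [ "${ready}" -eq "${replicas}" ]
}

# Possible usage (not executed here). The comma separator keeps empty
# fields in place when the list is split:
#
#   counts=$(oc get controlplanemachineset.machine.openshift.io cluster \
#     -n openshift-machine-api \
#     -o jsonpath='{.status.replicas},{.status.updatedReplicas},{.status.readyReplicas}')
#   (IFS=,; set -- ${counts}; cpms_rollout_done "$1" "$2" "$3") &&
#     echo "rollout complete"
```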

.Verification

* Verify that the control plane nodes are running on the `arm64` architecture by running the following command:
+
[source,terminal]
----
$ oc get nodes -o wide
----
+
.Example output
[source,terminal]
----
NAME                     STATUS   ROLES                  AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                         KERNEL-VERSION                 CONTAINER-RUNTIME
worker-001.example.com   Ready    worker                 100d   v1.30.7   10.x.x.x      <none>        Red Hat Enterprise Linux CoreOS 4xx.xx.xxxxx-0   5.x.x-xxx.x.x.el9_xx.x86_64    cri-o://1.30.x
worker-002.example.com   Ready    worker                 98d    v1.30.7   10.x.x.x      <none>        Red Hat Enterprise Linux CoreOS 4xx.xx.xxxxx-0   5.x.x-xxx.x.x.el9_xx.x86_64    cri-o://1.30.x
master-001.example.com   Ready    control-plane,master   120d   v1.30.7   10.x.x.x      <none>        Red Hat Enterprise Linux CoreOS 4xx.xx.xxxxx-0   5.x.x-xxx.x.x.el9_xx.aarch64   cri-o://1.30.x
master-002.example.com   Ready    control-plane,master   120d   v1.30.7   10.x.x.x      <none>        Red Hat Enterprise Linux CoreOS 4xx.xx.xxxxx-0   5.x.x-xxx.x.x.el9_xx.aarch64   cri-o://1.30.x
master-003.example.com   Ready    control-plane,master   120d   v1.30.7   10.x.x.x      <none>        Red Hat Enterprise Linux CoreOS 4xx.xx.xxxxx-0   5.x.x-xxx.x.x.el9_xx.aarch64   cri-o://1.30.x
----
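+
As a scripted alternative, the following sketch reads the standard `kubernetes.io/arch` label of each control plane node and succeeds only if there is at least one node and every value is `arm64`. The `all_arm64` function is a hypothetical helper.
+
```shell
# Hypothetical helper: read one architecture value per line and succeed
# only if at least one non-empty line was read and every line is arm64.
all_arm64() {
  seen=0
  while IFS= read -r arch; do
    [ -z "${arch}" ] && continue
    seen=1
    [ "${arch}" = "arm64" ] || return 1
  done
  [ "${seen}" -eq 1 ]
}

# Possible usage (not executed here):
#
#   oc get nodes -l node-role.kubernetes.io/master \
#     -o jsonpath='{range .items[*]}{.metadata.labels.kubernetes\.io/arch}{"\n"}{end}' |
#   all_arm64 && echo "all control plane nodes are arm64"
```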
🤖 [error] AsciiDocDITA.TaskInclude: The included file may introduce content that cannot be mapped to DITA steps.