Skip to content

Conversation

@mnietoji
Copy link
Contributor

@mnietoji mnietoji commented Nov 7, 2025

[multiple] Add fdp_update_edpm role for EDPM node updates

[fdp_update_edpm] Implement EDPM node update automation for FDP updates:

  • Role fdp_update_edpm: Updates EDPM nodes declaratively via Kubernetes CRs
    • Patches OpenStackDataPlaneNodeSet CRs with updated container images
    • Configures package updates via edpm_bootstrap_packages
    • Sets up registry authentication and CA certificates
    • Creates OpenStackDataPlaneDeployment to apply changes
    • Includes hypervisor firewall configuration for registry access
  • Fix hypervisor firewall configuration
    • Add delegate_to to execute iptables on correct hypervisor host
    • Previously executed on localhost instead of hypervisor
  • Integration in post-deployment.yml after control plane updates
  • Zuul CI configuration for automated testing

[fdp_update_container_images] Fix to properly update OpenStackVersion CR

  • Add set_fact task to build customContainerImages dict correctly

Assisted-By: Claude noreply@anthropic.com
Signed-off-by: Miguel Angel Nieto Jimenez mnietoji@redhat.com

Copy link
Contributor

@michburk michburk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this pr isn't split properly and also contains the changes from #3466

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/381a8462756e461fa40fdad368f58ea3

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 12s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 14s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 30m 53s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 27s
✔️ noop SUCCESS in 0s
cifmw-pod-ansible-test TIMED_OUT in 30m 56s
cifmw-pod-pre-commit FAILURE in 7m 51s
✔️ build-push-container-cifmw-client SUCCESS in 18m 58s
✔️ cifmw-molecule-fdp_update_container_images SUCCESS in 2m 18s
✔️ cifmw-molecule-fdp_update_edpm SUCCESS in 2m 22s

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/8dae75330b79473b922c4f394be22509

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 38m 24s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 49s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 24m 29s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 38s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 10s
cifmw-pod-pre-commit FAILURE in 7m 49s
✔️ build-push-container-cifmw-client SUCCESS in 19m 15s
✔️ cifmw-molecule-fdp_update_container_images SUCCESS in 2m 19s
✔️ cifmw-molecule-fdp_update_edpm SUCCESS in 2m 17s

@mnietoji mnietoji force-pushed the fdp_update_edpm branch 2 times, most recently from d4a6725 to 6410812 Compare November 7, 2025 23:39
@mnietoji
Copy link
Contributor Author

mnietoji commented Nov 7, 2025

Looks like this pr isn't split properly and also contains the changes from #3466

This pr depends on the other one. When the other one is merged, i will rebase this one and I will only have one commit here

@mnietoji mnietoji force-pushed the fdp_update_edpm branch 2 times, most recently from f09d329 to 2f1a170 Compare November 7, 2025 23:52
@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a31750715d0943f7889539e20267c60d

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 42m 49s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 17m 33s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 26m 30s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 36s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 52s
cifmw-pod-pre-commit FAILURE in 7m 58s
✔️ build-push-container-cifmw-client SUCCESS in 22m 17s
✔️ cifmw-molecule-fdp_update_container_images SUCCESS in 1m 44s
✔️ cifmw-molecule-fdp_update_edpm SUCCESS in 1m 52s

# ============================================================================

# Base directory for artifacts and temporary files
cifmw_fdp_update_container_images_basedir: "{{ cifmw_basedir | default(ansible_user_dir ~ '/ci-framework-data') }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove default. It should start using group_vars/all.yml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fdp_update_container_images not in this PR, this was managed in a different PR


# Repository configuration
cifmw_fdp_update_container_images_repo_name: "custom-repo"
cifmw_fdp_update_container_images_repo_baseurl: "" # REQUIRED - must be set by user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a nit: do we really want to have so much parameters available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fdp_update_container_images not in this PR, this was managed in a different PR

@@ -0,0 +1,65 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure we need that task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we want to download images from openshift registry. That registry was not used before by the compute node. We need to either configure it as a insecure registry or configure the certificate in the compute node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? Current CI job workflow seems not "catching" into your exception. Don't think it is needed, but if yes, would be surprise and probably you will not use ci-framework tool for preparing CI jobs.

Copy link
Contributor Author

@mnietoji mnietoji Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edpm nodes are downloading modified images from openshift registry. It is needed to configure certificate or configure the registry as insecure.
This is needed because images modified are stored in openshift registry. For regular deployment, images are downloaded from external registries and it is configured the certificates needed to access those registries or it is configured as insecure.

_cifmw_fdp_update_edpm_deployment_result.resources | length > 0 and
_cifmw_fdp_update_edpm_deployment_result.resources[0].status.conditions |
selectattr('type', 'equalto', 'Ready') |
map(attribute='status') | first | default('False') == 'True'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default is needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is there in case the filter first return empty, so It set it to False

@@ -0,0 +1,59 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we need that task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is needed because our compute nodes does not have connectivity to the openshift registry in which we store the updated images. I only add a route so that compute nodes can access registry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i commented before the reason
0c3f13a#r2592543239

register: _cifmw_fdp_update_edpm_current_logins_result
failed_when: false

- name: Parse existing registry logins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fact and later using this fact is just: xDˣᴰ

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in a way it is easier to understand

Copy link
Contributor

@danpawlik danpawlik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look what AI generates you back.
Also some part of the code was already done in fdp_update_container_images.

state: build

- name: Build custom container images dict
ansible.builtin.set_fact:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I'm unsure why this change is required.

when:
- cifmw_fdp_update_edpm_auto_deploy | bool
- not cifmw_fdp_update_edpm_dry_run | bool
- _cifmw_fdp_update_edpm_updated_nodesets | default([]) | length > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed not required. We've already added the default to the default role. And it'd not be te way to import this task instead importing this role.

- name: Patch container images in nodeset
when:
- cifmw_fdp_update_edpm_containers_enabled | bool
- _cifmw_fdp_update_edpm_updated_images is defined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed is not necessary, as this is defined in the defualts of the role, and it doesn't seems that this task is included outside the natural include_role process. So then this var is always defined.


- name: Add new repo to repos list
ansible.builtin.set_fact:
_cifmw_fdp_update_edpm_merged_repos: "{{ _cifmw_fdp_update_edpm_bootstrap_repos_list + [_cifmw_fdp_update_edpm_new_repo] }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) question: is required this to be a fact? I don't see it used anywhere outside this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used several times in the file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) suggestion: let's move into a variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is a non-blocking issue, i can check these kind of improvementes later on in a different PR

Copy link
Contributor Author

@mnietoji mnietoji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked most of the comments, i will continue tomorrow

state: build

- name: Build custom container images dict
ansible.builtin.set_fact:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what is happening with current code without set_fact. I added a debug task before to show the content of the variable.

TASK [fdp_update_container_images : Debug - Show new image path internal msg=New image path (internal): {{ _cifmw_fdp_update_container_images_new_image_path_internal }}] ***
Wednesday 10 December 2025 14:38:01 +0000 (0:00:01.774) 0:00:26.053 ****
Wednesday 10 December 2025 14:38:01 +0000 (0:00:01.774) 0:00:26.052 ****
ok: [localhost] =>
msg: 'New image path (internal): image-registry.openshift-image-registry.svc:5000/openstack/fdp-update-ceilometersgcoreimage-1765377456'

TASK [fdp_update_container_images : Patch OpenStackVersion CR state=patched, api_version=core.openstack.org/v1beta1, kind=OpenStackVersion, name={{ cifmw_fdp_update_container_images_openstack_cr_name }}, namespace={{ cifmw_fdp_update_container_images_namespace }}, definition={'spec': {'customContainerImages': {'{{ image_entry.key }}': '{{ _cifmw_fdp_update_container_images_new_image_path_internal }}'}}}] ***
Wednesday 10 December 2025 14:38:01 +0000 (0:00:00.037) 0:00:26.091 ****
Wednesday 10 December 2025 14:38:01 +0000 (0:00:00.037) 0:00:26.089 ****
[WARNING]: unknown field "spec.customContainerImages.{{ image_entry.key }}"
ok: [localhost]

The problem is that Ansible doesn't evaluate {{ image_entry.key }} when used directly as a YAML dictionary key. It sends the literal string "{{ image_entry.key }}" to Kubernetes instead of the actual value like ceilometerSgcoreImage. The solution is to build the dictionary as a string template first, then convert it with from_yaml filter - this forces proper variable evaluation before creating the dictionary.

Instead of using the set_fact, i have changed to this other sintax:
customContainerImages: "{{ { image_entry.key: _cifmw_fdp_update_container_images_new_image_path_internal } }}"

# Dry run - show changes without applying
cifmw_fdp_update_edpm_dry_run: false

# ============================================
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fdp_update_container_images patches images with ovn in FDP repo and uploads them to the openshift registry. It also updates openstackversion to force pods using those images to be updated.
fdp_update_edpm uses images already created by fdp_update_container_images. The goal of this role is to update images in compute nodes (they are not pods, just containers). It also update ovs packages in compute host

@@ -0,0 +1,62 @@
---
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This role is updating containers running in compute nodes. Images are stored in openshift registry. I could have chosen to use any external registry to store that images, but I liked the idea of not depending on any other external resources and I do not need to keep those images, so openshift registry is nice.
But compute nodes are not connected to openshift baremetal network. One solution may have been to add a new vlan in compute nodes, configure the switch and allow compute nodes to access baremetal network, but i didnt want to have a modification on the templates only because i need it for testing.
What I am doing is to add some firewall rules in hypervisor to access openshift baremetal network from compute nodes using controlplane network


- name: Fail if no NodeSets found
ansible.builtin.fail:
msg: "No OpenStackDataPlaneNodeSets found in namespace {{ cifmw_fdp_update_edpm_namespace }}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it make sense:
if cifmw_fdp_update_edpm_nodeset_name == 'all' executes: oc get osdpns
if cifmw_fdp_update_edpm_nodeset_name != 'all' executes: oc get osdpns cifmw_fdp_update_edpm_nodeset_name

In both cases, if there is no nodeset created, _cifmw_fdp_update_edpm_nodesets will be empty

# ============================================

- name: Setup hypervisor firewall for registry access
ansible.builtin.include_tasks: setup_hypervisor_firewall.yml
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i commented before the reason
0c3f13a#r2592543239


- name: Add new repo to repos list
ansible.builtin.set_fact:
_cifmw_fdp_update_edpm_merged_repos: "{{ _cifmw_fdp_update_edpm_bootstrap_repos_list + [_cifmw_fdp_update_edpm_new_repo] }}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used several times in the file

that:
- cifmw_fdp_update_edpm_repo_baseurl is defined
- cifmw_fdp_update_edpm_repo_baseurl | length > 0
- cifmw_fdp_update_edpm_packages is defined
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed


- name: Initialize updated nodesets tracking list
ansible.builtin.set_fact:
_cifmw_fdp_update_edpm_updated_nodesets: []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?
it is created here empty and later on it is updated for each nodeset and and is used to decide if a deployment is created or not and also for the debug message at the end

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/32e75138ecb8467586f36758556b4c4f

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 55s
podified-multinode-edpm-deployment-crc FAILURE in 36m 41s
cifmw-crc-podified-edpm-baremetal FAILURE in 42m 19s
cifmw-crc-podified-edpm-baremetal-minor-update FAILURE in 1h 23m 36s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 36s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 41s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 11s
✔️ build-push-container-cifmw-client SUCCESS in 21m 56s
✔️ cifmw-molecule-fdp_update_container_images SUCCESS in 2m 19s
✔️ cifmw-molecule-fdp_update_edpm SUCCESS in 2m 24s

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/7bdf595fca3c4cf6adb0786fddcda0f5

openstack-k8s-operators-content-provider FAILURE in 7m 11s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ cifmw-pod-zuul-files SUCCESS in 4m 08s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 06s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 59s
✔️ build-push-container-cifmw-client SUCCESS in 19m 47s
✔️ cifmw-molecule-fdp_update_container_images SUCCESS in 2m 26s
✔️ cifmw-molecule-fdp_update_edpm SUCCESS in 2m 26s

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/d03970aa66cb4aa2b09e0d3a4c74d2e4

openstack-k8s-operators-content-provider FAILURE in 5m 23s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ cifmw-pod-zuul-files SUCCESS in 4m 31s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 30s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 01s
✔️ build-push-container-cifmw-client SUCCESS in 18m 07s
✔️ cifmw-molecule-fdp_update_container_images SUCCESS in 2m 17s
✔️ cifmw-molecule-fdp_update_edpm SUCCESS in 2m 20s

@mnietoji
Copy link
Contributor Author

Please take a look what AI generates you back. Also some part of the code was already done in fdp_update_container_images.

Not same code, maybe name of the file is the same but it is doing it a different thing

[fdp_update_edpm] Implement EDPM node update automation for FDP updates:
- Role fdp_update_edpm: Updates EDPM nodes declaratively via Kubernetes CRs
  * Patches OpenStackDataPlaneNodeSet CRs with updated container images
  * Configures package updates via edpm_bootstrap_packages
  * Sets up registry authentication and CA certificates
  * Creates OpenStackDataPlaneDeployment to apply changes
  * Includes hypervisor firewall configuration for registry access
- Fix hypervisor firewall configuration
  * Add delegate_to to execute iptables on correct hypervisor host
  * Previously executed on localhost instead of hypervisor
- Integration in post-deployment.yml after control plane updates
- Zuul CI configuration for automated testing

[fdp_update_container_images] Fix to properly update OpenStackVersion CR
  * Add set_fact task to build customContainerImages dict correctly

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Miguel Angel Nieto Jimenez <mnietoji@redhat.com>
@mnietoji
Copy link
Contributor Author

Hi, have rebased the patch and gone through alll pending comments. Testing is green again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants