8 changes: 8 additions & 0 deletions docs/dictionary/en-custom.txt
@@ -1,3 +1,4 @@
AES
APIs
Amartya
AssignedTeam
@@ -227,6 +228,7 @@ fsid
fultonj
fusco
fwcybtb
Galera
gapped
genericcloud
genindex
@@ -418,6 +420,7 @@ num
nvme
nwy
nzgdh
OADP
oauth
observability
oc
@@ -497,6 +500,8 @@ psathyan
pubkey
publicdomain
pullsecret
PVC
PVCs
pvs
pwd
pxe
@@ -573,6 +578,7 @@ sso
stateful
stderr
stdout
StorageClass
stp
str
stricthostkeychecking
@@ -635,6 +641,7 @@ vcpus
vda
venv
vexxhost
Velero
virbr
virsh
virt
@@ -659,6 +666,7 @@ vvvv
vxlan
vynxgdagahaac
vzcg
WaitForFirstConsumer
websso
wget
whitebox
68 changes: 68 additions & 0 deletions playbooks/backup_restore.yaml
@@ -0,0 +1,68 @@
---
# End-to-end backup/restore test playbook
#
# Aligns with the openstack-k8s-operators backup-restore user guide (Galera,
# optional OVN NB/SB on PVC, OADP, ordered restore, Neutron–OVN sync post-EDPM).
#
# Used standalone or from post-deployment.yml (gated by
# cifmw_run_backup_restore_test). Logic lives in
# roles/cifmw_backup_restore/tasks/e2e.yml; variables are in the role defaults.
#
# Each step can be enabled/disabled independently for iterative testing.
#
# Prerequisites:
# - OpenStack control plane deployed and healthy
# - OpenStackBackupConfig CR created (for backup labeling)
# - For manual testing on a reproducer, run post_deployment.sh first:
# ./post_deployment.sh -e zuul_log_collection=true \
# -e cifmw_nolog=false -e cifmw_run_tests=false
#
# Manual usage (reproducer):
# COMMON_ARGS="-i ~/ci-framework-data/artifacts/zuul_inventory.yml \
# -e @~/ci-framework-data/parameters/reproducer-variables.yml \
# -e @~/ci-framework-data/parameters/openshift-environment.yml"
#
# # Full run (with test workload):
# ansible-playbook $COMMON_ARGS playbooks/backup_restore.yaml \
# -e cifmw_backup_restore_create_workload=true
#
# # Full run (without workload):
# ansible-playbook $COMMON_ARGS playbooks/backup_restore.yaml
#
# # Install deps only:
# ansible-playbook $COMMON_ARGS playbooks/backup_restore.yaml \
# -e cifmw_backup_restore_run_backup=false \
# -e cifmw_backup_restore_run_cleanup=false \
# -e cifmw_backup_restore_run_restore=false
#
# # Backup only (deps already installed):
# ansible-playbook $COMMON_ARGS playbooks/backup_restore.yaml \
# -e cifmw_backup_restore_install_deps=false \
# -e cifmw_backup_restore_run_cleanup=false \
# -e cifmw_backup_restore_run_restore=false
#
# # Cleanup + restore (backup already done):
# ansible-playbook $COMMON_ARGS playbooks/backup_restore.yaml \
# -e cifmw_backup_restore_install_deps=false \
# -e cifmw_backup_restore_run_backup=false \
# -e cifmw_backup_restore_backup_timestamp=20260323-144546
#
# # Restore only (cleanup already done):
# ansible-playbook $COMMON_ARGS playbooks/backup_restore.yaml \
# -e cifmw_backup_restore_install_deps=false \
# -e cifmw_backup_restore_run_backup=false \
# -e cifmw_backup_restore_run_cleanup=false \
# -e cifmw_backup_restore_backup_timestamp=20260323-144546
#
# # With PVC pinning (WaitForFirstConsumer storage):
# ansible-playbook $COMMON_ARGS playbooks/backup_restore.yaml \
# -e cifmw_backup_restore_pin_pvcs=true

- name: Backup and Restore end-to-end test
  hosts: "{{ cifmw_target_host | default('localhost') }}"
  gather_facts: true
  tasks:
    - name: Run backup/restore end-to-end orchestration
      ansible.builtin.import_role:
        name: cifmw_backup_restore
        tasks_from: e2e.yml
12 changes: 12 additions & 0 deletions post-deployment.yml
@@ -73,6 +73,18 @@
      tags:
        - compliance

- name: Run backup and restore test
  hosts: "{{ cifmw_target_host | default('localhost') }}"
  gather_facts: true
  tasks:
    - name: Run backup/restore end-to-end orchestration
      ansible.builtin.import_role:
        name: cifmw_backup_restore
        tasks_from: e2e.yml
      when: cifmw_run_backup_restore_test | default(false) | bool
      tags:
        - backup-restore

- name: Run hooks and inject status flag
  hosts: "{{ cifmw_target_host | default('localhost') }}"
  gather_facts: true
104 changes: 104 additions & 0 deletions roles/cifmw_backup_restore/README.md
@@ -0,0 +1,104 @@
# cifmw_backup_restore

Automate OpenStack on OpenShift backup and restore operations using OADP
(OpenShift API for Data Protection) and Velero. The role supports three
actions: **backup**, **restore**, and **cleanup**.

- **backup** — creates Galera database dumps, optionally backs up OVN NB/SB
databases onto their PVCs, then creates Velero backups of labeled PVCs
(via CSI snapshots) and cluster resources.
- **restore** — performs an ordered Velero restore sequence (PVCs,
foundation, infrastructure, control plane, Galera, optional OVN file restore,
full control plane resume, dataplane, EDPM), then Neutron–OVN verification and
sync (**log** mode, then **repair**, matching the backup-restore user guide Step 12).
- **cleanup** — tears down dataplane and control-plane resources so the
namespace is ready for a fresh restore.

## Privilege escalation

None. All cluster operations are performed through `oc` against the target
OpenShift cluster.

## Parameters

### Common

* `cifmw_backup_restore_action`: (String) Action to perform. Must be one of `backup`, `restore`, or `cleanup`. Defaults to `""` (role will fail if unset).
* `cifmw_backup_restore_namespace`: (String) Target OpenStack namespace. Defaults to the value of `cifmw_openstack_namespace` if it is set, otherwise `openstack`.
* `cifmw_backup_restore_oadp_namespace`: (String) Namespace where Velero/OADP is running. Defaults to `openshift-adp`.
* `cifmw_backup_restore_auto_ack`: (Boolean) Skip interactive pause prompts when `true`. Defaults to `false`.
* `cifmw_backup_restore_ovn_db`: (Boolean) When `true`, the **backup** path labels the OVN NB/SB PVCs and runs an `ovsdb-client` backup before the OADP PVC backup, and the **restore** path restores the OVN NB/SB files after Galera (when timestamped files exist on the PVC) and before resuming the full control plane. When `false`, both steps are skipped; the post-EDPM `neutron-ovn-db-sync-util` run still happens and includes repair, since no OVN files were backed up. Defaults to `true`.
* `cifmw_backup_restore_ovn_db_ready_timeout`: (String) Timeout for `oc wait` on OVN database pods during OVN backup/restore. Defaults to `5m`.

### Backup

* `cifmw_backup_restore_galera_backup_timeout`: (String) Timeout for `oc wait` on Galera backup jobs. Defaults to `10m`.
* `cifmw_backup_restore_galera_storage_class`: (String) StorageClass for Galera backup PVCs. Empty string uses the cluster default. Defaults to `""`.
* `cifmw_backup_restore_galera_storage_request`: (String) Size of the Galera backup PVC. Defaults to `5Gi`.
* `cifmw_backup_restore_galera_transfer_storage_request`: (String) Size of the Galera transfer storage PVC. Defaults to `5Gi`.
* `cifmw_backup_restore_oadp_backup_timeout`: (String) Timeout for OADP PVC and resource backup completion. Defaults to `30m`.
* `cifmw_backup_restore_storage_location`: (String) Velero `BackupStorageLocation` name. Defaults to `velero-1`.
* `cifmw_backup_restore_backup_ttl`: (String) TTL for Velero backups. Defaults to `720h`.
* `cifmw_backup_restore_snapshot_move_data`: (Boolean) Enable Velero snapshot data mover. When `true`, cleanup also deletes labeled PVCs. Defaults to `true`.

### Restore

* `cifmw_backup_restore_backup_timestamp`: (String) Timestamp suffix that identifies the backup to restore (e.g. `20260311-081234`). **Required** when `cifmw_backup_restore_action` is `restore`.
* `cifmw_backup_restore_restore_timeout`: (Integer) Seconds to wait for each Velero Restore to reach a terminal phase. Defaults to `900`.
* `cifmw_backup_restore_infra_ready_timeout`: (String) Timeout for `oc wait` on `OpenStackControlPlaneInfrastructureReady`. Defaults to `20m`.
* `cifmw_backup_restore_ctlplane_ready_timeout`: (String) Timeout for `oc wait` on control plane `Ready` after removing the deployment-stage annotation. Defaults to `10m`.
* `cifmw_backup_restore_strict_restore`: (Boolean) Fail on Velero `PartiallyFailed` status when `true`; only warn when `false`. Defaults to `true`.
* `cifmw_backup_restore_restore_content`: (String) Content flag passed to `restore_galera` (`--content`). Defaults to `data`.
* `cifmw_backup_restore_edpm_deploy_timeout`: (String) Timeout for `oc wait` on the post-restore EDPM deployment. Defaults to `40m`.
* `cifmw_backup_restore_pin_pvcs`: (Boolean) Enable PVC-to-node pinning during restore for WaitForFirstConsumer storage classes. Defaults to `false`.
* Post-EDPM **Neutron–OVN** steps follow [user guide Step 12](https://github.com/openstack-k8s-operators/dev-docs/blob/main/backup-restore/user-guide.md#step-12-verify-and-sync-neutron-to-ovn): `neutron-ovn-db-sync-util` runs in `log` mode first (using `neutron-dist.conf`, `neutron.conf`, and `neutron.conf.d`). **Repair** then runs if `cifmw_backup_restore_ovn_db` is `false` (no OVN NB/SB file backup was taken), or if the log-mode stdout/stderr contains a `WARNING` line, because Neutron reports drift that way while still exiting 0. If OVN file backup/restore was enabled and the log output has no `WARNING` lines, repair is skipped as redundant.
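
The repair decision described in the last item can be sketched as a pair of tasks. This is only an illustration of the rule as stated here, not the role's actual implementation: `_sync_log` is a hypothetical registered result holding the stdout/stderr of the log-mode run, and `_needs_repair` is a made-up fact name.

```YAML
# Illustrative only: `_sync_log` and `_needs_repair` are hypothetical names.
- name: Decide whether the repair pass is needed
  ansible.builtin.set_fact:
    _needs_repair: >-
      {{
        (not (cifmw_backup_restore_ovn_db | bool))
        or (_sync_log.stdout is search('WARNING'))
        or (_sync_log.stderr is search('WARNING'))
      }}

- name: Report the decision
  ansible.builtin.debug:
    msg: "Repair needed: {{ _needs_repair }}"
```

The exit code is deliberately not part of the condition, since log mode exits 0 even when it reports drift.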

### Cleanup

* `cifmw_backup_restore_cleanup_ctlplane`: (Boolean) Delete control-plane resources during cleanup. Defaults to `true`.
* `cifmw_backup_restore_cleanup_dataplane`: (Boolean) Delete dataplane resources during cleanup. Defaults to `true`.

## Examples

### Running a backup

```YAML
- hosts: localhost
  tasks:
    - name: Backup OpenStack
      ansible.builtin.include_role:
        name: cifmw_backup_restore
      vars:
        cifmw_backup_restore_action: backup
        cifmw_backup_restore_namespace: openstack
        cifmw_backup_restore_auto_ack: true
```
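
A variant of the backup example, sketched under assumptions: it overrides a few of the Backup parameters listed above and skips the OVN NB/SB file backup. The StorageClass name `example-storage-class` and the chosen sizes and timeouts are placeholders, not recommended values.

```YAML
- hosts: localhost
  tasks:
    - name: Backup OpenStack with explicit storage settings
      ansible.builtin.include_role:
        name: cifmw_backup_restore
      vars:
        cifmw_backup_restore_action: backup
        cifmw_backup_restore_auto_ack: true
        # Hypothetical StorageClass; replace with one available in your cluster.
        cifmw_backup_restore_galera_storage_class: example-storage-class
        # Placeholder values chosen for illustration only.
        cifmw_backup_restore_galera_storage_request: 10Gi
        cifmw_backup_restore_oadp_backup_timeout: 45m
        # Skip the OVN NB/SB file backup; the post-EDPM sync then runs repair.
        cifmw_backup_restore_ovn_db: false
```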

### Restoring from a backup

```YAML
- hosts: localhost
  tasks:
    - name: Restore OpenStack
      ansible.builtin.include_role:
        name: cifmw_backup_restore
      vars:
        cifmw_backup_restore_action: restore
        cifmw_backup_restore_backup_timestamp: "20260311-081234"
        cifmw_backup_restore_auto_ack: true
```
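
A variant of the restore example for clusters whose storage classes use WaitForFirstConsumer binding, again only a sketch: it enables PVC pinning, downgrades Velero `PartiallyFailed` to a warning, and raises the per-restore timeout. The timeout value is an arbitrary placeholder.

```YAML
- hosts: localhost
  tasks:
    - name: Restore OpenStack with PVC pinning
      ansible.builtin.include_role:
        name: cifmw_backup_restore
      vars:
        cifmw_backup_restore_action: restore
        cifmw_backup_restore_backup_timestamp: "20260311-081234"
        cifmw_backup_restore_auto_ack: true
        # Pin PVCs to nodes for WaitForFirstConsumer storage classes.
        cifmw_backup_restore_pin_pvcs: true
        # Warn instead of failing when a Velero Restore is PartiallyFailed.
        cifmw_backup_restore_strict_restore: false
        # Seconds per Velero Restore; placeholder value for illustration.
        cifmw_backup_restore_restore_timeout: 1800
```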

### Cleaning up before a restore

```YAML
- hosts: localhost
  tasks:
    - name: Cleanup namespace
      ansible.builtin.include_role:
        name: cifmw_backup_restore
      vars:
        cifmw_backup_restore_action: cleanup
        cifmw_backup_restore_auto_ack: true
        cifmw_backup_restore_cleanup_ctlplane: true
        cifmw_backup_restore_cleanup_dataplane: true
```
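
The end-to-end orchestration in `tasks/e2e.yml` can also be called directly, with individual steps switched off through the `cifmw_backup_restore_install_deps` and `cifmw_backup_restore_run_*` toggles from the role defaults. The sketch below mirrors the "cleanup + restore, backup already done" case from the header of `playbooks/backup_restore.yaml`; the timestamp is the sample value used there, and the exact behaviour of each toggle is assumed from the variable names and that header.

```YAML
- hosts: localhost
  gather_facts: true
  tasks:
    - name: Run cleanup and restore from an existing backup
      ansible.builtin.import_role:
        name: cifmw_backup_restore
        tasks_from: e2e.yml
      vars:
        # Dependencies are already installed and the backup already exists.
        cifmw_backup_restore_install_deps: false
        cifmw_backup_restore_run_backup: false
        # Sample timestamp; use the suffix of the backup you want to restore.
        cifmw_backup_restore_backup_timestamp: "20260323-144546"
```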
76 changes: 76 additions & 0 deletions roles/cifmw_backup_restore/defaults/main.yml
@@ -0,0 +1,76 @@
---
# Copyright Red Hat, Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.


# All variables intended for modification should be placed in this file.
# All variables within this role should have a prefix of "cifmw_backup_restore"

# Action to perform: backup, restore, or cleanup
cifmw_backup_restore_action: ""

# Common
cifmw_backup_restore_namespace: "{{ cifmw_openstack_namespace | default('openstack') }}"
cifmw_backup_restore_oadp_namespace: openshift-adp
cifmw_backup_restore_auto_ack: false

# End-to-end orchestration (tasks/e2e.yml; invoked from post-deployment or playbooks/backup_restore.yaml)
cifmw_backup_restore_install_deps: true
cifmw_backup_restore_create_workload: true
cifmw_backup_restore_run_backup: true
cifmw_backup_restore_run_cleanup: true
cifmw_backup_restore_run_restore: true
cifmw_backup_restore_run_post_tempest: false

# Passthrough to update role when creating the test workload (prefix matches update role, not this role)
cifmw_update_ping_test: true
cifmw_update_control_plane_check: false
cifmw_update_artifacts_basedir_suffix: "tests/update"
cifmw_update_artifacts_basedir: "{{ ansible_user_dir }}/ci-framework-data/{{ cifmw_update_artifacts_basedir_suffix }}"
cifmw_update_workload_launch_script: "{{ cifmw_update_artifacts_basedir }}/workload_launch.sh"
cifmw_update_timestamper_cmd: >-
  | awk '{ print strftime("%Y-%m-%d %H:%M:%S |"), $0; fflush(); }'
cifmw_update_ping_start_script: "{{ cifmw_update_artifacts_basedir }}/l3_agent_start_ping.sh"
cifmw_update_ping_stop_script: "{{ cifmw_update_artifacts_basedir }}/l3_agent_stop_ping.sh"
cifmw_update_namespace: "{{ cifmw_backup_restore_namespace }}"

# Backup
cifmw_backup_restore_galera_backup_timeout: 10m
cifmw_backup_restore_galera_storage_class: ""
cifmw_backup_restore_galera_storage_request: 5Gi
cifmw_backup_restore_galera_transfer_storage_request: 5Gi
cifmw_backup_restore_oadp_backup_timeout: 30m
cifmw_backup_restore_storage_location: velero-1
cifmw_backup_restore_backup_ttl: 720h
cifmw_backup_restore_snapshot_move_data: true
cifmw_backup_restore_swift_xattr_timeout: 600s

# OVN NB/SB database files on PVCs (user-guide backup Step 3 / restore Step 8)
cifmw_backup_restore_ovn_db: true
cifmw_backup_restore_ovn_db_ready_timeout: 5m

# Restore
# cifmw_backup_restore_backup_timestamp: REQUIRED for restore (e.g., 20260311-081234)
cifmw_backup_restore_restore_timeout: 900
cifmw_backup_restore_edpm_deploy_timeout: 40m
cifmw_backup_restore_infra_ready_timeout: 20m
cifmw_backup_restore_ctlplane_ready_timeout: 10m
cifmw_backup_restore_strict_restore: true
cifmw_backup_restore_restore_content: data
cifmw_backup_restore_pin_pvcs: false

# Cleanup
cifmw_backup_restore_cleanup_ctlplane: true
cifmw_backup_restore_cleanup_dataplane: true
31 changes: 31 additions & 0 deletions roles/cifmw_backup_restore/meta/main.yml
@@ -0,0 +1,31 @@
---
# Copyright Red Hat, Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.


galaxy_info:
  author: CI Framework
  description: CI Framework Role -- OpenStack Backup and Restore
  company: Red Hat
  license: Apache-2.0
  min_ansible_version: "2.14"
  namespace: cifmw
  galaxy_tags:
    - cifmw
    - openstack
    - backup
    - restore

dependencies: []