169 changes: 169 additions & 0 deletions doc/content/design/secureboot-certificate-expiry.md
@@ -0,0 +1,169 @@
---
title: Handling Microsoft Secure Boot Certificate Expiry
layout: default
design_doc: true
revision: 1
status: draft
---

## 1. Background

Microsoft Secure Boot certificates from 2011 are reaching end-of-life, and legacy VMs may still contain only the old certificate set. XenServer needs an out-of-band mechanism to update per-VM UEFI Secure Boot variables safely and at scale.

Scope of this design:

- Track certificate state and define the update flow for VMs, snapshots, and templates
- Provide API support for scheduling certificate updates on VM boot
- Integrate xapi and varstored behavior for consistent state handling

## 2. System Overview

### 2.1 Out-of-band Update Mechanism

Certificate update is implemented as a dedicated API-driven workflow (not a plugin), so that:

- The interface is documented and SDK-generated
- RBAC can be assigned precisely
- xapi can route requests and coordinate host-side behavior consistently

### 2.2 Certificate State Tracking

A new VM field is introduced:

- `VM.secureboot_certificates_state` (enum, read-only)

States:

- `ok`: No update required (including non-applicable VM types)
- `update_available`: Update required
- `update_on_boot`: Update scheduled for next boot

~~~mermaid

stateDiagram
update_available --> update_on_boot : Admin marks VM for update
update_on_boot --> ok : VM boots, update succeeds
update_on_boot --> update_on_boot : VM boots, update fails (retain state)
ok --> update_available : recompute state (e.g. legacy VM import)

~~~

### 2.3 RBAC

The new update API follows VM-admin-level access, aligned with existing NVRAM-related VM operations.

## 3. Design for Components

### 3.1 VM Certificate State Model

`VM.secureboot_certificates_state` applies to these VM-class objects:

- VMs
- Snapshots
- Templates

Transition intent:

- Admin marks a VM for update: `update_available -> update_on_boot`
- VM boots and update succeeds: `update_on_boot -> ok`
- VM boots and update fails: remains `update_on_boot` or is reset to `update_available` based on update result handling
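
The boot-time transitions above can be sketched as a small pure function. This is an illustrative model only, not xapi code: the type names, constructor names, and `state_after_boot` are hypothetical, and because the design leaves the failure handling open, both options are modelled as a `failure_policy` parameter.

```ocaml
(* Hypothetical model of VM.secureboot_certificates_state. *)
type state = Ok_state | Update_available | Update_on_boot

(* Assumption: the two failure-handling options named in the text. *)
type failure_policy = Retain_mark | Reset_to_available

type boot_result = Update_succeeded | Update_failed

(* State after a VM boot. An update is only attempted when the VM was
   marked, so other states pass through unchanged. *)
let state_after_boot ~policy state result =
  match (state, result) with
  | Update_on_boot, Update_succeeded ->
      Ok_state
  | Update_on_boot, Update_failed -> (
    match policy with
    | Retain_mark ->
        Update_on_boot
    | Reset_to_available ->
        Update_available
  )
  | (Ok_state | Update_available), _ ->
      state
```

With `Retain_mark`, a failed update is retried on every subsequent boot until it succeeds or the admin removes the mark.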

### 3.2 API: Mark/Unmark Update-on-Boot

New API:

- `VM.update_secureboot_certificates_on_boot(session, vm, mark)`

Behavior:

- `mark=true`: require current state `update_available`, then set `update_on_boot`
- `mark=false`: require current state `update_on_boot`, then set `update_available`

Validation:

- Reject invalid transitions with `OPERATION_NOT_ALLOWED`
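
A minimal sketch of the mark/unmark validation, assuming a local exception as a stand-in for the `OPERATION_NOT_ALLOWED` API error. The names `set_update_on_boot`, `Operation_not_allowed`, and `string_of_state` are hypothetical and exist only for this example.

```ocaml
type state = Ok_state | Update_available | Update_on_boot

(* Stand-in for the OPERATION_NOT_ALLOWED API error; xapi's real error
   type is not reproduced here. *)
exception Operation_not_allowed of string

let string_of_state = function
  | Ok_state ->
      "ok"
  | Update_available ->
      "update_available"
  | Update_on_boot ->
      "update_on_boot"

(* Returns the new state, or raises if the transition is not one of the
   two allowed by the API. *)
let set_update_on_boot ~current ~mark =
  match (current, mark) with
  | Update_available, true ->
      Update_on_boot
  | Update_on_boot, false ->
      Update_available
  | _ ->
      raise
        (Operation_not_allowed
           (Printf.sprintf "cannot %s update while state is %s"
              (if mark then "mark" else "unmark")
              (string_of_state current)
           )
        )
```

Note that under these rules marking an already marked VM is rejected rather than treated as a no-op, because `mark=true` requires the state to be `update_available`.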

### 3.3 DB Upgrade and Import Handling

On toolstack restart after upgrade:

- Initialize `secureboot_certificates_state` for all VM records to `ok`
- Re-evaluate NVRAM and set `update_available` where needed

Applied to:

- VMs
- Snapshots
- Non-default templates

Default templates remain `ok`.

For VM import and cross-pool migration:

- If imported metadata lacks `secureboot_certificates_state`, determine state from NVRAM and set it during import
- If imported metadata contains `secureboot_certificates_state`, preserve the state during import
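
The upgrade and import rules in this section can be sketched as two small functions. Everything here is hypothetical naming for illustration: `check` stands in for the certificate check on NVRAM content (represented as a plain string), and `record_kind` is an assumed way to distinguish default templates from other records.

```ocaml
type state = Ok_state | Update_available | Update_on_boot

type record_kind = Vm | Snapshot | Custom_template | Default_template

(* DB upgrade: every record starts at Ok_state; all records except
   default templates are then re-evaluated from their NVRAM. *)
let state_after_upgrade ~(check : string -> state) kind nvram =
  match kind with
  | Default_template ->
      Ok_state
  | Vm | Snapshot | Custom_template ->
      check nvram

(* Import: keep the state carried by the metadata when present,
   otherwise derive it from the NVRAM content. *)
let state_on_import ~(check : string -> state) ~metadata_state nvram =
  match metadata_state with
  | Some state ->
      state
  | None ->
      check nvram
```

Modelling the imported field as an `option` keeps the "metadata lacks the field" case explicit, which is the case that arises when importing from a pool that predates this feature.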

### 3.4 NVRAM and State Consistency

The certificate state must stay consistent with actual NVRAM content.

Key interface change:

- Extend `VM.set_NVRAM_EFI_variables` with an optional parameter `update`; this document refers to the extended call as `VM.set_NVRAM_EFI_variables_V2`

Rules:

- `update=yes` -> set state `ok`
- `update=no` -> do not update state
- omitted (defaults to `unspecified`) -> xapi runs the certificate check helper and derives the state

This ensures compatibility when old varstored instances are still running during rolling update windows.
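
The three rules can be sketched as a single match on the `update` parameter. This is a hedged illustration rather than the xapi implementation: `next_state` is a hypothetical name, NVRAM content is represented as a plain string, and `check` stands in for the certificate check helper.

```ocaml
type state = Ok_state | Update_available | Update_on_boot

(* Mirrors the yes / no / unspecified values of the `update` parameter. *)
type update_status = Yes | No | Unspecified

(* State to store after a set_NVRAM_EFI_variables call. *)
let next_state ~(check : string -> state) ~current ~update nvram =
  match update with
  | Yes ->
      Ok_state (* caller reports a completed certificate update *)
  | No ->
      current (* leave the stored state untouched *)
  | Unspecified ->
      check nvram (* caller did not say: derive the state from the NVRAM *)
```

A caller that predates the parameter omits it, so it falls into the `Unspecified` branch and the state is still derived from the written NVRAM.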

### 3.5 Certificate Check Helper

A standalone program is introduced, which xapi calls to determine the Secure Boot certificate state.

Inputs:

- `temp file path`: path to a temporary file that contains the NVRAM EFI-variables data

Behavior:

- This program reuses common functions shared with varstored.
- xapi launches this program in a sandboxed, reduced-privilege environment.
- xapi retrieves the VM's NVRAM content from the database and passes it to this program via command-line arguments.
- If this program outputs `update_required`, xapi sets `VM.secureboot_certificates_state` to `update_available`.
- If this program outputs `update_ok`, xapi sets `VM.secureboot_certificates_state` to `ok`.
- On toolstack restart, during DB upgrade, xapi invokes this program to compute `VM.secureboot_certificates_state`. Since the xapi process has not completed initialization at that point, this program cannot call any xapi services.
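
How xapi might map the helper's output to a state can be sketched as follows. The function name `state_of_helper_output` is hypothetical, and two details are assumptions not stated in the design: that surrounding whitespace (such as a trailing newline) is ignored, and that unexpected output is reported as an error rather than mapped to a state.

```ocaml
type state = Ok_state | Update_available | Update_on_boot

(* Translate the helper's stdout into a certificate state. *)
let state_of_helper_output output =
  match String.trim output with
  | "update_required" ->
      Ok Update_available
  | "update_ok" ->
      Ok Ok_state
  | other ->
      (* Assumption: anything else is an error for the caller to handle. *)
      Error (Printf.sprintf "unexpected helper output: %S" other)
```

Returning a `result` leaves the policy for a misbehaving helper to the caller, for example keeping the previously stored state.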

### 3.6 Boot-time Automatic Update Path

When varstored initializes a VM and sees `secureboot_certificates_state=update_on_boot`, it:

- Performs the certificate update flow during boot-time initialization
- Writes the updated NVRAM and synchronizes the state via `VM.set_NVRAM_EFI_variables_V2`

The `VM.set_NVRAM_EFI_variables_V2` interface works the same way as `VM.set_NVRAM_EFI_variables`: it uses the existing varstored-guard process to make calls to xapi.

If `VM.set_NVRAM_EFI_variables_V2` fails (e.g. because communication with xapi is broken):

- xapi does not update the VM's NVRAM or `VM.secureboot_certificates_state`
- The VM boot gets stuck at the firmware initialization stage; until the issue is fixed, rebooting the VM hits the same problem
- Once the issue is fixed, the admin can resume the Secure Boot certificate update by rebooting the VM

### 3.7 End-to-end Workflow

1. Upgrade packages (`xapi-core`, `varstored`, related components)
2. Restart toolstack
3. xapi DB upgrade initializes and recalculates `secureboot_certificates_state`
4. Admin marks selected VMs via `VM.update_secureboot_certificates_on_boot`
5. VM reboot triggers varstored certificate update
6. xapi updates state to reflect post-update NVRAM content

## 4. Out of Scope

- User-notification mechanism for certificate expiry
- Custom certificate workflow
- Template/snapshot feature expansion beyond state tracking and conversion behavior
- OS-specific test-process guidance
- VMs with Secure Boot PCR7 binding (e.g. Windows BitLocker); customer documentation will explain how to resolve such issues
1 change: 1 addition & 0 deletions dune-project
@@ -50,6 +50,7 @@

(package
(name tgroup)
(synopsis "Thread group management library")
(depends xapi-log xapi-stdext-unix))

(package
2 changes: 1 addition & 1 deletion ocaml/idl/datamodel_common.ml
@@ -10,7 +10,7 @@ open Datamodel_roles
to leave a gap for potential hotfixes needing to increment the schema version.*)
let schema_major_vsn = 5

let schema_minor_vsn = 794
let schema_minor_vsn = 795

(* Historical schema versions just in case this is useful later *)
let rio_schema_major_vsn = 5
4 changes: 4 additions & 0 deletions ocaml/idl/datamodel_lifecycle.ml
@@ -139,6 +139,8 @@ let prototyped_of_field = function
Some "25.15.0"
| "VM_guest_metrics", "netbios_name" ->
Some "24.28.0"
| "VM", "secureboot_certificates_state" ->
Some "26.1.12-next"
| "VM", "groups" ->
Some "24.19.1"
| "VM", "pending_guidances_full" ->
@@ -281,6 +283,8 @@ let prototyped_of_message = function
Some "24.0.0"
| "VM", "sysprep" ->
Some "25.24.0"
| "VM", "update_secureboot_certificates_on_boot" ->
Some "26.1.12-next"
| "VM", "get_secureboot_readiness" ->
Some "24.17.0"
| "VM", "set_uefi_mode" ->
80 changes: 79 additions & 1 deletion ocaml/idl/datamodel_vm.ml
@@ -2352,7 +2352,45 @@ let set_HVM_boot_policy =
let set_NVRAM_EFI_variables =
call ~flags:[`Session] ~name:"set_NVRAM_EFI_variables"
~lifecycle:[(Published, rel_naples, "")]
~params:[(Ref _vm, "self", "The VM"); (String, "value", "The value")]
~versioned_params:
[
{
param_type= Ref _vm
; param_name= "self"
; param_doc= "The VM"
; param_release= naples_release
; param_default= None
}
; {
param_type= String
; param_name= "value"
; param_doc= "The EFI-variables value"
; param_release= naples_release
; param_default= None
}
; {
param_type=
Enum
( "update_status"
, [
("yes", "Set secureboot_certificates_state to ok")
; ("no", "Leave secureboot_certificates_state unchanged")
; ( "unspecified"
, "Check certificates and update \
secureboot_certificates_state accordingly"
)
]
)
; param_name= "update"
; param_doc=
"If 'yes', set secureboot_certificates_state to ok. If 'no', keep \
the current secureboot_certificates_state unchanged. If omitted \
(defaults to 'unspecified'), run certificate check to determine \
the state."
; param_release= numbered_release "26.1.12-next"
; param_default= Some (VEnum "unspecified")
}
]
~hide_from_docs:true ~allowed_roles:_R_LOCAL_ROOT_ONLY ()

let restart_device_models =
@@ -2507,6 +2545,40 @@ let set_uefi_mode =
~result:(String, "Result from the varstore-sb-state call")
~doc:"Set the UEFI mode of a VM" ~allowed_roles:_R_POOL_ADMIN ()

let vm_secureboot_certificates_state =
Enum
( "vm_secureboot_certificates_state"
, [
( "ok"
, "The VM's certificates do not need to be updated (including the case \
where Secure Boot does not apply to this VM, e.g. BIOS VM)."
)
; ( "update_available"
, "The Secure Boot certificates are due to expire or have already \
expired."
)
; ( "update_on_boot"
, "An update of the certificates will be triggered whenever the VM \
boots. This includes VM.start, VM.reboot and a guest-triggered \
reboot."
)
]
)

let update_secureboot_certificates_on_boot =
call ~name:"update_secureboot_certificates_on_boot" ~lifecycle:[]
~params:
[
(Ref _vm, "self", "The VM")
; ( Bool
, "mark"
, "If true: mark certificates for update on next boot. If false: \
remove the mark"
)
]
~doc:"Mark or unmark secure boot certificate update on VM boot"
~allowed_roles:_R_VM_ADMIN ()

let vm_secureboot_readiness =
Enum
( "vm_secureboot_readiness"
@@ -2681,6 +2753,7 @@ let t =
; restart_device_models
; set_uefi_mode
; get_secureboot_readiness
; update_secureboot_certificates_on_boot
; set_blocked_operations
; add_to_blocked_operations
; remove_from_blocked_operations
@@ -3291,6 +3364,11 @@ let t =
doesn't need to"
; field ~qualifier:DynamicRO ~lifecycle:[] ~ty:(Set (Ref _vm_group))
"groups" "VM groups associated with the VM"
; field ~qualifier:DynamicRO ~lifecycle:[]
~ty:vm_secureboot_certificates_state
~default_value:(Some (VEnum "ok")) "secureboot_certificates_state"
"The state of the Secure Boot certificates, showing whether an \
update is available, already scheduled, or not needed."
]
)
()
2 changes: 1 addition & 1 deletion ocaml/idl/schematest.ml
@@ -3,7 +3,7 @@ let hash x = Digest.string x |> Digest.to_hex
(* BEWARE: if this changes, check that schema has been bumped accordingly in
ocaml/idl/datamodel_common.ml, usually schema_minor_vsn *)

let last_known_schema_hash = "51e48060a7bc8427039d56bca269db49"
let last_known_schema_hash = "87dce17b30693b57292d1168002b856f"

let current_schema_hash : string =
let open Datamodel_types in