Commit f5f3db5

kvm-heartbeat: extract shared fence helper; add 'custom' action; rename to hard-reboot
Per review on PR #13090:

- Refactor common fence-action case into kvmha-fence.sh sourced by both kvmheartbeat.sh and kvmspheartbeat.sh (per @sureshanaparti).
- Add 'custom' fence action that invokes an operator-supplied script (kvm.heartbeat.fence.custom.script, default /etc/cloudstack/agent/heartbeat-fence-custom.sh) with the heartbeat script name as arg; falls back to hard-reboot if the script is missing/non-executable (per @NuxRo).
- Rename canonical action 'reboot' -> 'hard-reboot' for clarity; keep 'reboot' accepted as alias so existing deployments don't break (per @DaanHoogland).

Default behavior unchanged: sysrq-trigger reboot, required where a stale NFSv3 mount blocks systemctl reboot.
1 parent d603b26 commit f5f3db5
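
To make the 'custom' action in the commit message concrete, here is a minimal sketch of what an operator-supplied script could look like. Only the calling convention (one positional argument, the heartbeat script name) comes from the commit; the function names `describe_fence` and `custom_fence_main` and the log text are assumptions made for illustration, and the body deliberately does nothing beyond logging.

```shell
#!/bin/bash
# Hypothetical heartbeat-fence-custom.sh; not part of this commit.
# Calling convention taken from the commit message: $1 is the name of the
# heartbeat script that triggered the fence (e.g. kvmheartbeat.sh).

# Build the log line for a given caller (hypothetical helper name).
describe_fence() {
    printf '%s: custom fence hook invoked' "${1:-unknown}"
}

# Log the event and return success. Site-specific fencing logic would go here.
custom_fence_main() {
    msg="$(describe_fence "${1:-}")"
    # Fall back to stderr where logger is unavailable or cannot reach syslog.
    logger -t heartbeat "$msg" 2>/dev/null || echo "$msg" >&2
}

custom_fence_main "$@"
```

The helper exits with whatever status the custom script returns, so a script like this one should exit non-zero if its own fencing step fails.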

5 files changed

Lines changed: 134 additions & 89 deletions

agent/conf/agent.properties

Lines changed: 18 additions & 9 deletions
@@ -315,16 +315,25 @@ iscsi.session.cleanup.enabled=false
 # 'reboot.host.and.alert.management.on.heartbeat.timeout' when set to a non-default value.
 #
 # Allowed values:
-# reboot - immediate sysrq-trigger reboot (default; original behavior)
-# graceful-reboot - 'systemctl reboot' instead of sysrq; allows VMs to stop cleanly
-# restart-agent - restart cloudstack-agent only; running VMs are preserved
-# log-only - log + alert; take no automatic action (admin must investigate)
+# hard-reboot - immediate sysrq-trigger reboot (default; 'reboot' kept as alias).
+# Required default for setups where a stale NFSv3 mount can prevent
+# a graceful shutdown from completing.
+# graceful-reboot - 'systemctl reboot' instead of sysrq; allows VMs to stop cleanly.
+# Use only if a stale storage mount cannot block shutdown.
+# restart-agent - restart cloudstack-agent only; running VMs are preserved.
+# log-only - log + alert; take no automatic action (admin must investigate).
+# custom - invoke the script at 'kvm.heartbeat.fence.custom.script' (see below).
+# Script is called with one positional arg: the heartbeat script name
+# (e.g. 'kvmheartbeat.sh'). Falls back to hard-reboot if missing or
+# not executable.
 #
-# The 'graceful-reboot', 'restart-agent', and 'log-only' actions are recommended
-# for setups using LINSTOR/DRBD or any local storage with replication, where
-# transient I/O contention can cause a heartbeat write to time out without the
-# host actually being unhealthy.
-#kvm.heartbeat.fence.action=reboot
+# The non-default values are recommended for setups using LINSTOR/DRBD or any local
+# storage with replication, where transient I/O contention can cause a heartbeat
+# write to time out without the host actually being unhealthy.
+#kvm.heartbeat.fence.action=hard-reboot
+
+# Path to the operator-supplied script invoked when kvm.heartbeat.fence.action=custom.
+#kvm.heartbeat.fence.custom.script=/etc/cloudstack/agent/heartbeat-fence-custom.sh
 
 # Enables manually setting CPU's topology on KVM's VM.
 #enable.manually.setting.cpu.topology.on.kvm.vm=true
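
Because the shared helper routes any unrecognized value through its `hard-reboot|reboot|*)` branch, a typo in `kvm.heartbeat.fence.action` silently results in a hard reboot. A small pre-deployment check along the following lines could catch that. This is only a sketch: `is_known_fence_action` is a hypothetical helper name, and the list of values is copied from the comment block above.

```shell
#!/bin/bash
# Sketch only: is_known_fence_action is a hypothetical helper, not part of the commit.
# Returns 0 when the value is one of the documented actions (or the 'reboot' alias).
is_known_fence_action() {
    case "$1" in
        hard-reboot|reboot|graceful-reboot|restart-agent|log-only|custom) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: flag a misspelled value before it reaches the heartbeat scripts.
for action in hard-reboot gracefull-reboot; do
    if is_known_fence_action "$action"; then
        echo "ok: $action"
    else
        echo "unknown action: $action"
    fi
done
```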

agent/src/main/java/com/cloud/agent/properties/AgentProperties.java

Lines changed: 25 additions & 6 deletions
@@ -602,20 +602,39 @@ public class AgentProperties{
      * Action taken by the KVM agent's storage heartbeat scripts (kvmheartbeat.sh / kvmspheartbeat.sh)
      * when a heartbeat write fails persistently. Allowed values:
      * <ul>
-     * <li>{@code reboot} (default) — immediate sysrq-trigger reboot; original behavior</li>
-     * <li>{@code graceful-reboot} — {@code systemctl reboot} instead of sysrq, lets VMs stop cleanly</li>
-     * <li>{@code restart-agent} — restart cloudstack-agent only; running VMs preserved</li>
-     * <li>{@code log-only} — log + alert, no automatic action</li>
+     * <li>{@code hard-reboot} (default; {@code reboot} accepted as alias) — immediate
+     * sysrq-trigger reboot. Required default for setups where a stale NFSv3 mount can
+     * prevent a graceful shutdown from completing.</li>
+     * <li>{@code graceful-reboot} — {@code systemctl reboot} instead of sysrq; allows VMs
+     * to stop cleanly. Use only if a stale storage mount cannot block shutdown.</li>
+     * <li>{@code restart-agent} — restart cloudstack-agent only; running VMs preserved.</li>
+     * <li>{@code log-only} — log + alert; take no automatic action (admin must investigate).</li>
+     * <li>{@code custom} — invoke the script at {@link #KVM_HEARTBEAT_FENCE_CUSTOM_SCRIPT}
+     * (default {@code /etc/cloudstack/agent/heartbeat-fence-custom.sh}). The script is
+     * called with one argument: the heartbeat script name (e.g. {@code kvmheartbeat.sh}).
+     * If the script is missing or not executable, falls back to {@code hard-reboot}.</li>
      * </ul>
      * The non-default values are recommended for setups using LINSTOR/DRBD or other replicated
      * local storage, where transient I/O contention can cause a heartbeat write to time out
      * without the host actually being unhealthy.<br>
      * Read by the heartbeat shell scripts directly from agent.properties.<br>
      * Data type: String.<br>
-     * Default value: {@code reboot}
+     * Default value: {@code hard-reboot}
      */
     public static final Property<String> KVM_HEARTBEAT_FENCE_ACTION
-            = new Property<>("kvm.heartbeat.fence.action", "reboot");
+            = new Property<>("kvm.heartbeat.fence.action", "hard-reboot");
+
+    /**
+     * Path to the operator-supplied script invoked when
+     * {@link #KVM_HEARTBEAT_FENCE_ACTION} is set to {@code custom}. The script must be
+     * executable and is called with a single positional argument: the heartbeat script name
+     * that triggered the fence (e.g. {@code kvmheartbeat.sh}). Read by the heartbeat shell
+     * scripts directly from agent.properties.<br>
+     * Data type: String.<br>
+     * Default value: {@code /etc/cloudstack/agent/heartbeat-fence-custom.sh}
+     */
+    public static final Property<String> KVM_HEARTBEAT_FENCE_CUSTOM_SCRIPT
+            = new Property<>("kvm.heartbeat.fence.custom.script", "/etc/cloudstack/agent/heartbeat-fence-custom.sh");
 
     /**
      * Enables manually setting CPU's topology on KVM's VM. <br>
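
The Javadoc for `KVM_HEARTBEAT_FENCE_CUSTOM_SCRIPT` says the script must be executable, and that a missing or non-executable script falls back to `hard-reboot`. Operators may want to verify that condition ahead of time rather than discover it during a fence event. The following is a rough sketch of such a check; `check_custom_fence_script` is a hypothetical name, and the `-f` test makes it slightly stricter than a bare `-x` test because it also rejects directories.

```shell
#!/bin/bash
# Sketch only: check_custom_fence_script is a hypothetical helper name.
# Reports whether the configured custom fence script could actually be run.
check_custom_fence_script() {
    local script="${1:-/etc/cloudstack/agent/heartbeat-fence-custom.sh}"
    if [ -f "$script" ] && [ -x "$script" ]; then
        echo "custom fence script is executable: $script"
        return 0
    fi
    echo "custom fence script missing or not executable: $script"
    return 1
}
```

Calling it with no argument checks the default path from the property definition; passing a path checks an overridden `kvm.heartbeat.fence.custom.script` value.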

scripts/vm/hypervisor/kvm/kvmha-fence.sh

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# Shared fence-action helper for kvmheartbeat.sh and kvmspheartbeat.sh.
+# Sourced by both scripts; do not invoke directly.
+#
+# Usage from caller:
+#   source "$(dirname "$0")/kvmha-fence.sh"
+#   fence_action "kvmheartbeat.sh" # script name passed for log tagging
+
+AGENT_PROPS="${AGENT_PROPS:-/etc/cloudstack/agent/agent.properties}"
+
+fence_action() {
+    local source_script="${1:-kvmha}"
+    local FENCE_ACTION="hard-reboot"
+    local CUSTOM_SCRIPT="/etc/cloudstack/agent/heartbeat-fence-custom.sh"
+
+    if [ -r "$AGENT_PROPS" ]; then
+        local val
+        val=$(grep -E '^[[:space:]]*kvm\.heartbeat\.fence\.action[[:space:]]*=' "$AGENT_PROPS" | tail -n 1 | cut -d= -f2- | tr -d '[:space:]')
+        [ -n "$val" ] && FENCE_ACTION="$val"
+        local cval
+        cval=$(grep -E '^[[:space:]]*kvm\.heartbeat\.fence\.custom\.script[[:space:]]*=' "$AGENT_PROPS" | tail -n 1 | cut -d= -f2- | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
+        [ -n "$cval" ] && CUSTOM_SCRIPT="$cval"
+    fi
+
+    case "$FENCE_ACTION" in
+        log-only)
+            /usr/bin/logger -t heartbeat "${source_script}: heartbeat write to storage failed; fence action 'log-only' selected — taking no automatic action. Operator must investigate."
+            exit 0
+            ;;
+        restart-agent)
+            /usr/bin/logger -t heartbeat "${source_script}: heartbeat write to storage failed; fence action 'restart-agent' — restarting cloudstack-agent (running VMs preserved)."
+            sync &
+            sleep 2
+            systemctl restart cloudstack-agent
+            exit $?
+            ;;
+        graceful-reboot)
+            /usr/bin/logger -t heartbeat "${source_script}: heartbeat write to storage failed; fence action 'graceful-reboot' — rebooting via systemctl (allows running VMs to stop cleanly)."
+            sync &
+            sleep 5
+            systemctl reboot
+            exit $?
+            ;;
+        custom)
+            if [ -x "$CUSTOM_SCRIPT" ]; then
+                /usr/bin/logger -t heartbeat "${source_script}: heartbeat write to storage failed; fence action 'custom' — running ${CUSTOM_SCRIPT}."
+                sync &
+                sleep 2
+                "$CUSTOM_SCRIPT" "$source_script"
+                exit $?
+            else
+                /usr/bin/logger -t heartbeat "${source_script}: heartbeat write to storage failed; fence action 'custom' selected but ${CUSTOM_SCRIPT} is missing or not executable — falling back to hard-reboot."
+                sync &
+                sleep 5
+                echo b > /proc/sysrq-trigger
+                exit $?
+            fi
+            ;;
+        hard-reboot|reboot|*)
+            # 'reboot' kept as alias for back-compat with pre-existing deployments.
+            /usr/bin/logger -t heartbeat "${source_script} will reboot system because it was unable to write the heartbeat to the storage."
+            sync &
+            sleep 5
+            echo b > /proc/sysrq-trigger
+            exit $?
+            ;;
+    esac
+}
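
The property lookup in `fence_action` relies on a `grep | tail -n 1 | cut | tr` pipeline, so commented-out lines are ignored, the last assignment in the file wins, and whitespace around the value is dropped. The sketch below reproduces that pipeline against a throwaway file to show the effect. `read_prop` is a hypothetical wrapper introduced only for this demonstration; the helper itself inlines the pipeline.

```shell
#!/bin/bash
# Sketch only: read_prop is a hypothetical wrapper around the same
# grep | tail | cut | tr pipeline the helper uses for kvm.heartbeat.fence.action.
read_prop() {
    local file="$1" key_regex="$2"
    grep -E "^[[:space:]]*${key_regex}[[:space:]]*=" "$file" | tail -n 1 | cut -d= -f2- | tr -d '[:space:]'
}

# Throwaway properties file: one commented default and two live assignments.
props="$(mktemp)"
cat > "$props" <<'EOF'
#kvm.heartbeat.fence.action=hard-reboot
kvm.heartbeat.fence.action = log-only
  kvm.heartbeat.fence.action=restart-agent
EOF

action="$(read_prop "$props" 'kvm\.heartbeat\.fence\.action')"
# An empty result means the property is unset, so the built-in default applies.
echo "effective action: ${action:-hard-reboot}"
rm -f "$props"
```

With the sample file above, the commented line is skipped and the later of the two live assignments is the one that takes effect.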

scripts/vm/hypervisor/kvm/kvmheartbeat.sh

Lines changed: 3 additions & 37 deletions
@@ -156,43 +156,9 @@ then
   exit 0
 elif [ "$cflag" == "1" ]
 then
-  # Read fence action from agent.properties (default: reboot for backward compatibility).
-  # Allowed values: reboot | graceful-reboot | restart-agent | log-only
-  AGENT_PROPS="/etc/cloudstack/agent/agent.properties"
-  FENCE_ACTION="reboot"
-  if [ -r "$AGENT_PROPS" ]; then
-    val=$(grep -E '^[[:space:]]*kvm\.heartbeat\.fence\.action[[:space:]]*=' "$AGENT_PROPS" | tail -n 1 | cut -d= -f2- | tr -d '[:space:]')
-    [ -n "$val" ] && FENCE_ACTION="$val"
-  fi
-
-  case "$FENCE_ACTION" in
-    log-only)
-      /usr/bin/logger -t heartbeat "kvmheartbeat.sh: heartbeat write to storage failed; fence action 'log-only' selected — taking no automatic action. Operator must investigate."
-      exit 0
-      ;;
-    restart-agent)
-      /usr/bin/logger -t heartbeat "kvmheartbeat.sh: heartbeat write to storage failed; fence action 'restart-agent' — restarting cloudstack-agent (running VMs preserved)."
-      sync &
-      sleep 2
-      systemctl restart cloudstack-agent
-      exit $?
-      ;;
-    graceful-reboot)
-      /usr/bin/logger -t heartbeat "kvmheartbeat.sh: heartbeat write to storage failed; fence action 'graceful-reboot' — rebooting via systemctl (allows running VMs to stop cleanly)."
-      sync &
-      sleep 5
-      systemctl reboot
-      exit $?
-      ;;
-    reboot|*)
-      # Original behavior: immediate kernel-level reboot via sysrq-trigger
-      /usr/bin/logger -t heartbeat "kvmheartbeat.sh will reboot system because it was unable to write the heartbeat to the storage."
-      sync &
-      sleep 5
-      echo b > /proc/sysrq-trigger
-      exit $?
-      ;;
-  esac
+  # shellcheck disable=SC1091
+  . "$(dirname "$0")/kvmha-fence.sh"
+  fence_action "kvmheartbeat.sh"
 else
   write_hbLog
   exit $?
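
Both callers locate the helper with `. "$(dirname "$0")/kvmha-fence.sh"`, which resolves relative to the calling script's own path rather than the current working directory. The sketch below illustrates that behavior with stand-in files in a temporary directory; `helper.sh`, `caller.sh`, and `demo_fence` are made-up names, and the demo assumes the caller is invoked by path (so that `$0` points at the script), as is the case when it is run as a file.

```shell
#!/bin/bash
# Sketch only: file and function names below are made up for the demo.
demo_dir="$(mktemp -d)"

# Stand-in for the shared helper: defines a function, does nothing on load.
cat > "$demo_dir/helper.sh" <<'EOF'
demo_fence() { echo "fence requested by $1"; }
EOF

# Stand-in for a heartbeat script that sources its sibling helper.
cat > "$demo_dir/caller.sh" <<'EOF'
#!/bin/sh
. "$(dirname "$0")/helper.sh"
demo_fence "caller.sh"
EOF

# Run the caller from an unrelated working directory; the helper is still found
# because the lookup is based on $0, not on the current directory.
result="$(cd / && sh "$demo_dir/caller.sh")"
echo "$result"
rm -rf "$demo_dir"
```

One consequence of this design is that the helper has to be installed in the same directory as the two heartbeat scripts.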

scripts/vm/hypervisor/kvm/kvmspheartbeat.sh

Lines changed: 3 additions & 37 deletions
@@ -58,41 +58,7 @@ deleteVMs() {
 
 if [ "$cflag" == "1" ]
 then
-  # Read fence action from agent.properties (default: reboot for backward compatibility).
-  # Allowed values: reboot | graceful-reboot | restart-agent | log-only
-  AGENT_PROPS="/etc/cloudstack/agent/agent.properties"
-  FENCE_ACTION="reboot"
-  if [ -r "$AGENT_PROPS" ]; then
-    val=$(grep -E '^[[:space:]]*kvm\.heartbeat\.fence\.action[[:space:]]*=' "$AGENT_PROPS" | tail -n 1 | cut -d= -f2- | tr -d '[:space:]')
-    [ -n "$val" ] && FENCE_ACTION="$val"
-  fi
-
-  case "$FENCE_ACTION" in
-    log-only)
-      /usr/bin/logger -t heartbeat "kvmspheartbeat.sh: heartbeat write to storage failed; fence action 'log-only' selected — taking no automatic action. Operator must investigate."
-      exit 0
-      ;;
-    restart-agent)
-      /usr/bin/logger -t heartbeat "kvmspheartbeat.sh: heartbeat write to storage failed; fence action 'restart-agent' — restarting cloudstack-agent (running VMs preserved)."
-      sync &
-      sleep 2
-      systemctl restart cloudstack-agent
-      exit $?
-      ;;
-    graceful-reboot)
-      /usr/bin/logger -t heartbeat "kvmspheartbeat.sh: heartbeat write to storage failed; fence action 'graceful-reboot' — rebooting via systemctl (allows running VMs to stop cleanly)."
-      sync &
-      sleep 5
-      systemctl reboot
-      exit $?
-      ;;
-    reboot|*)
-      # Original behavior: immediate kernel-level reboot via sysrq-trigger
-      /usr/bin/logger -t heartbeat "kvmspheartbeat.sh will reboot system because it was unable to write the heartbeat to the storage."
-      sync &
-      sleep 5
-      echo b > /proc/sysrq-trigger
-      exit $?
-      ;;
-  esac
+  # shellcheck disable=SC1091
+  . "$(dirname "$0")/kvmha-fence.sh"
+  fence_action "kvmspheartbeat.sh"
 fi
