-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
ISSUE TYPE
- Bug Report
COMPONENT NAME
Agent
CLOUDSTACK VERSION
4.19.1.1
CONFIGURATION
Advanced Networking
KVM
Consolidated Cluster:
- 2 KVM Hosts: Management Server, Agent, & Database
- 1 KVM Host: Agent
OS / ENVIRONMENT
Ubuntu 22.04
SUMMARY
When performing a host reboot after updating general packages (e.g., kernel updates), the agent fails to return to an "Up" state.
The error observed when attempting to disable maintnenance mode:

The agent logs:
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from command "lscpu".
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: ERROR [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Unable to retrieve the CPU speed from lscpu.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: java.lang.NullPointerException
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.lang.Float.parseFloat(Float.java:455)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromCommandLscpu(KVMHostInfo.java:134)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:109)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:189)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.(KVMHostInfo.java:66)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3627)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent.sendStartup(Agent.java:460)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1125)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:83)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:29)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from file [/sys/devices/system/cpu/cpu0/cpufreq/base_frequency].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: ERROR [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Unable to retrieve the CPU speed from file [/sys/devices/system/cpu/cpu0/cpufreq/base_frequency]
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: java.io.FileNotFoundException: /sys/devices/system/cpu/cpu0/cpufreq/base_frequency (No such file or directory)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.open0(Native Method)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.(FileInputStream.java:157)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileInputStream.(FileInputStream.java:112)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.io.FileReader.(FileReader.java:60)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromFile(KVMHostInfo.java:145)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:114)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:189)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at org.apache.cloudstack.utils.linux.KVMHostInfo.(KVMHostInfo.java:66)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3627)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent.sendStartup(Agent.java:460)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1125)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:83)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at com.cloud.utils.nio.Task.call(Task.java:29)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from "host capabilities"
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Retrieved value [2100000000] from "host capabilities". This corresponds to a CPU speed of [2100] MHz.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Host uses control group [cgroup2fs].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Calculating the max shares of the host.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) The max shares of the host is [268800].
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: root : PWD=/ ; USER=root ; COMMAND=/usr/bin/grep InitiatorName= /etc/iscsi/initiatorname.iscsi
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: pam_unix(sudo:session): session closed for user root
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: WARN [utils.script.Script] (Agent-Handler-1:) (logid:) Execution of process [7107] for command [/bin/bash -c dpkg -l nbdkit ] failed.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: WARN [utils.script.Script] (Agent-Handler-1:) (logid:) Process [7107] for command [/bin/bash -c dpkg -l nbdkit ] encountered the error: [1].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Attempting to create storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9 (Filesystem) in libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Found existing defined storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9, using it.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Trying to fetch storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9 from libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Process agent startup answer, agent id = 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Set agent id 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Startup Response Received: agent id = 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) Attempting to create storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6 (NetworkFilesystem) in libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) Found existing defined storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6, using it.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac) Trying to fetch storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6 from libvirt
The management server log:
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: com.cloud.utils.exception.CloudRuntimeException: Could not restart agent on Host {"id":17,"name":"dtw-cm-kvm01-03","type":"Routing","uuid":"2549c995-78ca-411f-91f8-7a4863fe3d15"} due to: Failed to restart cloudstack-agent.service: Interactive authentication required.
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: See system logs and 'systemctl status cloudstack-agent.service' for details.
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.connectAndRestartAgentOnHost(ResourceManagerImpl.java:2958)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.handleAgentIfNotConnected(ResourceManagerImpl.java:2923)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2893)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2977)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1294)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.lang.reflect.Method.invoke(Method.java:566)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.sun.proxy.$Proxy200.cancelMaintenance(Unknown Source)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd.execute(CancelMaintenanceCmd.java:96)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:172)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:112)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:654)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:602)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-12:ctx-d6de7f78 job-720) (logid:cb386b6d) Remove job-720 from job monitoring
STEPS TO REPRODUCE
1. Place KVM host into maintenance mode
2. Perform package updates on the KVM host
3. Reboot the KVM host
EXPECTED RESULTS
The agent status changes to "Up" after the reboot has been completed and the agent is started.
ACTUAL RESULTS
The agent status is "Alert".