
Bridge interface not recognized for newly added host if they don't have a physical interface attached #10727

@deajan

problem

I tried to add a KVM hypervisor running AlmaLinux 9.5 to a CloudStack management server via the UI. The operation failed with the error: Unable to add the host: Cannot find the server resources at <host>


While digging through the CloudStack management-server.log file, I noticed that the management server reports that my bridge br_npf0 cannot be found:

2025-04-15 14:47:43,865 INFO  [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-7:[]) (logid:) PingMap for agent: 1 will not be updated because agent is no longer in the PingMap
2025-04-15 14:47:43,866 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-7:[]) (logid:) Not processing PingRoutingCommand for agent id=0; can't find the host in the DB
2025-04-15 14:47:43,867 DEBUG [c.c.a.t.Request] (AgentManager-Handler-8:[]) (logid:) Seq 1-9217179587367141377: Processing:  { Ans: , MgmtId: 90520739542428, via: 1, Ver: v1, Flags: 110, [{"com.cloud.agent.api.CheckNetworkAnswer":{"_reconnect":"false","result":"false","details":"Can not find network: br_npf0","wait":"0","bypassHostMaintenance":"false"}}] }
2025-04-15 14:47:43,867 DEBUG [c.c.a.m.ClusteredAgentAttache] (AgentManager-Handler-8:[]) (logid:) Seq 1-9217179587367141377: No more commands found
2025-04-15 14:47:43,867 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-596:[ctx-58c65884]) (logid:c8f5bf3e) Seq 1-9217179587367141377: Received:  { Ans: , MgmtId: 90520739542428, via: 1(reacted_host_name.local), Ver: v1, Flags: 110, { CheckNetworkAnswer } }
2025-04-15 14:47:43,867 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-596:[ctx-58c65884]) (logid:c8f5bf3e) Details from executing class com.cloud.agent.api.CheckNetworkCommand: Can not find network: br_npf0
2025-04-15 14:47:43,867 WARN  [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-596:[ctx-58c65884]) (logid:c8f5bf3e) Unable to setup agent 1 due to Can not find network: br_npf0
2025-04-15 14:47:43,870 WARN  [c.c.a.AlertManagerImpl] (AgentConnectTaskPool-596:[ctx-58c65884]) (logid:c8f5bf3e) alertType=[7] dataCenterId=[1] podId=[1] clusterId=[null] message=[Incorrect Network setup on agent, Reinitialize agent after network names are setup, details : Can not find network: br_npf0].
2025-04-15 14:47:43,878 WARN  [c.c.a.AlertManagerImpl] (AgentConnectTaskPool-596:[ctx-58c65884]) (logid:c8f5bf3e) No recipients set in global setting 'alert.email.addresses', skipping sending alert with subject [Incorrect Network setup on agent, Reinitialize agent after network names are setup, details : Can not find network: br_npf0] and content [Incorrect Network setup on agent, Reinitialize agent after network names are setup, details : Can not find network: br_npf0].
2025-04-15 14:47:43,878 INFO  [c.c.u.e.CSExceptionErrorCode] (AgentConnectTaskPool-596:[ctx-58c65884]) (logid:c8f5bf3e) Could not find exception: com.cloud.exception.ConnectionException in error code list for exceptions
[...]
2025-04-15 14:47:43,890 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-596:[ctx-58c65884]) (logid:c8f5bf3e) Failed to handle host connection: com.cloud.exception.ConnectionException: Incorrect Network setup on agent, Reinitialize agent after network names are setup, details : Can not find network: br_npf0
        at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.processConnect(NetworkOrchestrator.java:4321)
        at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:553)
        at com.cloud.agent.manager.AgentManagerImpl.sendReadyAndGetAttache(AgentManagerImpl.java:1116)
        at com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent(AgentManagerImpl.java:1135)
        at com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext(AgentManagerImpl.java:1227)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
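
When scanning a large management-server.log for this kind of failure, it can help to pull out only the CheckNetworkAnswer payload. The following is a minimal sketch, not part of CloudStack: the function name `parse_check_network_answer` is hypothetical, and the pattern assumes the `result` field appears before `details`, as it does in the log line above.

```python
import re
from typing import Optional, Tuple

# Matches the CheckNetworkAnswer object embedded in a management-server.log line.
# Assumption: "result" precedes "details", as in the excerpt above.
_ANSWER_RE = re.compile(
    r'"com\.cloud\.agent\.api\.CheckNetworkAnswer":\{'
    r'[^}]*?"result":"(?P<result>[^"]*)"'
    r'[^}]*?"details":"(?P<details>[^"]*)"'
)


def parse_check_network_answer(line: str) -> Optional[Tuple[bool, str]]:
    """Return (succeeded, details) for a CheckNetworkAnswer line, else None."""
    match = _ANSWER_RE.search(line)
    if match is None:
        return None
    return match.group("result") == "true", match.group("details")


if __name__ == "__main__":
    sample = (
        '[{"com.cloud.agent.api.CheckNetworkAnswer":{"_reconnect":"false",'
        '"result":"false","details":"Can not find network: br_npf0",'
        '"wait":"0","bypassHostMaintenance":"false"}}]'
    )
    print(parse_check_network_answer(sample))
```

Lines that do not contain a CheckNetworkAnswer return None, so the function can be applied to every line of the log and the non-matches filtered out.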

On the KVM host, the bridge br_npf0 exists, is up, and has an IP address, and the host can ping the management server:

ip a | grep br_npf0
10: br_npf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 10.13.37.2/24 brd 10.13.37.255 scope global noprefixroute br_npf0
28: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br_npf0 state UP group default qlen 1000

# pinging the cloudstack management server
ping cloudstack01i.npf.local -c 1
PING cloudstack01i.npf.local (10.13.37.250) 56(84) bytes of data.
64 bytes from cloudstack01i.npf.local (10.13.37.250): icmp_seq=1 ttl=64 time=0.687 ms

--- cloudstack01i.npf.local ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.687/0.687/0.687/0.000 ms
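
Since the issue title points at bridges without an attached physical interface, it may be useful to check what the kernel itself reports for the bridge. The sketch below reads the standard Linux sysfs layout (`/sys/class/net/<name>/bridge` and `/sys/class/net/<name>/brif`). It is only a diagnostic aid: the helper name `bridge_ports` is hypothetical, and this is not claimed to be how the CloudStack agent looks up networks.

```python
from pathlib import Path
from typing import List, Optional


def bridge_ports(bridge: str, sysfs_root: str = "/sys/class/net") -> Optional[List[str]]:
    """Return the sorted port names of `bridge`, or None if it is not a bridge."""
    base = Path(sysfs_root) / bridge
    # A Linux bridge device exposes a `bridge` directory in sysfs.
    if not (base / "bridge").is_dir():
        return None
    # Each interface enslaved to the bridge appears as an entry under `brif`.
    brif = base / "brif"
    if not brif.is_dir():
        return []
    return sorted(entry.name for entry in brif.iterdir())


if __name__ == "__main__":
    ports = bridge_ports("br_npf0")
    if ports is None:
        print("br_npf0 is not a bridge on this host")
    elif not ports:
        print("br_npf0 is a bridge with no attached interfaces")
    else:
        print("br_npf0 ports:", ", ".join(ports))
```

The `sysfs_root` parameter exists only so the function can be pointed at a test directory; on a real host the default is what you want.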

The management server zone is set up with that exact same bridge name:

Image

Can anyone point me in the right direction?

versions

CloudStack 4.20 running on AlmaLinux 9.5
KVM host running AlmaLinux 9.5, with the bridge set up via NetworkManager

The steps to reproduce the bug

What to do about it?

No response
