Skip to content

network: ExternalGuestNetworkGuru should defer to specialized gurus#133

Open
msinhore wants to merge 1 commit intoshapeblue:mainfrom
msinhore:fix-external-guru-strict-canhandle
Open

network: ExternalGuestNetworkGuru should defer to specialized gurus#133
msinhore wants to merge 1 commit intoshapeblue:mainfrom
msinhore:fix-external-guru-strict-canhandle

Conversation

@msinhore
Copy link
Copy Markdown

ExternalGuestNetworkGuru.canHandle currently accepts any Advanced / Guest / Isolated|L2 offering whose physical network advertises GRE / L3 / VLAN as one of its isolation methods, regardless of the provider list on the offering. When the same physical network also advertises a SDN-specific isolation method (OVN, NSX, NETRIS, ...), operators routinely keep VLAN registered alongside it for legacy offerings — and that single fact is enough to make this guru claim modern, specialized offerings too.

NetworkOrchestrator.setupNetwork iterates every NetworkGuru and persists one NetworkVO row per matching guru. With the current canHandle, the specialized guru (e.g. OvnGuestNetworkGuru) creates the real row that gets implemented, while ExternalGuestNetworkGuru silently creates a second, parallel row that stays in Allocated state forever: no broadcast_uri, no implement() call, but visible in listNetworks and the UI as a phantom tier / network.

This breaks user-visible state in two ways:

  • listNetworks returns duplicate entries for every isolated network or VPC tier on a zone whose physical network keeps VLAN registered alongside the specialized isolation method.
  • Cleanup paths can leave the orphaned row behind even after the real network is destroyed, because it was never linked to any backend state.

The fix is to make ExternalGuestNetworkGuru bail out of canHandle when the offering binds any of its services to a provider that ships its own dedicated NetworkGuru: Ovn, Netris, Nsx, Tungsten, Ovs. Those providers have their own canHandle that already gates on their isolation method and their provider in networkOfferingServiceMapDao — so the specialized guru is always the canonical claimant for those offerings.

The list is intentionally explicit and conservative: it covers every in-tree guest NetworkGuru that lives outside ExternalGuestNetworkGuru and that targets the same Isolated / L2 offerings on the Advanced zone. SspGuestNetworkGuru and VxlanGuestNetworkGuru are not in the list because they use isolation methods (SSP, VXLAN) that ExternalGuestNetworkGuru never matched in the first place, so there was no overlap to fix there.

Lab-validated end to end on an OVN-backed zone whose physical network had OVN+VLAN registered: every newly created VPC tier and isolated network now produces exactly one networks row instead of two.

ExternalGuestNetworkGuru.canHandle currently accepts any Advanced /
Guest / Isolated|L2 offering whose physical network advertises
GRE / L3 / VLAN as one of its isolation methods, regardless of the
provider list on the offering. When the same physical network also
advertises a SDN-specific isolation method (OVN, NSX, NETRIS, ...),
operators routinely keep VLAN registered alongside it for legacy
offerings — and that single fact is enough to make this guru claim
modern, specialized offerings too.

NetworkOrchestrator.setupNetwork iterates every NetworkGuru and
persists one NetworkVO row per matching guru. With the current
canHandle, the specialized guru (e.g. OvnGuestNetworkGuru) creates
the real row that gets implemented, while ExternalGuestNetworkGuru
silently creates a second, parallel row that stays in Allocated state
forever: no broadcast_uri, no implement() call, but visible in
listNetworks and the UI as a phantom tier / network.

This breaks user-visible state in two ways:
- listNetworks returns duplicate entries for every isolated network or
  VPC tier on a zone whose physical network keeps VLAN registered
  alongside the specialized isolation method.
- Cleanup paths can leave the orphaned row behind even after the real
  network is destroyed, because it was never linked to any backend
  state.

The fix is to make ExternalGuestNetworkGuru bail out of canHandle when
the offering binds any of its services to a provider that ships its
own dedicated NetworkGuru: Ovn, Netris, Nsx, Tungsten, BigSwitchBcf,
NiciraNvp, Opendaylight, BrocadeVcs, Ovs. Those providers have their
own canHandle that already gates on their isolation method *and*
their provider in networkOfferingServiceMapDao — so the specialized
guru is always the canonical claimant for those offerings.

The list is intentionally explicit and conservative: it covers every
in-tree guest NetworkGuru that lives outside ExternalGuestNetworkGuru
and that targets the same Isolated / L2 offerings on the Advanced
zone. SspGuestNetworkGuru and VxlanGuestNetworkGuru are not in the
list because they use isolation methods (SSP, VXLAN) that
ExternalGuestNetworkGuru never matched in the first place, so there
was no overlap to fix there.

Lab-validated end to end on an OVN-backed zone whose physical network
had OVN+VLAN registered: every newly created VPC tier and isolated
network now produces exactly one networks row instead of two.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant