[linux-cp/dpdk] Fix sub-port and LAG sub-port datapath on sonic-platform-vpp#237
[linux-cp/dpdk] Fix sub-port and LAG sub-port datapath on sonic-platform-vpp#237lunyue-ms wants to merge 7 commits into
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
b4d1855 to
30f3586
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
I am not sure about this. vpp has api to do that. in vlan-bvi HLD (#228), it has this.
4.7 Promiscuous Mode on Every Physical Interface
File: SwitchVppHostif.cpp
At LCP pair creation for each physical port:
configure_lcp_interface(hwif_name, dev, true);
interface_set_promiscuous(hwif_name, true); // <-- addedThere was a problem hiding this comment.
Good idea. interface_set_promiscuous() is listed in HLD #228 but isn't in master yet — the underlying VPP API sw_interface_set_promisc exists, so it's a thin shim.
I'll add it into sairedis PR #1907: add the wrapper, call it from SwitchVppHostif::createHostif after configure_lcp_interface(), and remove vpp_promisc.sh from this PR. Do you agree?
|
|
||
| # Why not support sub interfaces | ||
| linux-cp { | ||
| lcp-auto-subint |
There was a problem hiding this comment.
If we disable it, vpp sai needs to create sub interface for regular Ethernet interface. Is this in the plan?
There was a problem hiding this comment.
In sonic-net/sonic-sairedis#1907, In SwitchVppRif.cpp::vpp_create_router_interface():
int ret = create_sub_interface(parent_hwif, vlan_id, vlan_id);
if (ret != 0) { return SAI_STATUS_FAILURE; }
ret = configure_lcp_interface(hw_subifname, lcp_host, true);
if (ret != 0) {
delete_sub_interface(parent_hwif, vlan_id); // rollback
return SAI_STATUS_FAILURE;
}
For a regular Ethernet sub-port (e.g. Ethernet0.10) this will be:
create_sub_interface("bobm0", 10, 10)
configure_lcp_interface("bobm0.10", "Ethernet0.10", true)
So I think that it covers to create sub interface for regular Ethernet interface.
| # Why not support sub interfaces | ||
| linux-cp { | ||
| lcp-auto-subint | ||
| # lcp-auto-subint disabled: auto-pair "BondEthernet<id>.<vlan>" <-> "be<id>.<vlan>" |
There was a problem hiding this comment.
This is a big change. Please make sure you run full sonic-mgmt tests to avoid breakage
There was a problem hiding this comment.
Yes, it is. I wil run a roll of ful regression tests to make sure it doesn't break any other tests.
lunyue-ms
left a comment
There was a problem hiding this comment.
Thanks @yue-fred-gao for your comments. Please check if my thoughts is feasible.
|
|
||
| # Why not support sub interfaces | ||
| linux-cp { | ||
| lcp-auto-subint |
There was a problem hiding this comment.
In sonic-net/sonic-sairedis#1907, In SwitchVppRif.cpp::vpp_create_router_interface():
int ret = create_sub_interface(parent_hwif, vlan_id, vlan_id);
if (ret != 0) { return SAI_STATUS_FAILURE; }
ret = configure_lcp_interface(hw_subifname, lcp_host, true);
if (ret != 0) {
delete_sub_interface(parent_hwif, vlan_id); // rollback
return SAI_STATUS_FAILURE;
}
For a regular Ethernet sub-port (e.g. Ethernet0.10) this will be:
create_sub_interface("bobm0", 10, 10)
configure_lcp_interface("bobm0.10", "Ethernet0.10", true)
So I think that it covers to create sub interface for regular Ethernet interface.
| # Why not support sub interfaces | ||
| linux-cp { | ||
| lcp-auto-subint | ||
| # lcp-auto-subint disabled: auto-pair "BondEthernet<id>.<vlan>" <-> "be<id>.<vlan>" |
There was a problem hiding this comment.
Yes, it is. I wil run a roll of ful regression tests to make sure it doesn't break any other tests.
There was a problem hiding this comment.
Good idea. interface_set_promiscuous() is listed in HLD #228 but isn't in master yet — the underlying VPP API sw_interface_set_promisc exists, so it's a thin shim.
I'll add it into sairedis PR #1907: add the wrapper, call it from SwitchVppHostif::createHostif after configure_lcp_interface(), and remove vpp_promisc.sh from this PR. Do you agree?
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Sub-port host kernel netdevs (e.g. Ethernet64.20, PortChannel1.20) were created in DOWN state because VPP linux-cp plugin sets host-side link state to DOWN by default when the lcp itf-pair is created, overriding the prior 'ip link set <subif> up' that IntfMgr executed. As a result, host kernel ignored incoming sub-port traffic and PTF-driven sub-port tests (e.g. test_packet_routed_with_valid_vlan) failed because the host ICMP stack never replied to echo-request. Enable 'lcp-sync' in the linux-cp startup configuration so that VPP propagates sub-interface admin state changes to the corresponding host netdev. When sairedis brings the VPP sub-interface up via SAI API, lcp-sync now auto-ups the host counterpart, restoring the standard sub-port datapath. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
VPP DPDK virtio ports default to 'unicast off all-multicast on', which
drops broadcast frames such as ARP requests targeting newly-created L3
sub-port interfaces. Real ASICs are implicitly promiscuous for routed
ports, so PTF tests that rely on post-boot sub-port creation fail at
the very first ARP from the peer.
Manual reproduction on vlab-vpp-01:
- Create Ethernet64.40 with IP 172.16.0.9/30
- From PTF eth16.40 (172.16.0.10) ping 172.16.0.9
- 0 packets arrive at bobm16 (verified via vppctl trace)
- After 'vppctl set interface promiscuous on bobm16', ping succeeds
VPP's startup-config directive cannot fix this because CLI commands run
*before* the DPDK plugin finishes initialising its ports, so
'set interface promiscuous on bobm0' silently fails at that point.
A previous attempt forked a background subshell from vpp_init.sh to
wait for VPP CLI and re-apply the CLI commands. That subshell did not
survive supervisord's process-group handling, so promisc was never
actually enabled at runtime. Worse, the syncd container is restarted
by sonic-mgmt PTF fixtures during test setup ('config reload') -- any
one-shot solution loses its effect after that restart.
This change replaces the fragile subshell with a dedicated supervisord
program 'vpp_promisc.sh' that:
1. Waits up to 180s for the VPP CLI to come up and at least one
bobm[0-9]+ interface to appear.
2. Iterates every discovered bobm* DPDK port and enables promisc.
3. Stays alive in a 2s monitor loop so promisc is re-applied after
in-container VPP restarts.
The new program depends on vpp_init.sh:running via
supervisord_dependent_startup, mirroring the existing pattern used by
syncd / rsyslogd / start. Supervisord auto-restarts the script if it
exits non-zero (e.g. timeout waiting for bobm0).
Applied to both docker-syncd-vpp and docker-sonic-vpp.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
…ision When VPP linux-cp lcp-auto-subint is on, VPP auto-generates host tap "be<id>.<vlan>" for BondEthernet<id>.<vlan>, but SONiC IntfMgr creates "PortChannel<id>.<vlan>" (teamd vlan child) as the actual host netdev that holds the L3 IP. The two netdevs are different, so ARP replies land on the auto-tap, not on PortChannel<X>.<vlan>, causing addNeighbor 0/8 success rate for sub-port-over-LAG. Disable lcp-auto-subint so sairedis is the sole producer of sub-port LCP pairs and can explicitly call configure_lcp_interface(BondEthernet<id>.<vlan>, PortChannel<id>.<vlan>) in SwitchVpp::vpp_create_router_interface for SAI_ROUTER_INTERFACE_TYPE_SUB_PORT. Verified on vlab-vpp-01 PTF test_packet_routed_with_valid_vlan: - LCP pair table now shows itf-pair: [46] BondEthernet1.20 tap4137.20 PortChannel1.20 - No more 'configure_lcp_interface ... returned -81 EEXIST' - [port] sub-test still PASS (no regression on non-LAG sub-port) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
Two related hardening fixes to vpp_promisc.sh / vpp_promisc.conf
applied symmetrically to docker-syncd-vpp and docker-sonic-vpp:
1. Restart storm on missing DPDK ports (phase-1 timeout).
The script previously exited 1 after 180s of polling if no
bobm* interface ever appeared. Combined with autorestart=true
and startsecs=5, supervisord respawned the script forever:
180s wait + 'timed out' log + restart, every cycle. Two
changes:
- phase-1 timeout now exits 0 with a single explanatory
log line ('no DPDK ports to manage')
- autorestart=true -> autorestart=unexpected, so a clean
exit leaves the program in EXITED state and only real
crashes (segfault, bash bug) get restarted.
2. Fragile promisc state check.
'grep -q "promiscuous: unicast on"' matches an exact
substring. Minor format changes between VPP releases (extra
space, key reordering) would silently break the 'already on'
check and cause vppctl to re-issue 'set promiscuous on' every
2s. Use a whitespace-tolerant regex
'promiscuous:[[:space:]]+unicast[[:space:]]+on' instead.
No functional change for the happy path (DPDK present, VPP
recent): script still polls every 2s and enables promiscuous on
new bobm* ports as before.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
Document the LAG sub-port (PortChannel<id>.<vlan>) plumbing introduced in sonic-sairedis PR #1907 and the matching platform-vpp changes (startup.conf 'lcp-auto-subint = disabled', vpp_promisc.sh monitor loop). vpp-lag.md: - TOC, Revisions (v0.3) and Scope: add 'LAG Subinterface' entries. - New section 'LAG Subinterface' explaining the 3-tuple naming (PortChannel<id>.<vlan> / BondEthernet<id>.<vlan> / be<id>.<vlan>), the bidirectional 'tc mirred' bridge between LCP host and SONiC L3 netdev, why 'lcp-auto-subint=disabled' is required, and the RIF IP / ARP handling (idempotent -105, arp_accept=1, neighbor pre-warm). - Status row updated to reflect 'Hwif naming + explicit LCP pair + tc bridge + RIF IP idempotency + arp_accept/pre-warm + basic [port_in_lag] PTF PASS', with a follow-up note for DUT-originated ARP on a few PTF cases (root cause to be diagnosed). SONICVPP-HLD.md: - Subinterface row: align with current behaviour (explicit configure_lcp_interface from sairedis with 'lcp-auto-subint=disabled'). - New LAG Subinterface row covering BondEthernet<id>.<vlan> + be<id>.<vlan> LCP pair and the bidirectional tc bridge to PortChannel<id>.<vlan>. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
Both docker-syncd-vpp and docker-sonic-vpp now disable lcp-auto-subint and carry the same rationale comment, so sairedis is the sole producer of LAG sub-port LCP pairs in either deployment model (multi-container VM image or single-container sonic-vpp image). Also rewrites the existing rationale comment in docker-syncd-vpp to be more accurate: configure_lcp_interface pairs BondEthernet<id>.<vlan> with be<id>.<vlan> (not with PortChannel<id>.<vlan>); the bridge to PortChannel<id>.<vlan> is a separate tc mirred step in the same SAI RIF create. ARP replies do reach the correct LCP host tap; the kernel drops them because the IP lives on a different netdev. Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Promiscuous mode on the VPP DPDK physical interfaces is now configured directly through the sairedis VPP API (interface_set_promiscuous, called at LCP pair creation and after bond member enslave) instead of the supervisord-managed vpp_promisc.sh polling script. Remove the now-redundant vpp_promisc.sh / vpp_promisc.conf and their COPY lines from both docker-sonic-vpp and docker-syncd-vpp. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
1e0deb0 to
ce80ca0
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Description
Description of PR
Summary:
Three platform-level fixes that, together with the companion
sonic-sairedisPR, restore the sub-port and LAG-sub-port datapathon
sonic-platform-vpp. After this PR, PTFtest_sub_port_interfaces.py::TestSubPorts::test_packet_routed_with_valid_vlanpasses on
vlab-vpp-01for both[port]and[port_in_lag]parametrisations.
Fixes: sonic-net/sonic-platform-vpp#227
Companion PR:
the PASS result)
Type of change
Approach
What is the motivation for this PR?
On
sonic-platform-vpp, three independent issues keep the sub-portdatapath broken: VPP linux-cp holds the sub-port host netdev DOWN after
each itf-pair (re)create, DPDK virtio ports come up non-promiscuous and
drop broadcast ARP on freshly-created L3 sub-ports, and
lcp-auto-subintraces sairedis to install the LAG sub-port LCP pair(producing the wrong host tap). Together with the companion sairedis
changes, fixing these three restores sub-port and LAG sub-port
datapath end-to-end.
How did you do it?
Changes are confined to
docker-syncd-vpp/docker-sonic-vppconfigand
docs/HLD/:lcp-syncenabled — VPP propagates sub-interface admin state tothe host tap; without it linux-cp keeps the host netdev DOWN after
itf-pair (re)create, even if IntfMgr already ran
ip link set up.vpp_promisc.shpolls for DPDK port appearance and runsvppctl set interface promiscuous on bobm*. Real ASICs areimplicitly promiscuous on routed ports; VPP's
startup-configdirective cannot fix this because it runs before the DPDK plugin
initialises ports.
lcp-auto-subintdisabled — sairedis is now the sole producerof sub-port LCP pairs and atomically wires up
configure_lcp_interface(BondEthernet<id>.<vlan>, be<id>.<vlan>)plus the bidirectional
tc mirredbridge betweenbe<id>.<vlan>and the SONiC L3 netdev
PortChannel<id>.<vlan>in the same SAIRIF create.
autorestart=unexpected(no respawn storm on virtio-only platforms),and a whitespace-tolerant regex avoids re-issuing
set promiscuous onevery 2s on minor VPP output format changes.vpp-lag.mdgains a new "LAG Subinterface" sectiondescribing the 3-tuple naming (
PortChannel<id>.<vlan>/BondEthernet<id>.<vlan>/be<id>.<vlan>) and the bidirectionaltc mirredbridge;SONICVPP-HLD.mdadds a LAG Subinterface rowto the "Interface Create Flow by Type" table.
How did you verify/test it?
Result: 22 passed / 10 failed in ~32 min.
Failure breakdown:
test_routing_between_sub_ports_and_port[*-svi]— blocked by VLAN BVI / SVI routing not implemented in
vslib/vpp(sonic-buildimage #26936).
test_tunneling_between_sub_ports[*]— blocked by sairedis PR #1860 (IPinIP encap/decap; VPP 2510 enum
needs
MP2P->MPalias).test_routing_between_sub_ports_and_port[port_in_lag-l3],test_routing_between_sub_ports_unaffected_by_sub_ports_removal[port_in_lag-different],test_balancing_sub_ports[port_in_lag],test_max_numbers_of_sub_ports[port_in_lag].DUT-originated ARP on a LAG sub-port can still fail to install a
neighbor into the VPP sub-interface RIF in these stress / cross-port
cases. The
[port]counterparts PASS in the same run, isolating thebug to the LAG-specific plumbing (tc bridge or kernel → VPP
propagation on top of it). Tracked as a follow-up PR.
Any platform specific information?
Changes only touch
sonic-platform-vpp(docker-syncd-vpp/docker-sonic-vppconfig + HLD). No impact on other ASIC platforms.Both syncd and sonic images are updated symmetrically. The companion
sonic-sairedisPR is required to reproduce the PASS result.Documentation
Updated
docs/HLD/vpp-lag.md(new "LAG Subinterface" section, v0.3revision, status row) and
docs/HLD/SONICVPP-HLD.md(Subinterface rowaligned with current behaviour; new LAG Subinterface row).