Skip to content

[linux-cp/dpdk] Fix sub-port and LAG sub-port datapath on sonic-platform-vpp#237

Open
lunyue-ms wants to merge 7 commits into
sonic-net:masterfrom
lunyue-ms:lunyue/vpp-sub-port-fixes
Open

[linux-cp/dpdk] Fix sub-port and LAG sub-port datapath on sonic-platform-vpp#237
lunyue-ms wants to merge 7 commits into
sonic-net:masterfrom
lunyue-ms:lunyue/vpp-sub-port-fixes

Conversation

@lunyue-ms
Copy link
Copy Markdown

Description

Description of PR

Summary:
Three platform-level fixes that, together with the companion
sonic-sairedis PR, restore the sub-port and LAG-sub-port datapath
on sonic-platform-vpp. After this PR, PTF
test_sub_port_interfaces.py::TestSubPorts::test_packet_routed_with_valid_vlan
passes on vlab-vpp-01 for both [port] and [port_in_lag]
parametrisations.

Fixes: sonic-net/sonic-platform-vpp#227

Companion PR:

Type of change

  • Bug fix
  • New feature
  • Refactor / cleanup
  • Documentation update
  • Test improvement

Approach

What is the motivation for this PR?

On sonic-platform-vpp, three independent issues keep the sub-port
datapath broken: VPP linux-cp holds the sub-port host netdev DOWN after
each itf-pair (re)create, DPDK virtio ports come up non-promiscuous and
drop broadcast ARP on freshly-created L3 sub-ports, and
lcp-auto-subint races sairedis to install the LAG sub-port LCP pair
(producing the wrong host tap). Together with the companion sairedis
changes, fixing these three restores sub-port and LAG sub-port
datapath end-to-end.

How did you do it?

Changes are confined to docker-syncd-vpp / docker-sonic-vpp config
and docs/HLD/:

  1. lcp-sync enabled — VPP propagates sub-interface admin state to
    the host tap; without it linux-cp keeps the host netdev DOWN after
    itf-pair (re)create, even if IntfMgr already ran ip link set up.
  2. DPDK promisc reconciler — dedicated supervisord program
    vpp_promisc.sh polls for DPDK port appearance and runs
    vppctl set interface promiscuous on bobm*. Real ASICs are
    implicitly promiscuous on routed ports; VPP's startup-config
    directive cannot fix this because it runs before the DPDK plugin
    initialises ports.
  3. lcp-auto-subint disabled — sairedis is now the sole producer
    of sub-port LCP pairs and atomically wires up
    configure_lcp_interface(BondEthernet<id>.<vlan>, be<id>.<vlan>)
    plus the bidirectional tc mirred bridge between be<id>.<vlan>
    and the SONiC L3 netdev PortChannel<id>.<vlan> in the same SAI
    RIF create.
  4. Promisc reconciler hardening — phase-1 timeout exits 0 with
    autorestart=unexpected (no respawn storm on virtio-only platforms),
    and a whitespace-tolerant regex avoids re-issuing
    set promiscuous on every 2s on minor VPP output format changes.
  5. HLD docsvpp-lag.md gains a new "LAG Subinterface" section
    describing the 3-tuple naming (PortChannel<id>.<vlan> /
    BondEthernet<id>.<vlan> / be<id>.<vlan>) and the bidirectional
    tc mirred bridge; SONICVPP-HLD.md adds a LAG Subinterface row
    to the "Interface Create Flow by Type" table.

How did you verify/test it?

Result: 22 passed / 10 failed in ~32 min.

Failure breakdown:

  • 6 upstream-blocked, out of this PR's scope:
    • SVI variants (×2): test_routing_between_sub_ports_and_port[*-svi]
      — blocked by VLAN BVI / SVI routing not implemented in vslib/vpp
      (sonic-buildimage #26936).
    • IPinIP variants (×4): test_tunneling_between_sub_ports[*]
      — blocked by sairedis PR #1860 (IPinIP encap/decap; VPP 2510 enum
      needs MP2P->MP alias).
  • 4 LAG-sub-port follow-up, also out of scope:
    test_routing_between_sub_ports_and_port[port_in_lag-l3],
    test_routing_between_sub_ports_unaffected_by_sub_ports_removal[port_in_lag-different],
    test_balancing_sub_ports[port_in_lag],
    test_max_numbers_of_sub_ports[port_in_lag].
    DUT-originated ARP on a LAG sub-port can still fail to install a
    neighbor into the VPP sub-interface RIF in these stress / cross-port
    cases. The [port] counterparts PASS in the same run, isolating the
    bug to the LAG-specific plumbing (tc bridge or kernel → VPP
    propagation on top of it). Tracked as a follow-up PR.

Any platform specific information?

Changes only touch sonic-platform-vpp (docker-syncd-vpp /
docker-sonic-vpp config + HLD). No impact on other ASIC platforms.
Both syncd and sonic images are updated symmetrically. The companion
sonic-sairedis PR is required to reproduce the PASS result.

Documentation

Updated docs/HLD/vpp-lag.md (new "LAG Subinterface" section, v0.3
revision, status row) and docs/HLD/SONICVPP-HLD.md (Subinterface row
aligned with current behaviour; new LAG Subinterface row).

@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@lunyue-ms lunyue-ms force-pushed the lunyue/vpp-sub-port-fixes branch from b4d1855 to 30f3586 Compare May 26, 2026 01:42
@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@lunyue-ms lunyue-ms requested a review from yue-fred-gao May 26, 2026 06:20
Comment thread docker-sonic-vpp/conf/vpp_promisc.conf Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this. vpp has api to do that. in vlan-bvi HLD (#228), it has this.

4.7 Promiscuous Mode on Every Physical Interface

File: SwitchVppHostif.cpp

At LCP pair creation for each physical port:

configure_lcp_interface(hwif_name, dev, true);
interface_set_promiscuous(hwif_name, true);  // <-- added

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. interface_set_promiscuous() is listed in HLD #228 but isn't in master yet — the underlying VPP API sw_interface_set_promisc exists, so it's a thin shim.

I'll add it into sairedis PR #1907: add the wrapper, call it from SwitchVppHostif::createHostif after configure_lcp_interface(), and remove vpp_promisc.sh from this PR. Do you agree?


# Why not support sub interfaces
linux-cp {
lcp-auto-subint
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we disable it, vpp sai needs to create sub interface for regular Ethernet interface. Is this in the plan?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In sonic-net/sonic-sairedis#1907, In SwitchVppRif.cpp::vpp_create_router_interface():
int ret = create_sub_interface(parent_hwif, vlan_id, vlan_id);
if (ret != 0) { return SAI_STATUS_FAILURE; }

 ret = configure_lcp_interface(hw_subifname, lcp_host, true);
 if (ret != 0) {
     delete_sub_interface(parent_hwif, vlan_id);   // rollback
     return SAI_STATUS_FAILURE;
 }

For a regular Ethernet sub-port (e.g. Ethernet0.10) this will be:
create_sub_interface("bobm0", 10, 10)
configure_lcp_interface("bobm0.10", "Ethernet0.10", true)

So I think that it covers to create sub interface for regular Ethernet interface.

# Why not support sub interfaces
linux-cp {
lcp-auto-subint
# lcp-auto-subint disabled: auto-pair "BondEthernet<id>.<vlan>" <-> "be<id>.<vlan>"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big change. Please make sure you run full sonic-mgmt tests to avoid breakage

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is. I wil run a roll of ful regression tests to make sure it doesn't break any other tests.

Copy link
Copy Markdown
Author

@lunyue-ms lunyue-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yue-fred-gao for your comments. Please check if my thoughts is feasible.


# Why not support sub interfaces
linux-cp {
lcp-auto-subint
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In sonic-net/sonic-sairedis#1907, In SwitchVppRif.cpp::vpp_create_router_interface():
int ret = create_sub_interface(parent_hwif, vlan_id, vlan_id);
if (ret != 0) { return SAI_STATUS_FAILURE; }

 ret = configure_lcp_interface(hw_subifname, lcp_host, true);
 if (ret != 0) {
     delete_sub_interface(parent_hwif, vlan_id);   // rollback
     return SAI_STATUS_FAILURE;
 }

For a regular Ethernet sub-port (e.g. Ethernet0.10) this will be:
create_sub_interface("bobm0", 10, 10)
configure_lcp_interface("bobm0.10", "Ethernet0.10", true)

So I think that it covers to create sub interface for regular Ethernet interface.

# Why not support sub interfaces
linux-cp {
lcp-auto-subint
# lcp-auto-subint disabled: auto-pair "BondEthernet<id>.<vlan>" <-> "be<id>.<vlan>"
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is. I wil run a roll of ful regression tests to make sure it doesn't break any other tests.

Comment thread docker-sonic-vpp/conf/vpp_promisc.conf Outdated
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. interface_set_promiscuous() is listed in HLD #228 but isn't in master yet — the underlying VPP API sw_interface_set_promisc exists, so it's a thin shim.

I'll add it into sairedis PR #1907: add the wrapper, call it from SwitchVppHostif::createHostif after configure_lcp_interface(), and remove vpp_promisc.sh from this PR. Do you agree?

@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

lunyue-ms and others added 7 commits May 31, 2026 23:36
Sub-port host kernel netdevs (e.g. Ethernet64.20, PortChannel1.20) were
created in DOWN state because VPP linux-cp plugin sets host-side link
state to DOWN by default when the lcp itf-pair is created, overriding
the prior 'ip link set <subif> up' that IntfMgr executed. As a result,
host kernel ignored incoming sub-port traffic and PTF-driven sub-port
tests (e.g. test_packet_routed_with_valid_vlan) failed because the host
ICMP stack never replied to echo-request.

Enable 'lcp-sync' in the linux-cp startup configuration so that VPP
propagates sub-interface admin state changes to the corresponding host
netdev. When sairedis brings the VPP sub-interface up via SAI API,
lcp-sync now auto-ups the host counterpart, restoring the standard
sub-port datapath.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
VPP DPDK virtio ports default to 'unicast off all-multicast on', which
drops broadcast frames such as ARP requests targeting newly-created L3
sub-port interfaces. Real ASICs are implicitly promiscuous for routed
ports, so PTF tests that rely on post-boot sub-port creation fail at
the very first ARP from the peer.

Manual reproduction on vlab-vpp-01:

- Create Ethernet64.40 with IP 172.16.0.9/30
- From PTF eth16.40 (172.16.0.10) ping 172.16.0.9
- 0 packets arrive at bobm16 (verified via vppctl trace)
- After 'vppctl set interface promiscuous on bobm16', ping succeeds

VPP's startup-config directive cannot fix this because CLI commands run
*before* the DPDK plugin finishes initialising its ports, so
'set interface promiscuous on bobm0' silently fails at that point.

A previous attempt forked a background subshell from vpp_init.sh to
wait for VPP CLI and re-apply the CLI commands. That subshell did not
survive supervisord's process-group handling, so promisc was never
actually enabled at runtime. Worse, the syncd container is restarted
by sonic-mgmt PTF fixtures during test setup ('config reload') -- any
one-shot solution loses its effect after that restart.

This change replaces the fragile subshell with a dedicated supervisord
program 'vpp_promisc.sh' that:

  1. Waits up to 180s for the VPP CLI to come up and at least one
     bobm[0-9]+ interface to appear.
  2. Iterates every discovered bobm* DPDK port and enables promisc.
  3. Stays alive in a 2s monitor loop so promisc is re-applied after
     in-container VPP restarts.

The new program depends on vpp_init.sh:running via
supervisord_dependent_startup, mirroring the existing pattern used by
syncd / rsyslogd / start. Supervisord auto-restarts the script if it
exits non-zero (e.g. timeout waiting for bobm0).

Applied to both docker-syncd-vpp and docker-sonic-vpp.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
…ision

When VPP linux-cp lcp-auto-subint is on, VPP auto-generates host tap
"be<id>.<vlan>" for BondEthernet<id>.<vlan>, but SONiC IntfMgr creates
"PortChannel<id>.<vlan>" (teamd vlan child) as the actual host netdev
that holds the L3 IP. The two netdevs are different, so ARP replies
land on the auto-tap, not on PortChannel<X>.<vlan>, causing addNeighbor
0/8 success rate for sub-port-over-LAG.

Disable lcp-auto-subint so sairedis is the sole producer of sub-port
LCP pairs and can explicitly call
configure_lcp_interface(BondEthernet<id>.<vlan>, PortChannel<id>.<vlan>)
in SwitchVpp::vpp_create_router_interface for SAI_ROUTER_INTERFACE_TYPE_SUB_PORT.

Verified on vlab-vpp-01 PTF test_packet_routed_with_valid_vlan:
- LCP pair table now shows
  itf-pair: [46] BondEthernet1.20 tap4137.20 PortChannel1.20
- No more 'configure_lcp_interface ... returned -81 EEXIST'
- [port] sub-test still PASS (no regression on non-LAG sub-port)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
Two related hardening fixes to vpp_promisc.sh / vpp_promisc.conf
applied symmetrically to docker-syncd-vpp and docker-sonic-vpp:

1. Restart storm on missing DPDK ports (phase-1 timeout).
   The script previously exited 1 after 180s of polling if no
   bobm* interface ever appeared. Combined with autorestart=true
   and startsecs=5, supervisord respawned the script forever:
   180s wait + 'timed out' log + restart, every cycle. Two
   changes:
     - phase-1 timeout now exits 0 with a single explanatory
       log line ('no DPDK ports to manage')
     - autorestart=true -> autorestart=unexpected, so a clean
       exit leaves the program in EXITED state and only real
       crashes (segfault, bash bug) get restarted.

2. Fragile promisc state check.
   'grep -q "promiscuous: unicast on"' matches an exact
   substring. Minor format changes between VPP releases (extra
   space, key reordering) would silently break the 'already on'
   check and cause vppctl to re-issue 'set promiscuous on' every
   2s. Use a whitespace-tolerant regex
   'promiscuous:[[:space:]]+unicast[[:space:]]+on' instead.

No functional change for the happy path (DPDK present, VPP
recent): script still polls every 2s and enables promiscuous on
new bobm* ports as before.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
Document the LAG sub-port (PortChannel<id>.<vlan>) plumbing introduced
in sonic-sairedis PR #1907 and the matching platform-vpp changes
(startup.conf 'lcp-auto-subint = disabled', vpp_promisc.sh monitor loop).

vpp-lag.md:
- TOC, Revisions (v0.3) and Scope: add 'LAG Subinterface' entries.
- New section 'LAG Subinterface' explaining the 3-tuple naming
  (PortChannel<id>.<vlan> / BondEthernet<id>.<vlan> / be<id>.<vlan>),
  the bidirectional 'tc mirred' bridge between LCP host and SONiC L3
  netdev, why 'lcp-auto-subint=disabled' is required, and the RIF
  IP / ARP handling (idempotent -105, arp_accept=1, neighbor pre-warm).
- Status row updated to reflect 'Hwif naming + explicit LCP pair +
  tc bridge + RIF IP idempotency + arp_accept/pre-warm + basic
  [port_in_lag] PTF PASS', with a follow-up note for DUT-originated
  ARP on a few PTF cases (root cause to be diagnosed).

SONICVPP-HLD.md:
- Subinterface row: align with current behaviour (explicit
  configure_lcp_interface from sairedis with 'lcp-auto-subint=disabled').
- New LAG Subinterface row covering BondEthernet<id>.<vlan> +
  be<id>.<vlan> LCP pair and the bidirectional tc bridge to
  PortChannel<id>.<vlan>.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
Both docker-syncd-vpp and docker-sonic-vpp now disable lcp-auto-subint and
carry the same rationale comment, so sairedis is the sole producer of LAG
sub-port LCP pairs in either deployment model (multi-container VM image
or single-container sonic-vpp image).

Also rewrites the existing rationale comment in docker-syncd-vpp to be
more accurate: configure_lcp_interface pairs BondEthernet<id>.<vlan> with
be<id>.<vlan> (not with PortChannel<id>.<vlan>); the bridge to
PortChannel<id>.<vlan> is a separate tc mirred step in the same SAI RIF
create. ARP replies do reach the correct LCP host tap; the kernel drops
them because the IP lives on a different netdev.

Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Promiscuous mode on the VPP DPDK physical interfaces is now configured
directly through the sairedis VPP API (interface_set_promiscuous, called
at LCP pair creation and after bond member enslave) instead of the
supervisord-managed vpp_promisc.sh polling script.

Remove the now-redundant vpp_promisc.sh / vpp_promisc.conf and their COPY
lines from both docker-sonic-vpp and docker-syncd-vpp.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Lun Yue <17232861+lunyue-ms@users.noreply.github.com>
@lunyue-ms lunyue-ms force-pushed the lunyue/vpp-sub-port-fixes branch from 1e0deb0 to ce80ca0 Compare May 31, 2026 13:36
@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PortChannel/LAG subinterfaces are not working on SONiC-VPP

3 participants