v1.0 magic #93

Open
totoCZ wants to merge 10 commits into DanielAdolfsson:1.0-devel from totoCZ:master

totoCZ commented May 24, 2026

This makes v1 production-ready.

totoCZ and others added 10 commits May 24, 2026 19:50
NUD probes (RFC 4861 §7.3.3) are sent unicast and may omit SLLAO:
§4.3 makes the option only a SHOULD for unicast solicitations, since
the recipient normally already has the sender's link-layer address
cached.  The previous code hard-returned when SLLAO was absent, so
ndppd either dropped the probe entirely or responded with an
unsolicited NA (SOLICITED=0) to ff02::1.  An unsolicited NA is not a
reachability confirmation for a neighbor in the PROBE state (§7.3.1),
causing the Juniper switch to declare the neighbor unreachable.

Fix: pass the Ethernet source MAC through ndL_handle_ns and use it as
the fallback src_ll when SLLAO is absent.  This allows a properly
solicited NA to be sent back to the unicast probe source.

Also add NULL src_ll guards in session.c for the DAD (unspecified
source) path, and fix the PID file write check which compared the
return value of write() to 0 instead of the expected byte count,
logging a spurious error on every successful daemonize.
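
The PID file check can be illustrated with a minimal sketch. The helper name `write_pid` and its signature are hypothetical and are not taken from ndppd; only the comparison against the expected byte count reflects the fix described above.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes "<pid>\n" to an already-open descriptor.
 * Returns 0 on success, -1 on error or short write. */
static int write_pid(int fd, pid_t pid)
{
    char buf[32];
    int len = snprintf(buf, sizeof buf, "%ld\n", (long)pid);
    assert(len > 0 && (size_t)len < sizeof buf);

    ssize_t n = write(fd, buf, (size_t)len);

    /* write() returns the number of bytes written, so success means
     * n == len.  Treating any non-zero return as an error reports a
     * failure on every successful write. */
    return n == (ssize_t)len ? 0 : -1;
}
```

A caller would log an error only when this returns -1.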
The use-kernel config option conditionally called RTM_NEWNEIGH/
RTM_DELNEIGH to maintain kernel neighbor proxy entries alongside
ndppd's own proxying.  It was never promoted beyond experimental and
is not used in any deployed config.  The underlying nd_rt_add_neigh/
nd_rt_remove_neigh helpers in rt.c are retained.

iface.c:
- Reject NS with multicast target address (RFC 4861 §7.1.1 MUST)
- Discard ND packet containing any option with length zero (RFC 4861 §4.6 MUST)
- Iterate all NS options to find SLLAO instead of only inspecting the first
- BSD/BPF: fix msg pointer to use per-packet offset (was always buf base,
  corrupting all field reads and silently dropping all but first BPF record)
- BSD/BPF: fix plen sanity check to use bpf_hdr->bh_caplen not total read len
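
As a rough illustration of the option handling listed above, the following sketch walks every option in an NS, rejects a zero-length option, and returns the first SLLAO it finds. The function name `find_sllao` and its return convention are assumptions made for this example, not ndppd's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ND_OPT_SLLAO 1  /* Source Link-Layer Address option type */

/* Hypothetical helper.  Walks the options area of an NS and points
 * *lladdr at the address bytes of the first SLLAO, or leaves it NULL
 * if none is present.  Returns -1 if the packet must be discarded
 * (zero-length or truncated option), 0 otherwise. */
static int find_sllao(const uint8_t *opts, size_t len, const uint8_t **lladdr)
{
    assert(lladdr != NULL);
    *lladdr = NULL;

    size_t off = 0;
    while (off < len) {
        if (len - off < 2)
            return -1;                       /* no room for type + length */

        /* The length field counts units of 8 octets. */
        size_t optlen = (size_t)opts[off + 1] * 8;
        if (optlen == 0 || optlen > len - off)
            return -1;                       /* RFC 4861 §4.6: discard */

        if (opts[off] == ND_OPT_SLLAO && *lladdr == NULL)
            *lladdr = &opts[off + 2];

        off += optlen;                       /* keep scanning all options */
    }
    return 0;
}
```

When the function succeeds with `*lladdr == NULL`, the caller can fall back to the Ethernet source MAC of the frame, as described in the first commit.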

session.c:
- STALE state: queue subscriber and trigger fresh NUD probe rather than
  immediately responding with OVERRIDE NA against unconfirmed reachability
  (RFC 4861 §7.3.3 — STALE means reachability is unknown)
- VALID state: refresh state_time on incoming NA so gratuitous NAs from the
  target extend session lifetime (RFC 4861 §7.2.5)
- STALE exponential backoff: guard nd_conf_retrans_limit == 0 modulo
  (config allows min=0, causing division-by-zero UB)
- STALE exponential backoff: cap shift at 20 to prevent signed int overflow
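
The two backoff guards can be sketched as follows. This is not the code from session.c: the helper names, the doubling scheme, and the millisecond unit are assumptions, and only the shift cap of 20 and the zero check before the modulo come from the notes above.

```c
#include <assert.h>
#include <limits.h>

#define BACKOFF_MAX_SHIFT 20  /* cap named in the commit message */

/* Hypothetical: delay before the next probe, doubling per attempt. */
static int backoff_delay_ms(int base_ms, int attempt)
{
    assert(base_ms >= 0 && attempt >= 0);

    /* Capping the shift keeps the shift count bounded; doing the
     * arithmetic in long long and clamping avoids signed overflow. */
    int shift = attempt < BACKOFF_MAX_SHIFT ? attempt : BACKOFF_MAX_SHIFT;
    long long delay = (long long)base_ms << shift;
    return delay > INT_MAX ? INT_MAX : (int)delay;
}

/* Hypothetical: position within a retransmit cycle.  A limit of 0 is
 * permitted by the config, so it must never reach the % operator. */
static int retrans_slot(int count, int limit)
{
    return limit > 0 ? count % limit : 0;
}
```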

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

PACKET_MR_ALLMULTI only enables all-multicast reception; unicast NS
frames addressed to a MAC other than the interface's own are dropped by
the NIC.  This silently breaks proxying whenever an external host holds
a stale neighbour-cache entry with an old container veth MAC and sends
its NUD probe as a unicast to that address — ndppd never sees the NS
and never replies.  Running tcpdump worked around the problem because it
sets PACKET_MR_PROMISC as a side-effect, making the NIC accept every
frame.

Switch to PACKET_MR_PROMISC (a strict superset of ALLMULTI) so that
all NS packets reach ndppd regardless of their Ethernet destination.
The existing BPF filter still limits userspace delivery to ICMPv6
NS/NA only, so there is no meaningful performance impact.
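
For reference, a minimal Linux-only sketch of the membership request follows. The helper names `fill_promisc_mreq` and `join_promisc` are hypothetical; the socket option and the `struct packet_mreq` fields are the standard ones from packet(7).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: builds a request that enables promiscuous
 * reception on the given interface for one packet socket. */
static void fill_promisc_mreq(struct packet_mreq *mreq, int ifindex)
{
    assert(mreq != NULL);
    memset(mreq, 0, sizeof *mreq);
    mreq->mr_ifindex = ifindex;
    mreq->mr_type = PACKET_MR_PROMISC;  /* previously PACKET_MR_ALLMULTI */
}

/* fd must be an AF_PACKET socket.  Returns setsockopt()'s result. */
static int join_promisc(int fd, int ifindex)
{
    struct packet_mreq mreq;
    fill_promisc_mreq(&mreq, ifindex);
    return setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
                      &mreq, sizeof mreq);
}
```

Because the membership is tied to the socket, the kernel reverts the interface once the last socket holding it drops the membership or is closed.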

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ode)

In auto mode, if no route to the container exists when the first NS
arrives, nd_session_create() sets state=INVALID immediately.  All
subsequent NS from the same requester add subscribers to this INVALID
session, but nd_session_update() for INVALID only purges them after
invalid_ttl — it never retries the probe.  The external host gives up
long before the TTL expires, producing the symptom of "no reply on new
container start."

When an NS arrives for an INVALID session that has no interface (i.e.
went invalid because of a missing route, not a failed probe), re-check
the routing table.  If the route has appeared in the meantime, open the
downstream interface, transition to INCOMPLETE, and start probing so
queued subscribers get an answer once the container replies with NA.
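
The retry path can be pictured with a small state-transition sketch. All types and names here (`session_t`, `session_recheck_route`, the `probing` field) are hypothetical stand-ins rather than ndppd's real structures, and the route lookup result is passed in as a plain argument to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SESS_INVALID, SESS_INCOMPLETE } sess_state_t;

typedef struct {
    sess_state_t state;
    const char *iface;  /* NULL: session never had a downstream interface */
    bool probing;
} session_t;

/* Called when an NS arrives for an existing session.  route_iface is
 * the result of a fresh routing-table lookup (NULL if still no route).
 * Returns true if the session was revived and probing started. */
static bool session_recheck_route(session_t *s, const char *route_iface)
{
    assert(s != NULL);

    if (s->state != SESS_INVALID || s->iface != NULL)
        return false;   /* invalid because a probe failed: keep the TTL */
    if (route_iface == NULL)
        return false;   /* still no route: try again on the next NS */

    s->iface = route_iface;
    s->state = SESS_INCOMPLETE;
    s->probing = true;  /* queued subscribers are answered once NA arrives */
    return true;
}
```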

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

nd_iface_send_ns() was building the solicited-node Ethernet multicast
destination as 33:33:tgt[12:15], i.e. taking all four last bytes of
the TARGET address.  For EUI-64-derived container addresses, byte 12 is
always 0xfe (the lower byte of the ff:fe EUI-64 marker), so ndppd was
sending to 33:33:fe:XX:XX:XX instead of the correct 33:33:ff:XX:XX:XX.

The solicited-node multicast Ethernet MAC is derived from the last four
bytes of the IPv6 *destination* (ff02::1:ffXX:XXXX), not the target.
That address has 0xff at byte 12, not 0xfe.  The correct MAC is always
33:33:ff:tgt[13]:tgt[14]:tgt[15].

Practical impact: containers with EUI-64 addresses never received
ndppd's NS probe, so they never replied with NA, and ndppd's iface-mode
sessions stayed INCOMPLETE indefinitely.  Running tcpdump happened to
put the bridge/veth into promiscuous mode, which flooded the wrong-MAC
frame to all ports and accidentally made it work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The -v verbosity flag was comparing against ND_LOG_ERROR (0), which is
always false, so it never did anything.  Fix the comparison to ND_LOG_TRACE.
Default verbosity changed from TRACE to INFO so -vvv reaches trace.
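
A sketch of the corrected comparison is shown below. Only ND_LOG_ERROR being 0 is stated in the message; the four-level enum, the order of the other levels, and the helper `verbosity_after_flags` are assumptions made for illustration.

```c
#include <assert.h>

typedef enum {
    ND_LOG_ERROR = 0,  /* value stated in the commit message */
    ND_LOG_INFO,       /* order of the remaining levels is assumed */
    ND_LOG_DEBUG,
    ND_LOG_TRACE
} nd_loglevel_t;

/* Hypothetical: resulting level after v_count occurrences of -v. */
static nd_loglevel_t verbosity_after_flags(int v_count)
{
    assert(v_count >= 0);

    nd_loglevel_t level = ND_LOG_INFO;   /* new default */
    for (int i = 0; i < v_count; i++) {
        /* Comparing against ND_LOG_ERROR (0) here would never be
         * true, so the level would never increase. */
        if (level < ND_LOG_TRACE)
            level = (nd_loglevel_t)(level + 1);
    }
    return level;
}
```

Under this assumed enum, extra -v flags beyond the most verbose level are simply clamped to ND_LOG_TRACE.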

Add nd_log_debug/trace at every silent return path so packet drops are
visible in -vvv output:
  - unknown ifindex, short frame, ethertype, plen mismatch in io handler
  - hop-by-hop truncation, non-ICMPv6 nxt, ICMPv6 checksum mismatch,
    hop-limit != 255 in ndL_handle_msg
  - non-proxy iface, short NS, multicast target in ndL_handle_ns
  - no rule match in nd_proxy_handle_ns
  - no session match for incoming NA in ndL_handle_na
  - no src_ll (DAD / no SLLAO) in nd_session_handle_ns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

After fork(), both parent and child share the same AF_PACKET socket.
The parent's atexit handler (nd_iface_cleanup → nd_iface_close) was
calling PACKET_DROP_MEMBERSHIP on the shared socket before the parent
exited, silently removing the PROMISC membership that the child daemon
still needed.  Result: the daemon started without promiscuous mode and
missed multicast NS packets until tcpdump happened to re-enable promisc.

nd_iface_no_restore_flags already existed for exactly this purpose but
was never set.  Set it in the parent branch of ndL_daemonize() before
exit(0), and guard the PACKET_DROP_MEMBERSHIP call in nd_iface_close()
behind the flag so the parent's cleanup is a no-op on the socket.
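
The guard can be sketched roughly as follows (Linux-only). The variable `no_restore_flags` stands in for nd_iface_no_restore_flags, and `iface_drop_promisc` is a hypothetical helper rather than the real nd_iface_close().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Stand-in for nd_iface_no_restore_flags; the parent sets it to true
 * right before exit(0) so that its atexit cleanup skips the socket. */
static bool no_restore_flags = false;

/* Hypothetical cleanup helper.  Returns 0 if skipped or successful,
 * -1 if setsockopt() fails. */
static int iface_drop_promisc(int fd, int ifindex)
{
    assert(ifindex > 0);

    if (no_restore_flags)
        return 0;   /* socket is shared with the child: leave it alone */

    struct packet_mreq mreq;
    memset(&mreq, 0, sizeof mreq);
    mreq.mr_ifindex = ifindex;
    mreq.mr_type = PACKET_MR_PROMISC;
    return setsockopt(fd, SOL_PACKET, PACKET_DROP_MEMBERSHIP,
                      &mreq, sizeof mreq);
}
```

The child never sets the flag, so its own shutdown path still drops the membership normally.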

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>