@gangj (Contributor) commented Jan 6, 2026

CA-384228: Enable TCP Path MTU Discovery by default

Add sysctl configuration to enable TCP PMTUD on all XenServer hosts.
This prevents TCP connection hangs when path MTU is smaller than configured
interface MTU (e.g., jumbo frames configured but network infrastructure
doesn't support them).

Configuration:

  • net.ipv4.tcp_mtu_probing=1: Enable automatic MTU detection when ICMP
    blackhole is detected (recommended setting)
  • net.ipv4.tcp_base_mss=1024: Base MSS for MTU probing

The TCP stack will automatically detect packet loss patterns indicating
MTU issues and probe down to find the working MTU size. This works even
when ICMP Fragmentation Needed messages are blocked by firewalls.
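
As a rough way to confirm the setting on a host, the value can be read back from `/proc/sys`. The following is a minimal sketch, not part of this change: the helper names (`sysctl_path`, `describe_mtu_probing`, `read_sysctl_int`) are illustrative assumptions, and the value descriptions paraphrase the kernel's ip-sysctl documentation.

```ocaml
(* Sketch: report the current TCP MTU probing mode on a Linux host.
   All function names here are hypothetical helpers, not xapi code. *)

(* Map a sysctl key such as "net.ipv4.tcp_mtu_probing" to its /proc/sys path. *)
let sysctl_path key =
  "/proc/sys/" ^ String.concat "/" (String.split_on_char '.' key)

(* Meaning of net.ipv4.tcp_mtu_probing values (paraphrased from kernel docs). *)
let describe_mtu_probing = function
  | 0 -> "disabled"
  | 1 -> "enabled only after an ICMP black hole is detected"
  | 2 -> "always enabled, starting from tcp_base_mss"
  | n -> Printf.sprintf "unknown value %d" n

(* Read an integer sysctl; None if the file is missing or unparsable. *)
let read_sysctl_int key =
  match open_in (sysctl_path key) with
  | exception Sys_error _ -> None
  | ic ->
      let line = try Some (input_line ic) with End_of_file -> None in
      close_in ic ;
      Option.bind line (fun l -> int_of_string_opt (String.trim l))

let () =
  match read_sysctl_int "net.ipv4.tcp_mtu_probing" with
  | Some v ->
      Printf.printf "tcp_mtu_probing = %d (%s)\n" v (describe_mtu_probing v)
  | None ->
      print_endline "tcp_mtu_probing: not available on this system"
```

On a host with this configuration applied, the expectation is that the value reads back as 1 unless a later sysctl fragment overrides it.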

Files:

  • scripts/92-xapi-tcp-mtu.conf: New sysctl configuration file
  • scripts/Makefile: Install sysctl config to /etc/sysctl.d/

The "92" prefix ensures this loads after basic network configuration
(91-net-ipv6.conf) but before local administrator overrides (99-*).
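
The ordering argument relies on sysctl.d fragments being applied in lexicographic filename order, with later files winning for the same key. A small sketch of that ordering, where `99-local.conf` is a hypothetical administrator override file used only for illustration:

```ocaml
(* Sketch: sysctl.d fragments sorted the way they are expected to be applied.
   "99-local.conf" is a made-up name standing in for a local override. *)
let fragments = ["99-local.conf"; "92-xapi-tcp-mtu.conf"; "91-net-ipv6.conf"]

(* Plain byte-wise string comparison gives lexicographic filename order. *)
let load_order = List.sort String.compare fragments

let () = List.iter print_endline load_order
```

Because the override file sorts last, any value it sets for `net.ipv4.tcp_mtu_probing` would take precedence over the one shipped here.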

Reference: https://blog.cloudflare.com/path-mtu-discovery-in-practice/

===

CA-384228: Add MTU diagnostics during pool join

Add diagnostic tests during pool join to detect and warn about MTU
mismatches, particularly when jumbo frames are configured but the
network path doesn't support them.

The diagnostics:

  1. Query master's management network MTU via RPC
  2. Test standard MTU (1472 bytes data) with ICMP ping
  3. Test jumbo frames (8972 bytes data) if MTU > 1500
  4. Log prominent warning when CA-384228 scenario detected:
    • Standard MTU works
    • Jumbo frames fail
    • This indicates path MTU < configured MTU

Key design decisions:

  • Does NOT block pool join (ICMP may be blocked by firewalls)
  • Queries master's DB (slave's DB not yet synced during join)
  • Called after RPC session established (need remote DB access)
  • Relies on TCP PMTUD to handle issues automatically
  • Diagnostics are informational only for visibility
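
One way to keep a diagnostic from ever blocking the join is to run it behind a wrapper that turns any exception into a log line. The sketch below assumes a hypothetical `run_informational` helper and a caller-supplied `log` function; it is not the PR's actual error handling.

```ocaml
(* Sketch: run a diagnostic without letting its failure reach the caller.
   [run_informational] and [log] are illustrative names. *)
let run_informational ~name ~log f =
  try f ()
  with e ->
    (* Record the failure and carry on; the join itself is unaffected. *)
    log (Printf.sprintf "%s skipped: %s" name (Printexc.to_string e))

let () =
  let log msg = prerr_endline ("[warn] " ^ msg) in
  run_informational ~name:"MTU diagnostics" ~log (fun () ->
      failwith "ping unavailable") ;
  print_endline "pool join continues"
```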

Warning format highlights the issue clearly and references the
TCP PMTUD fix that handles it automatically, with guidance for
persistent problems.

Signed-off-by: Gang Ji <gang.ji@cloud.com>
(* Test MTU connectivity using ping - ICMP-based, informational only *)
let test_ping size desc =
  try
    let timeout = 3.0 *. 1e9 |> Int64.of_float |> Mtime.Span.of_uint64_ns in
Contributor

Use Mtime.Span.(3 * s) or similar to create a 3s span.

match (standard_ok, jumbo_ok) with
| true, false ->
    (* CA-384228 scenario: standard works but jumbo fails *)
    warn
Contributor

Would it make sense to create an alert such that the customer would see it?

Contributor Author

Thanks for the advice. I think it is a great idea and have added it now. Please review again, thank you.

Contributor Author

The alert shown in XenCenter (XC) after pool join:
[image: screenshot of the alert]

- 1472 = 1500 (standard MTU) - 20 (IP header) - 8 (ICMP header)
- 8972 = 9000 (jumbo MTU) - 20 (IP header) - 8 (ICMP header) *)
let standard_mtu_icmp_payload = 1472 in
let jumbo_mtu_icmp_payload = 8972 in
Member

Can we calculate this dynamically from the actual MTU (e.g. mtu - 28), then try one probe for MTU=1500 and another for the actually configured MTU?
That way it would still work if the user configures an MTU slightly smaller than 9000.

Also do we need to take the size of the VLAN tag into account when we're on a VLAN?

@gangj force-pushed the private/gangj/CA-384228 branch from 9d7d3a9 to 1d0e0d4 on January 8, 2026, 08:43
Add diagnostic tests during pool join to detect and warn about MTU
mismatches, particularly when higher MTU values are configured but
the network path doesn't support them.

The diagnostics:
1. Query master's management network MTU via RPC
2. Detect VLAN configuration and account for 4-byte overhead
3. Calculate ICMP payload dynamically:
   MTU - IP header (20) - ICMP header (8) - VLAN (4 if present)
4. Test standard MTU (1500) with ICMP ping
5. Test configured MTU if > 1500
6. Create pool-level alert when CA-384228 scenario detected:
   - Standard MTU (1500) works
   - Configured higher MTU fails
   - This indicates path MTU < configured MTU

Key design decisions:
- Does NOT block pool join (ICMP may be blocked by firewalls)
- Queries master's DB via verified RPC (slave's DB not yet synced)
- Called after certificate exchange with verified connection
- Creates pool-level alert for customer visibility in XenCenter/CLI
- Relies on TCP PMTUD (enabled by sysctl) to handle issues automatically
- Diagnostics are informational only, providing visibility

The implementation dynamically calculates test packet sizes based on
actual configured MTU rather than assuming fixed values, making it
work correctly with any MTU configuration (not just jumbo frames).
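
The dynamic calculation can be sketched as follows. This is a minimal illustration that simply follows the formula stated above (including the 4-byte VLAN deduction, taken from the description as-is); the function names, the `-M do` don't-fragment flag, the 3 second wait and the example address `192.0.2.10` are assumptions, not details confirmed by the PR.

```ocaml
(* Sketch: derive ICMP payload sizes from the configured MTU and build
   iputils ping arguments. All names are hypothetical. *)
let ip_header = 20
let icmp_header = 8
let vlan_tag = 4

(* Payload that exactly fills a packet of the given MTU, per the formula
   MTU - IP header - ICMP header - VLAN tag (if present). *)
let icmp_payload ~mtu ~on_vlan =
  mtu - ip_header - icmp_header - (if on_vlan then vlan_tag else 0)

(* Always probe the standard 1500; add the configured MTU only when larger. *)
let probe_sizes ~configured_mtu ~on_vlan =
  let standard = 1500 in
  let mtus =
    if configured_mtu > standard then [standard; configured_mtu] else [standard]
  in
  List.map (fun mtu -> (mtu, icmp_payload ~mtu ~on_vlan)) mtus

(* One packet, fragmentation prohibited, 3 s wait (assumed flags). *)
let ping_args ~host ~payload =
  ["ping"; "-c"; "1"; "-W"; "3"; "-M"; "do"; "-s"; string_of_int payload; host]

let () =
  probe_sizes ~configured_mtu:9000 ~on_vlan:false
  |> List.iter (fun (mtu, payload) ->
         Printf.printf "MTU %d -> %s\n" mtu
           (String.concat " " (ping_args ~host:"192.0.2.10" ~payload)))
```

With a configured MTU of 9000 and no VLAN this yields the same 1472 and 8972 byte payloads as the earlier fixed constants, while an MTU such as 8000 produces its own matching probe size.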

Warning format highlights the issue clearly and references the
TCP PMTUD fix that handles it automatically, with guidance for
persistent problems.

Signed-off-by: Gang Ji <gang.ji@cloud.com>
@gangj force-pushed the private/gangj/CA-384228 branch from 1d0e0d4 to 0d7e423 on January 8, 2026, 09:16
# This is the starting point for MTU probing when enabled

net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_base_mss = 1024
Member

Configuration files have been a recurring source of issues with user configuration in xcp-ng. I'm asking the platform teams whether this change follows their recommendations.

Contributor Author

Thanks @psafont, I understand your point.
Could you please share more about the good practices or recommendations? Thank you.

Member

I'm expecting somebody from xcp-ng's platform team to share them here.
