Conversation

@akerouanton

This flag should be used to indicate to libkrun that the downstream network backend wants to receive and transmit the virtio-net header along with Ethernet frames.

Network backends using this flag can then forward unmodified headers to another VM or build a sensible virtio_net_hdr (e.g. with GSO fields correctly set) such that the receiving VM handles GSO'd frames properly.
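For reference, the header in question is the standard virtio_net_hdr defined by the virtio spec. Below is a rough Rust sketch of what a backend opting into this flag sees in front of each Ethernet frame, plus an example of the kind of "sensible" header it could build for a GSO'd TCP/IPv4 frame. Values are illustrative only, and the 10-byte layout shown here omits the num_buffers field added by the mergeable-rx-buffers variant:

// virtio-net header as defined by the virtio spec; all multi-byte fields are
// little-endian. With NET_FLAG_INCLUDE_VNET_HEADER set, the backend receives
// and transmits this header in front of every Ethernet frame.
#[repr(C, packed)]
#[derive(Clone, Copy)]
struct VirtioNetHdr {
    flags: u8,        // e.g. VIRTIO_NET_HDR_F_NEEDS_CSUM
    gso_type: u8,     // e.g. VIRTIO_NET_HDR_GSO_NONE / _GSO_TCPV4 / _GSO_TCPV6
    hdr_len: u16,     // Ethernet + IP + L4 header length
    gso_size: u16,    // MSS to use when resegmenting the frame
    csum_start: u16,  // offset where partial checksumming starts
    csum_offset: u16, // checksum field offset, relative to csum_start
}

const VIRTIO_NET_HDR_F_NEEDS_CSUM: u8 = 0x1;
const VIRTIO_NET_HDR_GSO_TCPV4: u8 = 0x1;

// Illustrative header for a GSO'd TCP/IPv4 frame carrying a partial checksum:
// plain Ethernet (14) + IPv4 without options (20) + TCP without options (20).
fn example_tso4_header(mss: u16) -> VirtioNetHdr {
    VirtioNetHdr {
        flags: VIRTIO_NET_HDR_F_NEEDS_CSUM,
        gso_type: VIRTIO_NET_HDR_GSO_TCPV4,
        hdr_len: 14 + 20 + 20,
        gso_size: mss,
        csum_start: 14 + 20, // TCP header offset within the frame
        csum_offset: 16,     // TCP checksum offset within the TCP header
    }
}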

Collaborator

@mtjhrc left a comment


Thanks for the PR, it certainly makes sense to expose the header to interested backends.
I am curious though, could you share more detail about what backend you would use with this?

Please fix the mentioned confusion between stream/datagram sockets and features/flags. Also, I guess it would be nice to implement this for stream sockets too, but it's not strictly necessary.

Comment on lines 973 to 975
if (features & !(NET_ALL_FEATURES | NET_FLAG_INCLUDE_VNET_HEADER)) != 0 {
return -libc::EINVAL;
}
Collaborator


This check is incorrect on two levels:

  • as it stands, this PR only adds support for this feature for unix datagram sockets, not streams (which is what this function configures).
  • NET_FLAG_INCLUDE_VNET_HEADER is a flag, not a feature (as handled correctly in krun_add_net_unixgram).

return -libc::EINVAL;
}

let include_vnet_header: bool = flags & NET_FLAG_INCLUDE_VNET_HEADER != 0;
Collaborator


As implied above, include_vnet_header should always be false here, because it's not implemented for the stream socket.
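For example, something along these lines in the stream-socket entry point (just a sketch, adjust to the actual parameter names; the point is to validate features and flags separately and to reject the flag until it's implemented here):

if (features & !NET_ALL_FEATURES) != 0 {
    return -libc::EINVAL;
}
if (flags & NET_FLAG_INCLUDE_VNET_HEADER) != 0 {
    // Not implemented for stream sockets (yet).
    return -libc::EINVAL;
}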

Comment on lines 1031 to 1035
let backend = if let Some(path) = path {
VirtioNetBackend::UnixgramPath(path, send_vfkit_magic)
} else {
VirtioNetBackend::UnixgramFd(fd)
};
Collaborator


If you only implement the flag for the unixgram sockets, I think it would be better to put the bool inside the enum (like send_vfkit_magic is).

Note: I am not sure why send_vfkit_magic is not passed into VirtioNetBackend::UnixgramFd; that seems like a bug.
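Roughly what I have in mind (the exact variant shape here is just an illustration):

use std::os::unix::io::RawFd;
use std::path::PathBuf;

enum VirtioNetBackend {
    // Per-backend options travel inside the variant, next to send_vfkit_magic.
    UnixgramPath(PathBuf, /* send_vfkit_magic */ bool, /* include_vnet_header */ bool),
    UnixgramFd(RawFd, /* send_vfkit_magic */ bool, /* include_vnet_header */ bool),
    // ... other variants unchanged
}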

@akerouanton
Author

I am curious though, could you share more detail about what backend you would use with this?

I tried for a couple of hours to get a kernel stacktrace to pinpoint exactly where the issue is and why this PR fixes it, but to no avail. Let me describe it in plain words instead, but let me know if you want something more concrete.

I don't plan to use this flag with any existing backend right now. I'm working on a container runtime that leverages https://github.com/containerd/nerdbox and has its own virtual switch that connects multiple VMs. Every VM gets one or more virtio-net devices, and each interface is connected to a bridge interface (i.e. ip link set master). That bridge has a veth connected to it too (and at some point it'll get multiple of them, but nerdbox doesn't support multiple containers per VM yet).

With TSO enabled, VM-to-VM connections work correctly: packets are GSO'd on the sending side and GRO'd on the receiving side. However, container-to-container connections are broken. After some debugging, I realized that disabling GSO on the sending veth fixed the issue, so the breakage was probably caused by invalid GSO fields in the virtio-net headers.

I believe that with the above topology, GRO is supposed to happen at the veth level (although I might be wrong), and missing GSO fields would leave some skb fields unset, preventing GRO from happening.

With this change, my virtual switch forwards the virtio-net headers produced by the sending side without recomputing L4 checksums (which was previously required for VM-to-VM scenarios to work), and it fixes container-to-container communications too. I also think this should help with host-to-[VM|container] communications, as it would allow sending large frames with partial checksums.

Please fix the mentioned confusion between stream/datagram sockets and features/flags. Also, I guess it would be nice to implement this for stream sockets too, but it's not strictly necessary.

Sorry for the nasty mistakes, let me fix them!

@akerouanton force-pushed the pass-virtionet-hdr-to-backend branch 2 times, most recently from db5988b to 420da29 on December 23, 2025 at 17:38
@akerouanton force-pushed the pass-virtionet-hdr-to-backend branch from 420da29 to ab2fe8b on December 23, 2025 at 17:56
@akerouanton
Author

@mtjhrc I added include_vnet_header to all network backends, but to keep the bespoke 'frame length' header used by the unixstream backend, I had to extend NetWorker's TX buffer by 4 bytes. The vnet header is now located at tx_frame_buf[4..14].

Unixstream writes the frame length either to 0..4 or to 10..14, depending on whether the vnet header is included. Other backends have been modified to ignore the first four bytes.
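As a rough sketch of the resulting layout (helper names are illustrative, not the actual code):

// TX buffer layout for the unixstream backend after this change:
//   [0..4]   frame-length prefix, used when the vnet header is sent
//   [4..14]  virtio-net header
//   [10..14] frame-length prefix, used when the vnet header is NOT sent
//   [14..]   Ethernet frame
const LEN_PREFIX_SIZE: usize = 4;
const VNET_HDR_SIZE: usize = 10;
const FRAME_OFFSET: usize = LEN_PREFIX_SIZE + VNET_HDR_SIZE; // 14

// Byte range of tx_frame_buf that actually goes on the wire.
fn unixstream_tx_range(frame_len: usize, include_vnet_header: bool) -> std::ops::Range<usize> {
    if include_vnet_header {
        0..FRAME_OFFSET + frame_len // [len][vnet hdr][frame]
    } else {
        FRAME_OFFSET - LEN_PREFIX_SIZE..FRAME_OFFSET + frame_len // [len][frame]
    }
}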

The Tap backend also takes an include_vnet_header parameter to keep it aligned with the other backends. But since vnet headers have always been included there, there's no way to make that configurable without introducing a breaking change in the API. Maybe something to fix in a major API version, or via a new API method, or maybe it's not worth it (in which case I can remove my change if you prefer).

Anyway, this should be in better shape now. PTAL.
