Skip to content

viona: support (enough) VIRTIO_NET_F_CTRL_RX to leave all-multicast mode#1125

Open
FelixMcFelix wants to merge 4 commits intomasterfrom
felixmcfelix/viona-guest-promisc
Open

viona: support (enough) VIRTIO_NET_F_CTRL_RX to leave all-multicast mode#1125
FelixMcFelix wants to merge 4 commits intomasterfrom
felixmcfelix/viona-guest-promisc

Conversation

@FelixMcFelix
Copy link
Copy Markdown
Contributor

@FelixMcFelix FelixMcFelix commented Apr 24, 2026

This PR adds support for the VIRTIO_NET_F_CTRL_RX capability, which allows guests to request specific MAC address filters in addition to their primary MAC, and to explicitly inform the host whether the device should be shifted into promiscuous mode.

The hope is that this allows us to take a guest out of all-multicast promiscuous mode in viona itself, which generates extra per-packet work (checking "is this packet multicast?") and contention in the Tx and Rx paths as both reach for and refcount the same list of promisc callbacks on the client.

Testing in falcon makes it look as though Debian will just immediately insert multicast entries, although I don't know if that's purely an artefact of my environment. Using one of the helios-dev images appears to correctly move out of promiscuous mode and keep the filter tables empty.

Closes #1122. Closes #1127.

@FelixMcFelix FelixMcFelix changed the title viona: support (enough) VIRTIO_NET_F_CTRL_RX to remove promisc viona: support (enough) VIRTIO_NET_F_CTRL_RX to leave all-multicast mode Apr 24, 2026
@FelixMcFelix
Copy link
Copy Markdown
Contributor Author

I've tested this so far on Alpine Linux 3.23.3, debian 13.2, and the helios-2.8 image we use for a4x2 testing. The first two are multiqueue-capable, whereas illumos is not -- control requests appear to be correctly served on queue 22 for the former and queue 2 for the latter.

The typical initialisation sequence looks to be that guests will send explicit promisc off signals for alongside every MAC filter update. snoop -rd vioif0 will also have illumos explicitly move us in and out of promiscuous mode.

Running on top of OPTE and/or in omicron standalone doesn't really stop Linux from asking for multicast MACs (i.e., they're not a result of anything coming in link-local). So for most guests we will need to get explicit MAC filter support into viona proper, or they'll just put themselves back into all-multicast mode. illumos, at least, keeps the tables empty. On Debian, I'm seeing:

Got MAC list [
    MacAddr([33, 33, 0, 0, 0, 1]), MacAddr([1, 0, 5e, 0, 0, 1]), MacAddr([1, 80, c2, 0, 0, 0]),
    MacAddr([1, 80, c2, 0, 0, 3]), MacAddr([1, 80, c2, 0, 0, e]), MacAddr([33, 33, ff, e7, 9c, e4]),
    MacAddr([33, 33, 0, 1, 0, 3]), MacAddr([1, 0, 5e, 0, 0, fc]), MacAddr([33, 33, ff, 0, 6, da])
]

With visible group subscriptions on:

IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.0.1
enp0s6          1      224.0.0.252
enp0s6          1      224.0.0.1
lo              1      ff02::1
lo              1      ff01::1
enp0s6          1      ff02::1:ff00:6da
enp0s6          1      ff02::1:3
enp0s6          2      ff02::1:ffe7:9ce4
enp0s6          2      ff02::1
enp0s6          1      ff01::1

So the MAC table can be decoded as:

IPv6(ff01::1 [All Nodes, Node Local], ff02::1 [All Nodes, Link Local]),
IPv4(224.0.0.1 [All Systems on Subnet]),
STP/LLDP(additional),
LLDP(additional),
LLDP(primary)/PTP,
IPv6(ff02::1:ffe7:9ce4 [solicited node addr]),
IPv6(ff02::1:3 [LLMNR]),
IPv4(224.0.0.252 [LLMNR]),
IPv6(ff02::1:ff00:6da [solicited node addr])

ip link set multicast off dev enp0s6 still leaves a bunch of active MACs including registered multicast addresses for all-/solicited-node, but those are sort of intrinsic to the stack. I think I'm happy at least that we're at parity with how cbhyve uses this machinery in pci_viona_eval_promisc.

@FelixMcFelix FelixMcFelix marked this pull request as ready for review April 29, 2026 11:28
Comment on lines -376 to -378
if let Some(ctlq) = queues.get(2) {
ctlq.set_control();
}
Copy link
Copy Markdown
Contributor Author

@FelixMcFelix FelixMcFelix Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what effect putting receiveq2 in this state would have, but we never unset this flag when the control queue index updated. So in theory this would have affected ring_init/reset/kick/... for this queue?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah... among other things we'd entirely fumble the queue state across migration. pretty bad, thanks for noticing this.

Comment on lines +1331 to +1342
let input: migrate::VionaStateV1 = offer.take()?;

let mut state = self.inner.lock().unwrap();
state.filter = FilterState::from_bits_retain(input.filter);
state.unicast_mac_filters = input.unicast_mac_filters.into();
state.unicast_mac_filters = input.multicast_mac_filters.into();
self.set_promisc(input.promisc, &mut state).map_err(|_| {
MigrateStateError::ImportFailed(format!(
"Could not move device promisc level to {:?}.",
input.promisc
))
})
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty unfamiliar with the propolis migration machinery, hopefully this does the job?

Copy link
Copy Markdown
Member

@iximeow iximeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the slow turnaround on this review! a couple of actual issues, a couple of me-trying-to-update-my-mental-model :)

Comment on lines +210 to +211
let mut space =
vec![MacAddr::default(); entry_count as usize].into_boxed_slice();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a guest has told us it will provide u32::MAX MACs we will do a truly unfortunate allocation here. it seems to me like we want a read_many_owned() here similar to the one that will imminently come in for src/vmm/mem.rs in #1105? (for reference)

if not a read_many_owned on VirtQueue here, lets please have an

if entry_count * mem::size_of::<MacAddr>() > chain.remain_read_bytes() {
    return Err(());
}

here?

(truly not pressed on which we go with in this change, but if you defer VirtQueue:read_many_owned please file an issue and I'll probably get to it)

}

bitflags! {
/// Packet receive filters requested by a virtio NIC driver.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be right to add here something like: "filter configuration is only a binary state, and the device is a composition of those filter configurations. Note that VirtIO RX commands only operate on individual filters using consecutive integers and correspond to bit numbers here only for simplicity"?

let mut state = self.inner.lock().unwrap();
state.filter = FilterState::from_bits_retain(input.filter);
state.unicast_mac_filters = input.unicast_mac_filters.into();
state.unicast_mac_filters = input.multicast_mac_filters.into();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
state.unicast_mac_filters = input.multicast_mac_filters.into();
state.multicast_mac_filters = input.multicast_mac_filters.into();

let input: migrate::VionaStateV1 = offer.take()?;

let mut state = self.inner.lock().unwrap();
state.filter = FilterState::from_bits_retain(input.filter);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely should be a FilterState::from_bits() and MigrateStateError::ImportFailed if we got a bit we don't know how to process in the input filter.

(a new bit would probably need to imply a VionaStateV2 payload being offered so we should really never see such an error)

pub const VIRTIO_NET_CFG_SIZE: usize = 6 + 2 + 2 + 2 + 4 + 1 + 1 + 2 + 4;
}
use bits::*;
use zerocopy::{FromBytes, Immutable, IntoBytes};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: editor get confused about the use ... most of the way through this file? oughtta have this up at around line 35.

.ok()
.and_then(|i| self.virtio_state.queues.get(i))
.map(|v| v.set_control())
.ok_or_else(|| ())?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on one hand, this is all well within its rights to be expectful: the queue index should be in-range from above so try_into and get should both always succeed.

on the other.. queues.get() has a special-case for max_capacity() - 1 in which case it returns get_control() getting us a "control" queue which we have not actually made the control queue yet? kinda odd!

leaning further into the XXX: in VirtQueues::get maybe we should have set_len() update some queues-internal state that says who is currently the control queue (and unsets is_control if there was a previous one). then we avoid this whole "VirtQueues and virtio-nic have their own ideas of control queues" problem.

comparing with other VirtIO devices it seems like in the longer-run the question of "is this a control queue" needs to be answered wholly by the device impl, and that it's not really coherent as a generic property (some devices have 0 queues, some have two, most of the time they're the first/second except here, ... etc). so keeping this in VirtQueues is an OK step until we have other control-queue-having VirtIO devices?

Comment on lines +1030 to +1031
let need_mcast = state.filter.contains(FilterState::ALL_MULTICAST)
|| !state.multicast_mac_filters.is_empty();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please bear with my rusty networking brain: if the filter has ALL_MULTICAST | NO_MULTICAST then do we still need_mcast? I'm not sure how broadcast fits in, is really the question. it seems more obvious to me that we shouldn't need_mcast if ALL_MULTICAST | NO_MULTICAST | NO_BROADCAST. in any case, having viona more promiscuous than required is not a correctness issue iiuc, so we don't need to worry about these cases?

(if that's right, imo worth a note on fn needed_promisc and I'm not worried about not filtering as much as silly combinations of filters could permit)

Comment on lines +988 to +994
match self.set_promisc(self.needed_promisc(&state), &mut state) {
Ok(()) => Ok(()),
Err(()) => {
state.unicast_mac_filters = unicast;
state.multicast_mac_filters = multicast;
Err(())
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I'm missing.. two things. if set_promisc errors, do we in general know anything about the state of viona? I think we would want that to result in the device having the status DEVICE_NEEDS_RESET, but right now this only results in an Ack::Err. I don't know if this is consistent with what cbhyve or other virtio-nic devices do.

the other thing being: shouldn't I expect to see somewhere we communicate these filters to viona? or no, and this is all fine because there's no guarantee these are the only packets a guest would see, just the hope that the NIC/VNIC have prevented a guest from having to see them?


match self.set_promisc(self.needed_promisc(&state), &mut state) {
Ok(()) => Ok(()),
Err(()) => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're going to try rolling back state on error, should we try to self.set_promisc(prev_promisc) in the error case here? maybe failing to do that is what should produce a severe enough error to NEEDS_RESET the deice?

state.promisc = level;
Ok(())
}
Err(_) => Err(()),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question on if we should set the old promisc level in the error case here. mostly looking at viona_rx_set in illumos and noticing we can mac_rx_set() and then error (in multi) or mac_rx_clear() and then error (in all), so set_promisc failing implies we've got viona in a really wonky state.

Comment on lines +53 to +58
/// The index of the control queue when multiqueue ([`VIRTIO_NET_F_MQ`]) has
/// not been negotiated.
///
/// In this case, the driver will behave as though we have allocated only one
/// Rx/Tx queue pair, followed by the control queue.
pub const VIRTIO_NO_MQ_CTRL_Q_INDEX: usize = 2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw part of the problem with picking 2 here on our own is that VirtQueues::iter() will still use get_control() to tack on a control queue at the end, so reset_queues_locked() will reset queues 0, 1, and propolis-VirtQueue-22 and I think leave queue 2 as it was before the reset. that's what has me thinking about keeping the notion of control-ness totally internal to VirtQueues for now, rather than splitting it between there and virtio/viona.rs.

Comment on lines +678 to +685
let filter = match cmd {
RxCmd::Promisc => FilterState::PROMISCUOUS,
RxCmd::AllMulticast => FilterState::ALL_MULTICAST,
RxCmd::AllUnicast => FilterState::ALL_UNICAST,
RxCmd::NoMulticast => FilterState::NO_MULTICAST,
RxCmd::NoUnicast => FilterState::NO_UNICAST,
RxCmd::NoBroadcast => FilterState::NO_BROADCAST,
};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awful Rust trivia: this does not optimize into the 1 << RxCmd one might hope, it becomes a movabs long_imm, reg; shr reg, RxCmd. this is better w.r.t being legible source, but the compiler fully mangles the cute bitwise trick :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Viona control queue index is incorrect Viona links never leave VIONA_PROMISC_MULTI

2 participants