
Conversation

@dancrossnyc
Contributor

No description provided.

@dancrossnyc dancrossnyc requested review from citrus-it, iximeow and papertigers and removed request for papertigers December 23, 2025 02:17
@dancrossnyc
Contributor Author

Cc @papertigers @rcgoodfellow

@dancrossnyc dancrossnyc force-pushed the virtio-modern branch 2 times, most recently from e7b1927 to 59f9711 Compare December 23, 2025 21:18
@askfongjojo askfongjojo added this to the 18 milestone Dec 29, 2025
Contributor

@citrus-it citrus-it left a comment

I've reviewed this mostly from a VirtIO/viona perspective.

}
}

impl MigrateMulti for PciVirtioViona {

Contributor

It seems we'll need to push at least the number of selected queue pairs into the migration state here.

Contributor Author

We do; that information is bundled up into the VirtioQueues type and its migration machinery.

Contributor

I was expecting to see both 'max pairs' and 'in-use pairs' in the state so that we know what to pass to viona on resume/migration. Will we end up setting the right number of pairs in that case based on the negotiated features?

Contributor Author

That is implicitly encapsulated in the distinction between the vector size and length in VirtQueues.

@papertigers papertigers self-requested a review December 30, 2025 20:58
@dancrossnyc dancrossnyc force-pushed the virtio-modern branch 5 times, most recently from 9586765 to 2c7aa84 Compare January 6, 2026 19:14
@citrus-it citrus-it self-requested a review January 7, 2026 11:22
@dancrossnyc dancrossnyc force-pushed the virtio-modern branch 5 times, most recently from 67bbaff to bf65fd9 Compare January 8, 2026 01:38
Member

@iximeow iximeow left a comment

I've gone through most of this except for queue.rs and part of viona.rs, but I've got to task-switch, so there's no reason to hold the half-finished review. Generally: thanks for getting this together, and apologies for the forever delay in review.

Comment on lines +139 to +141
/// XXX: Check these mappings against some reference.
pub fn pci_sub_dev_id(self) -> u16 {
    self as u16

Member

VirtIO 1.2 section 4.1.2.1 says:

Non-transitional devices SHOULD have a PCI Subsystem Device ID of 0x40 or higher

so this at least wants to branch on Mode::Modern and return self as u16 + 0x40 in that case, it seems?
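
A minimal sketch of that branch, assuming the DeviceId and Mode enums from this change (hypothetical, not the code as written):

impl DeviceId {
    /// Hypothetical sketch: honor VirtIO 1.2 §4.1.2.1's SHOULD by pushing
    /// non-transitional (modern) subsystem device IDs to 0x40 and above.
    pub fn pci_sub_dev_id(self, mode: Mode) -> u16 {
        match mode {
            Mode::Modern => self as u16 + 0x40,
            _ => self as u16,
        }
    }
}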

Contributor

We went back and forth a bit on this in c-bhyve and ended up just setting this to the same value as the device ID for legacy, transitional, and modern devices. It fits the spirit of the spec.

Contributor Author

Yeah, this one's a bit of a pickle. As Andy said, c-bhyve does it the way we do, and it seems valuable to maintain parity with it; on the other hand, the spec does contain that wording. In practice, I don't think any drivers really care. I find myself falling on the side of wanting to maintain parity with c-bhyve here.

Contributor

I don't think this is doing it like c-bhyve at the moment.

What we want is this (for a network device: VirtIO device ID 1, PCI device ID 0x1000):

Legacy/Transitional:

  • Vendor: 0x1af4
  • Device: 0x1000
  • Subvendor: 0x1af4
  • Subdevice: 0x1
  • PCI Revision: 0x0

Modern:

  • Vendor: 0x1af4
  • Device: 0x1041
  • Subvendor: 0x1af4
  • Subdevice: 0x1041
  • PCI Revision: 0x1

My inference from the spec wording is that we don't want, say, a modern network device advertising 0x1af4,1 in the subsystem fields, to reduce the chance of a guest attaching a legacy driver on that basis. Anything >= 0x40 achieves that.
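
A standalone sketch of how the modern row of that table falls out (illustrative only, not propolis code): VirtIO 1.2 places modern PCI device IDs at 0x1040 plus the VirtIO device ID, and Network is VirtIO device 1, hence 0x1041 in both the device and subdevice slots.

// Illustrative helper: the legacy/transitional vs. modern identity for a
// virtio-net function, matching the table above.
struct NetIdent {
    vendor: u16,
    device: u16,
    sub_vendor: u16,
    sub_device: u16,
    revision: u8,
}

fn virtio_net_ident(modern: bool) -> NetIdent {
    const VENDOR_VIRTIO: u16 = 0x1af4;
    if modern {
        // 0x1040 + VirtIO device ID 1 = 0x1041; reuse it as the subdevice.
        let dev = 0x1040 + 1;
        NetIdent {
            vendor: VENDOR_VIRTIO,
            device: dev,
            sub_vendor: VENDOR_VIRTIO,
            sub_device: dev,
            revision: 0x1,
        }
    } else {
        NetIdent {
            vendor: VENDOR_VIRTIO,
            device: 0x1000,
            sub_vendor: VENDOR_VIRTIO,
            sub_device: 0x1,
            revision: 0x0,
        }
    }
}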

Comment on lines +112 to +117
/// However, for legacy and transitional mode devices, the mapping is
/// irregular, and defined by a table. Moreover, not all virtio devices
/// have a well-defined transitional mapping. This last thing is not
/// really an issue for us, since we only expose a handful of device
/// models; regardless, we provide mappings for everything defined in
/// the VirtIO spec.

Member

Wording nit, because when I first read this I missed the "not all have a well-defined transitional mapping" -> "so some are defined ad hoc" implication. And, pedantically, going by the VirtIO spec won't be sufficient, since I assume we'll want DeviceId::Socket.pci_dev_id(Mode::Legacy) when Mike's at that point too?

Suggested change
/// However, for legacy and transitional mode devices, the mapping is
/// irregular, and defined by a table. Moreover, not all virtio devices
/// have a well-defined transitional mapping. This last thing is not
/// really an issue for us, since we only expose a handful of device
/// models; regardless, we provide mappings for everything defined in
/// the VirtIO spec.
/// However, for legacy and transitional mode devices, the mapping is
/// irregular, and defined by a table. The table describes the subset of
/// VirtIO devices that typically support operation by legacy drivers. Even
/// then there are cases of devices that in practice have legacy drivers but
/// no specified transitional device ID. In such cases we use a device ID
/// that seems to be the consensus across existing implementations. To date
/// this means "follow QEMU's lead."

Contributor Author

I word-smithed it a bit, but I think I've captured the spirit of what you're getting at. PTAL.

Comment on lines +144 to +150
/// Maps a VirtIO Device ID to a PCI Device Class.
pub fn pci_class(self) -> u8 {
    use crate::hw::pci as pci_hw;
    match self {
        Self::Network => pci_hw::bits::CLASS_NETWORK,
        Self::Block | Self::NineP => pci_hw::bits::CLASS_STORAGE,
        _ => pci_hw::bits::CLASS_UNCLASSIFIED,

Member

I realize this just moves the class definitions from PciVirtioState::new() to a more central lookup, but there's not really an authoritative spec definition for the device classes to use, is there? Looking at QEMU, I see that 9P is classed as CLASS_NETWORK (o_O?) and other devices seem to be PCI_CLASS_OTHERS == 0xff, whereas CLASS_UNCLASSIFIED has us setting the class to 0.

While I'm primarily wondering if there's a reference to note (or non-reference to lament :) ), we actually don't expect to get CLASS_UNCLASSIFIED at any point, right? It seems like this could take the same approach as pci_dev_id and return an Err if we're translating an unexpected DeviceId.

Contributor Author

9P is network in QEMU? That's kind of weird; it's literally a file protocol. sigh Then again, it's designed to go over a network, so...

I think we probably do expect to see some unclassified things eventually, for instance if we ever implement the virtio console or the entropy device. But we don't have any now, so... Result it is.
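
For concreteness, roughly the Result-returning shape that implies; a hedged sketch rather than the committed code, with the error type UnsupportedDeviceClass assumed for illustration:

/// Hypothetical sketch: same mapping as the snippet above, but an
/// unexpected DeviceId becomes an error instead of silently mapping to
/// CLASS_UNCLASSIFIED.
pub fn pci_class(self) -> Result<u8, UnsupportedDeviceClass> {
    use crate::hw::pci as pci_hw;
    match self {
        Self::Network => Ok(pci_hw::bits::CLASS_NETWORK),
        Self::Block | Self::NineP => Ok(pci_hw::bits::CLASS_STORAGE),
        other => Err(UnsupportedDeviceClass(other)),
    }
}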

Comment on lines +354 to 365
+    let vendor_id = VENDOR_VIRTIO;
+    let sub_vendor_id = VENDOR_VIRTIO;
+    let device_id = device_type.pci_dev_id(mode).expect("Device ID");
+    let sub_device_id = device_type.pci_sub_dev_id();
+    let device_class = device_type.pci_class();
     let mut builder = pci::Builder::new(pci::Ident {
-        vendor_id: VENDOR_VIRTIO,
-        device_id: dev_id,
-        sub_vendor_id: VENDOR_VIRTIO,
-        sub_device_id: sub_dev_id,
-        class: dev_class,
+        vendor_id,
+        device_id,
+        sub_vendor_id,
+        sub_device_id,
+        device_class,
         ..Default::default()

Member

Does it make sense to have a device_type.pci_ident() that does this shuffling? Then it could return a Result, and pci_dev_id/pci_sub_dev_id/pci_class can be Result-ful as appropriate for their mappings.

Contributor Author

I've added it, but I'm honestly kind of ambivalent about it. I'm not sure it stays on the right side of the line between the "generic" VirtIO stuff and the VirtIO PCI stuff; maybe the impl bits should have been moved to virtio/pci.rs. But let me know what you think.
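
Roughly the shape in question; a hedged sketch rather than the code as added (field names loosely follow the snippet above, and the single Error type is an assumption so that ? composes):

impl DeviceId {
    /// Hypothetical sketch of a pci_ident() helper: bundle the per-field
    /// lookups and let the fallible ones surface an error.
    pub fn pci_ident(self, mode: Mode) -> Result<pci::Ident, Error> {
        Ok(pci::Ident {
            vendor_id: VENDOR_VIRTIO,
            device_id: self.pci_dev_id(mode)?,
            sub_vendor_id: VENDOR_VIRTIO,
            sub_device_id: self.pci_sub_dev_id(),
            device_class: self.pci_class()?,
            ..Default::default()
        })
    }
}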

Comment on lines +384 to +386
// Note: we don't presently support a non-zero multiplier for the
// notification register, so we don't need to size this for the
// number of queues; hence the fixed size.

Member

Me not really understanding the value of a non-zero multiplier: if I'm reading correctly, the guest notifies a device by writing a virtqueue number to this register, so it's not like a write to this register notifies all queues. But with a non-zero multiplier you could do something like write to queue 1's notify address to say that... queue 0 has available buffers? Is it just an error to notify at queue N's address about a virtqueue other than N?

Contributor

The data written to the register can be more arbitrary if VIRTIO_F_NOTIF_CONFIG_DATA is negotiated, but that still doesn't need a multiplier.
If the multiplier were chosen to put each register on a separate cache line, I suppose there could be a benefit there.
QEMU has a flag (VIRTIO_PCI_FLAG_PAGE_PER_VQ) that sets the multiplier to 0x1000; otherwise it uses 0x4.
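
For reference, the multiplier only feeds into where each queue's doorbell lives; a small sketch of the address computation from VirtIO 1.2 §4.1.4.4 (the names here are illustrative, not propolis APIs):

// Where the driver writes to notify a given queue: the notify capability's
// offset within its BAR, plus the queue's notify offset scaled by the
// multiplier. With a multiplier of 0 every queue shares one register;
// QEMU's page-per-vq flag sets it to 0x1000 so each queue gets its own page.
fn queue_notify_addr(
    cap_offset: u64,
    queue_notify_off: u64,
    notify_off_multiplier: u64,
) -> u64 {
    cap_offset + queue_notify_off * notify_off_multiplier
}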

Contributor Author

I suppose it's also possible to service notifications for multiple queues simultaneously if you use the multiplier, using multiple threads or something in the host.

Comment on lines +652 to +656
if let Some(queue) = self.queues.get(state.queue_select)
{
    let mut size = queue.size.lock().unwrap();
    *size = offered;
}

Member

Suggested change
-if let Some(queue) = self.queues.get(state.queue_select)
-{
-    let mut size = queue.size.lock().unwrap();
-    *size = offered;
-}
+let Some(queue) = self.queues.get(state.queue_select) else {
+    // Invalid queue: drop the write.
+    return;
+};
+let mut size = queue.size.lock().unwrap();
+*size = offered;

Here the wrapping might end up hideouser... but in general this helps fight the rightward drift of getting the queue, and gives us a clearer place to say why the non-behavior on an invalid queue_select is reasonable.

Not demanding this change here, especially because this continues the prior pattern. But if you don't get to it here, I'll probably do it in a follow-up next week.

Contributor Author

The wrapping was worse, but I introduced a temporary to hold the queue select value and it was ok.

In general, I haven't stressed too hard on things like this because I think we ought to redo the entire mechanism.
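
Presumably something along these lines; a sketch of the shape described, with names assumed from the snippet above rather than taken from the committed code:

// Hypothetical sketch: hoist the selector into a temporary so the let-else
// line stays short, and bail out on an invalid queue index as before.
let selected = state.queue_select;
let Some(queue) = self.queues.get(selected) else {
    // Invalid queue selector: drop the write, matching the existing
    // behavior for out-of-range indices.
    return;
};
*queue.size.lock().unwrap() = offered;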

Comment on lines +841 to +892
fn pci_cfg_cap_read(
    &self,
    dev: &dyn VirtioDevice,
    id: &PciCfgCapReg,
    op: &mut ReadOp,
) {
    let _todo = dev;
    match id {
        PciCfgCapReg::Common(common_id) => match common_id {
            CommonCfgCapReg::CfgType => {
                op.write_u8(VirtioCfgCapTag::Pci as u8)
            }
            CommonCfgCapReg::Bar => op.write_u8(0), // TODO: Handle
            CommonCfgCapReg::Offset => op.write_u32(0), // TODO: Handle
            CommonCfgCapReg::Length => op.write_u32(0), // TODO: Handle
            _ => self.common_cfg_cap_read(common_id, op),
        },
        PciCfgCapReg::PciData => {
            // TODO: We actually need to handle this.
            op.write_u32(0);
        }
    }
}

fn pci_cfg_cap_write(
    &self,
    dev: &dyn VirtioDevice,
    id: &PciCfgCapReg,
    op: &mut WriteOp,
) {
    let _todo = (dev, op);
    match id {
        PciCfgCapReg::Common(common_id) => {
            match common_id {
                CommonCfgCapReg::Bar => {
                    // TODO: Store the bar
                }
                CommonCfgCapReg::Offset => {
                    // TODO: Store the offset
                }
                CommonCfgCapReg::Length => {
                    // TODO: Store the length
                }
                // Everything else is read-only for the driver.
                _ => {}
            }
        }
        PciCfgCapReg::PciData => {
            // TODO: Handle the write.
        }
    }
}

Member

How TODO is this? It seems like leaving this unplumbed would be pretty unfortunate, but clearly it's not necessary to see a working NIC?

Contributor Author

We haven't found a driver that requires it, so it's pretty "meh, should be done... eventually." Given the time constraints, it didn't seem worth it at the moment.

Comment on lines +859 to +862
// The notification addresses (both port and MMIO) for the device can change
// due to guest action, or other administrative tasks within propolis. We
// want to update the in-kernel IO port hook only in the former case, when
// the device emulation is actually running.

Member

I wouldn't move this comment out onto the function itself: it's really "why is it correct to guard hdl.set_notify_*_port on self.indicator.state() == IndicatedState::Run".

Maybe leave this one where it was, and leave a "// Only update the kernel for the same reason as notify_port_update()" comment in notify_mmio_addr_update? I think what you're going for is that this really describes the if in both functions, at least.

Contributor Author

I've word-smithed (or perhaps munged, if not outright mangled) the comments here. PTAL.

Comment on lines +40 to +42
pub const RX_QUEUE_SIZE: u16 = 0x800;
pub const TX_QUEUE_SIZE: u16 = 0x100;
pub const CTL_QUEUE_SIZE: u16 = 32;

Member

VqSize::new is already const; is there a reason not to make these VqSize and call it here?
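
In other words, something like this (a sketch assuming VqSize::new is a const fn over u16, as noted above):

// Sketch only: express the queue-size constants directly as VqSize.
pub const RX_QUEUE_SIZE: VqSize = VqSize::new(0x800);
pub const TX_QUEUE_SIZE: VqSize = VqSize::new(0x100);
pub const CTL_QUEUE_SIZE: VqSize = VqSize::new(32);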

Contributor Author

Done.

Comment on lines +55 to +56
/// Types and so forth for supporting the control queue.
pub mod control {

Member

These types and definitions come from VirtIO also, right? I see what looks like The Truth in §5.1.6.5-ish, but that was after a bit of confusion thinking these would be defined by viona for the VMM to control it.
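
For reference, the wire format those types presumably mirror, from VirtIO 1.2 §5.1.6.5 (a hedged Rust rendering for illustration, not the definitions in this module): the driver queues a class/command header plus command-specific data, and the device writes back a single ack byte.

// Illustrative only: control-virtqueue command header and ack values per
// VirtIO 1.2 §5.1.6.5.
#[repr(C)]
struct VirtioNetCtrlHdr {
    class: u8,   // e.g. the RX, MAC, or MQ command class
    command: u8, // command within that class
}

const VIRTIO_NET_OK: u8 = 0;
const VIRTIO_NET_ERR: u8 = 1;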
