Add HIP 0001 ring buffer proposal for Hyperlight I/O #1112
base: main
Conversation
> ### Risks and Mitigations
>
> **Risk**: Malicious guest corrupting the queue **Mitigation**: Do not expose low level queue API to
I don't know that not exposing it to the guest does anything. The guest is fully untrusted, so we need the mitigation suggested after this: serialize all data as known types and assert on what we read back.
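For illustration, a minimal sketch of the kind of host-side validation this implies, assuming a hypothetical `Desc` struct and made-up bounds parameters (not the proposal's API): copy the descriptor out of shared memory once, then assert that its fields describe a buffer that lies entirely inside guest memory before touching it.

```rust
// Hypothetical sketch only; `Desc` and the bounds here are illustrative.
#[derive(Clone, Copy)]
struct Desc {
    addr: u64,
    len: u32,
    id: u16,
    flags: u16,
}

#[derive(Debug)]
enum DescError {
    OutOfBounds,
    BadId,
}

fn validate_desc(desc: Desc, guest_base: u64, guest_size: u64, queue_size: u16) -> Result<Desc, DescError> {
    // The buffer must lie entirely inside guest memory; use checked math so a
    // malicious addr/len pair cannot wrap around.
    let end = desc
        .addr
        .checked_add(desc.len as u64)
        .ok_or(DescError::OutOfBounds)?;
    if desc.addr < guest_base || end > guest_base.saturating_add(guest_size) {
        return Err(DescError::OutOfBounds);
    }
    // The ID must index a valid slot in the host-side bookkeeping tables.
    if desc.id >= queue_size {
        return Err(DescError::BadId);
    }
    Ok(desc)
}
```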
> example `ioeventfd` for kvm). This is especially useful for streaming scenarios where the guest can
> continue processing while the host consumes data asynchronously.
>
> **3. Inline Descriptors**
Is this proposed to be implemented now, or as a possible improvement in the future?
I would like to give it a try in the first draft, but whether we keep it really depends on two factors: 1) can we actually fit any meaningful data into an inline slot, e.g. can a serialized function call with a single argument fit? 2) is the performance improvement for such calls substantial? Both are hard to assess without an actual implementation.
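As a rough illustration of the first question, a back-of-the-envelope sketch assuming the standard 16-byte packed virtqueue descriptor layout (`addr: u64`, `len: u32`, `id: u16`, `flags: u16`); the constants and helper below are made up for the example.

```rust
// Back-of-the-envelope only: an inline variant could at most reuse the
// addr + len fields as payload, leaving roughly 12 bytes per descriptor.
// Whether a serialized single-argument call fits in that is exactly the
// open question above.
const DESC_SIZE: usize = 16;
const ID_AND_FLAGS: usize = 2 + 2; // id + flags still need to be addressable
const INLINE_BUDGET: usize = DESC_SIZE - ID_AND_FLAGS; // 12 bytes

fn fits_inline(serialized_call_len: usize) -> bool {
    serialized_call_len <= INLINE_BUDGET
}

fn main() {
    assert!(fits_inline(10));  // e.g. a hypothetical 10-byte encoded call
    assert!(!fits_inline(48)); // larger calls still need a real buffer
}
```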
> 2. Device writes up to buffer length
> 3. Device sets actual written length in descriptor
> 4. If `actual_length > buffer_length`, device sets a `TRUNCATED` flag,
> 5. Driver can re-submit with larger buffer if needed
I can imagine this being a possible performance issue. Can we provide a metric for when this occurs, and a way to improve the initial buffer size?
That's a good point. I would imagine this only happens when you try to transport a "big" chunk of data, for which the base case should be using a stream rather than pushing everything at once. That being said, the protocol should support the full round trip, but really the mitigation for this problem should be to use streams.
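A minimal sketch of the round trip being described, with hypothetical `submit`/`Completion` stand-ins rather than the proposal's actual API; the point is that the device reports the full length it wanted to write, so a single retry with exactly that size is enough.

```rust
// Hypothetical stand-ins: `submit` posts a writable buffer of the given
// length and blocks for the completion.
struct Completion {
    truncated: bool,   // device set the TRUNCATED flag
    actual_len: usize, // full length the device wanted to write
    data: Vec<u8>,     // bytes actually written (up to the buffer length)
}

fn submit(_buf_len: usize) -> Completion {
    unimplemented!("placeholder: post a writable buffer and wait for completion")
}

// Worst case is one extra round trip: retry with the exact size reported by
// the device. Genuinely large payloads should use the streaming path instead.
fn call_with_retry(initial_len: usize) -> Vec<u8> {
    let first = submit(initial_len);
    if !first.truncated {
        return first.data;
    }
    submit(first.actual_len).data
}
```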
> ### Snapshotting
>
> Snapshotting requires that the descriptor table has no in-flight guest-to-host requests and any
> attempt to snapshot a sandbox with such pending requests will result in a snapshot failure.
Can we ensure this?
Yes, we can query the queue for the number of in-flight messages during snapshotting and fail if it reports a positive number. Because we already have to track the number of free slots in the queue, getting the number of in-flight messages is just the length of the queue minus the free slots.
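A minimal sketch of that check, assuming the host-side queue wrapper already tracks `queue_size` and `free_slots` (names are illustrative):

```rust
#[derive(Debug)]
struct SnapshotError;

fn inflight(queue_size: u16, free_slots: u16) -> u16 {
    queue_size - free_slots
}

fn ensure_snapshottable(queue_size: u16, free_slots: u16) -> Result<(), SnapshotError> {
    if inflight(queue_size, free_slots) > 0 {
        // Refuse to snapshot while guest-to-host requests are still pending.
        return Err(SnapshotError);
    }
    Ok(())
}
```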
jsturtevant left a comment
This looks great, thanks for such a detailed write-up. Left a few minor comments, but otherwise this seems like it would set the project up to work well for the future.
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
2318367 to 25f5d22
Pull request overview
This pull request adds HIP 0001, a comprehensive proposal document for implementing a virtio-inspired ring buffer mechanism for Hyperlight I/O. The proposal aims to replace the current stack-based communication model with a more efficient ring buffer approach to reduce VM exits and enable streaming communication patterns.
Changes:
- Adds a detailed technical proposal document describing the ring buffer design, API, and implementation plan
- Updates typos.toml to allow "readables" and "writables" as valid terms used in the proposal
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| proposals/0001-rng-buf/README.md | Comprehensive HIP document describing virtio-inspired ring buffer design with memory layout, API examples, and implementation plan |
| typos.toml | Adds exceptions for "readables" and "writables" terms used in the proposal document |
Really great work on this HIP, @andreiltd! Loved seeing the diagrams, extensive testing plan, code, and whatnot. A few thoughts:

1. **Side-by-side comparison suggestion.** I think the HIP could benefit from a more explicit side-by-side comparison in the "Comparison with current implementation" section. Something like a table showing: I feel like this would help readers quickly grasp the key differences between the two approaches.

2. **Related work in Nanvix fork.** I've done some similar work in the Nanvix hyperlight fork that might be relevant: 76c9e7c There, I added a

3. **Single-threaded vs multi-threaded phasing.** For the multi-threaded case (guest and host on separate threads), the host would need a way to interrupt the running guest when responses are ready--yes? I've also done related work for that in Nanvix's Hyperlight fork: a897cad for KVM. My hardware interrupts work enables PIC/APIC interrupt injection, which could support the ioeventfd-style notifications mentioned in the HIP. Happy to coordinate on this when the time comes :)

The backward compatibility goal (current function call model portable without public API changes) is really cool to see--great for downstream consumers like Nanvix. Looking forward to seeing this land!
Fix typos

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Tomasz Andrzejak <andreiltd@users.noreply.github.com>
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
a2a24cf to 934dee7
syntactically left a comment
I mostly really like this! Left a few minor nits, and a couple of bigger comments clarifying the question of where buffers get allocated.
I didn't review the Rust type design in detail, as I understand that is going to be updated shortly.
> ### Risks and Mitigations
>
> **Risk**: Malicious guest corrupting the queue **Mitigation**: Do not expose low level queue API to
I agree with @jsturtevant here---the raw state of the queue is fundamentally exposed to the guest. The thing that we need to do on the host for safety is figure out what that means for us: making sure that the buffers are always in guest memory, ensuring that we do not access them in racy ways (even if the guest tries to trick us into having the same buffer both read from and written to at the same time), etc.
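To make the second point concrete, a hypothetical sketch (the helper name and signature are made up, not Hyperlight's API) of the discipline this implies on the host: bounds-check the guest-supplied range, then copy the bytes out once before parsing, so a concurrent guest write cannot change the data after it has been validated.

```rust
// Hypothetical helper: `guest_mem` is the host's view of guest memory and
// `offset`/`len` come from an untrusted descriptor.
fn copy_out_checked(guest_mem: &[u8], offset: usize, len: usize) -> Option<Vec<u8>> {
    let end = offset.checked_add(len)?;
    if end > guest_mem.len() {
        return None; // buffer not entirely inside guest memory
    }
    // In the real implementation this would be a volatile copy out of shared
    // memory (see the shared_mem.rs discussion below), not a plain slice copy.
    Some(guest_mem[offset..end].to_vec())
}
```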
> ## Design Details
>
> ### Packed Virtqueue Overview
I don't love the producer/consumer terminology here. I think it added to the earlier confusion about how buffer allocation responsibility should be divided. How about something like: "the driver is the side that can allocate buffers accessible by both sides (i.e. the guest), and the device is the side that reads from/writes to these buffers on request (i.e. the host)". The host is essentially providing a specialised paravirtualised device to the guest, and so always takes on the device role.
> can act as driver or device depending on the direction of communication.
>
> #### Bidirectional Communication
How about something like:

> Hyperlight needs to support both device-initiated (for host->guest function calls) and driver-initiated (for guest->host function calls) communication. This is generally accomplished by having the guest create, during early initialisation, a set of available buffers into which the host should write incoming work for the guest, and keep that set populated.

I am not fundamentally opposed to having split rx/tx queues, like e.g. virtio-net, but I'm unsure whether they actually make sense here, at least in the single-threaded case?
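For concreteness, an illustrative guest-side sketch of that scheme; `GuestRing`, `GuestPool`, and the buffer sizes are hypothetical stand-ins, not the proposal's types.

```rust
// Illustrative only: during early init the guest posts a set of
// device-writable buffers, so the host always has somewhere to write the
// next host->guest call; the guest re-posts each buffer after consuming it.
struct Buffer {
    addr: u64,
    len: u32,
}

struct GuestPool;
impl GuestPool {
    fn alloc(&mut self, len: u32) -> Buffer {
        // Placeholder: carve a buffer out of guest-owned shared memory.
        Buffer { addr: 0, len }
    }
}

struct GuestRing;
impl GuestRing {
    fn submit_writable(&mut self, _buf: Buffer) {
        // Placeholder: publish the buffer as device-writable (WRITE flag set).
    }
}

const RX_BUFFERS: usize = 8;
const RX_BUF_LEN: u32 = 4096;

fn post_initial_rx_buffers(ring: &mut GuestRing, pool: &mut GuestPool) {
    for _ in 0..RX_BUFFERS {
        let buf = pool.alloc(RX_BUF_LEN);
        ring.submit_writable(buf);
    }
}
```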
> #### Memory Layout
>
> The queue structure reside in shared memory and are accessible to both host and guest. The layout
Should this be an offset from the mapped region base, or a GPA? I would tend to think GPA, even though in most cases the host will want to ensure that it is in the scratch region and then convert to an offset from there.
> <img width="1324" height="726" alt="submit" src="https://github.com/user-attachments/assets/a3ee2dea-a55b-4d50-8b96-1702617a21f0" />
>
> **Step 2:** Device processes and writes response
I assume that in this picture there is implicitly some header inside the buffers passed where some kind of call ID allocated by the caller can be used to link up calls/responses?
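One possible shape for such a header, purely as an assumption since the HIP does not pin down this layout:

```rust
// Hypothetical message header; field names and sizes are assumptions. A
// caller-allocated `call_id` carried in front of the payload lets responses
// be matched back to the originating call regardless of the order in which
// buffers complete.
#[repr(C)]
struct MsgHeader {
    call_id: u64, // allocated by the caller, echoed back in the response
    kind: u16,    // e.g. request vs response vs stream chunk
    flags: u16,   // e.g. a TRUNCATED bit for partial responses
    len: u32,     // payload length following this header
}
```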
> descriptor layer or in the flatbuffer schema:
>
> 1. Driver allocates buffer of estimated size
> 2. Device writes up to buffer length
Is this flag set in the queue flags, or in the header protocol inside the buffers? I assume the latter needs to keep track of both a request ID and an offset from the beginning of the response to that request for this buffer, for this to work?
> ### Snapshotting
>
> Snapshotting requires that the descriptor table has no in-flight guest-to-host requests and any
I guess there is a slightly more complex issue with snapshotting as well: the state of the buffer pool in the driver will need to be considered/updated, because (a) the physical addresses of the inflight "write the next request for me here" buffers will need to be updated to the new scratch region and (b) something will need to be done (perhaps the same update? but slightly unclear, in case the guest is currently using data in those buffers) with the buffers in the allocator pool that are not currently available in the queue.
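A rough sketch of the address fix-up this would imply on restore, assuming buffer addresses are GPAs into a relocatable scratch region (names are made up): every buffer the driver still holds, whether sitting in the queue or checked out of the allocator pool, would need its address rebased from the old scratch region to the new one.

```rust
// Illustrative only: rebase a buffer GPA from the old scratch region to the
// new one after a snapshot restore. The harder part the comment raises --
// what to do with buffers the guest is actively using -- is not solved here.
fn rebase(addr: u64, old_base: u64, new_base: u64) -> u64 {
    new_base + (addr - old_base)
}
```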
Sorry, some kind of glitch seems to have resulted in my comments getting posted twice. I can't delete this duplicate review comment, apparently.
syntactically left a comment
I have some minor comments on the Rust definitions as well.
> This mechanism ensures that no locks are required for synchronization, only memory barriers combined
> with atomic publishing of flags ensure that the other side will never observe a partial update:
>
> - Driver: Write descriptor fields → memory barrier → atomic Release-store flags
What barrier were you thinking we might need here? I would think that the synchronising release store and acquire load should always be enough.
Separately-ish, there is the issue of how to ensure that the host does not have data race UB, even in the face of a guest that maliciously violates the synchronisation protocol. This is all a bit complicated by the fact that the reads/writes on the other side are outside the Rust abstract machine. We have some notes about this in shared_mem.rs, which were written for the future thinking about this kind of concurrent update situation. There was also a recentish discussion in unsafe-code-guidelines about this topic, which came to a similar conclusion to our notes: what we really want is a relaxed atomic-per-byte memcpy to get data in/out of the guest (after the acquire atomic load), but unfortunately this operation does not exist in Rust yet. So, I think that the present approach of having an acquire load and then volatile non-atomic reads/writes is still the "least bad" option, even though it is not exactly correct w.r.t. the memory model---hopefully we can eventually switch to the per-byte atomic memcpy.
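For reference, a minimal sketch of the Release/Acquire pairing being discussed, assuming a hypothetical flags word in shared memory (the real packed-ring protocol also involves paired avail/used bits and wrap counters, omitted here):

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// Hypothetical "descriptor is available" bit.
const AVAIL: u16 = 1 << 7;

// Driver side: fill in the descriptor body first, then publish the flags
// with a Release store so the body becomes visible before the flag flips.
fn publish(flags: &AtomicU16, write_body: impl FnOnce()) {
    write_body();
    flags.store(AVAIL, Ordering::Release);
}

// Device side: an Acquire load of the flags synchronises with the Release
// store above; only after observing AVAIL is the body read (in practice via
// volatile reads out of shared memory, per the shared_mem.rs notes).
fn try_consume(flags: &AtomicU16, read_body: impl FnOnce()) -> bool {
    if flags.load(Ordering::Acquire) & AVAIL != 0 {
        read_body();
        true
    } else {
        false
    }
}
```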
> pub struct DescFlags: u16 {
>     const NEXT = 1 << 0; // Buffer continues via next descriptor
>     const WRITE = 1 << 1; // Device write-only (otherwise read-only)
>     const INDIRECT = 1 << 2; // Buffer contains list of descriptors
The text mentioned not supporting indirect, at least for now---do we want this flag to be allocated even though we aren't supporting it yet?
> /// Event suppression structure for controlling notifications.
> /// Both sides can control when they want to be notified.
> #[repr(C)]
> pub struct EventSuppression {
Where do these end up?
> desc_table: DescTable, // Descriptor table in shared memory
> id_free: SmallVec<[u16; 256]>, // Stack of free IDs (allows out-of-order completion)
> id_num: SmallVec<[u16; 256]>, // Chain length per ID
> drv_evt_addr: u64, // Controls when device notifies about used buffers
What does _addr mean here? Are these meant to be bitpacked versions of the suppression structure above?
> /// Length in bytes (for used: bytes written by device)
> pub len: u32,
> /// Buffer ID for correlating completions with submissions
> pub id: u16,
Are these meant to identify an individual buffer, or are they the higher-level semantic protocol thing used to correlate e.g. calls and returns? If the former, do they gain us anything over just using the address as the buffer ID?
> impl<M: MemOps> RingProducer<M> {
>     /// Submit a buffer chain to the ring.
>     /// Returns the descriptor ID assigned to this chain.
Why do we need this as well as the _notify variant?
> /// Submit with notification check based on device's event suppression settings.
> pub fn submit_available_with_notify(&mut self, chain: &BufferChain) -> Result<SubmitResult, RingError>;
>
> /// Poll for a used buffer. Returns WouldBlock if no completions available.
We may need to think a little bit about whether there is a way to actually block/return control to the host, and whether that should be integrated into or distinct from the host->guest notification mechanism, which, at least right now, will also exit to the host in the calling thread (but perhaps in the future could be some faster IPI?)
> #### Buffer Chain Builder
>
> Type-state pattern enforces that readable buffers are added before writable buffers, preventing
Why do we care about this "readable before writable" invariant?
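In case it helps the discussion, a sketch of what such a type-state builder presumably looks like; type and method names are illustrative, not the proposal's. Once the first writable buffer is added the builder's type changes, so `readable()` is simply no longer callable.

```rust
struct Buf; // placeholder for an (addr, len) buffer reference

struct AddingReadable {
    readables: Vec<Buf>,
}

struct AddingWritable {
    readables: Vec<Buf>,
    writables: Vec<Buf>,
}

impl AddingReadable {
    fn new() -> Self {
        Self { readables: Vec::new() }
    }
    fn readable(mut self, b: Buf) -> Self {
        self.readables.push(b);
        self
    }
    // Transition: after the first writable buffer, only `AddingWritable`
    // methods are available, so no readable can follow a writable.
    fn writable(self, b: Buf) -> AddingWritable {
        AddingWritable { readables: self.readables, writables: vec![b] }
    }
}

impl AddingWritable {
    fn writable(mut self, b: Buf) -> Self {
        self.writables.push(b);
        self
    }
}
```

This mirrors the virtio convention that device-readable descriptors precede device-writable ones within a chain, which may be why the builder enforces it.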
> inner: RingProducer<M>,
> notifier: N,
> pool: P,
> inflight: Vec<Option<ProducerInflight>>,
What's this? I don't see the `ProducerInflight` definition here.
> ```rust
> /// Trait for buffer providers.
> pub trait BufferProvider {
In case these eventually end up passed around through a lot of parsing code with indeterminate lifetimes, would it be useful for alloc/dealloc here to be more like get/put, keeping a refcount for the buffer and returning it to the free pool when it's unreferenced? I have no idea if this is true, just thought it was worth throwing out there.
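A sketch of that get/put idea, with a refcount so a buffer only returns to the free pool once the last outstanding reference is dropped; the struct and method names are hypothetical, not the proposal's `BufferProvider`.

```rust
use std::collections::HashMap;

// Hypothetical pool: buffers are handed out with `get`, extra references are
// taken with `get_ref`, and `put` only returns a buffer to the free list once
// its last reference is dropped.
struct RefCountedPool {
    free: Vec<u64>,            // addresses of free buffers
    refs: HashMap<u64, usize>, // outstanding references per buffer address
}

impl RefCountedPool {
    /// Allocate a buffer from the pool with an initial reference count of 1.
    fn get(&mut self) -> Option<u64> {
        let addr = self.free.pop()?;
        self.refs.insert(addr, 1);
        Some(addr)
    }

    /// Take an extra reference, e.g. when the buffer is handed to parsing
    /// code with an indeterminate lifetime.
    fn get_ref(&mut self, addr: u64) {
        if let Some(count) = self.refs.get_mut(&addr) {
            *count += 1;
        }
    }

    /// Drop a reference; the buffer returns to the free pool only when the
    /// last reference is gone.
    fn put(&mut self, addr: u64) {
        if let Some(count) = self.refs.get_mut(&addr) {
            *count -= 1;
            if *count == 0 {
                self.refs.remove(&addr);
                self.free.push(addr);
            }
        }
    }
}
```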