Message-ID: <28F3518B-B24A-428D-BBE2-A1981D857A4C@nutanix.com>
Date: Wed, 26 Nov 2025 16:48:32 +0000
From: Jon Kohler <jon@...anix.com>
To: Jason Wang <jasowang@...hat.com>
CC: "Michael S. Tsirkin" <mst@...hat.com>,
        Eugenio Pérez <eperezma@...hat.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "virtualization@...ts.linux.dev" <virtualization@...ts.linux.dev>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next] vhost/net: check peek_head_len after signal to
 guest to avoid delays



> On Nov 26, 2025, at 1:15 AM, Jason Wang <jasowang@...hat.com> wrote:
> 
> On Wed, Nov 26, 2025 at 1:18 AM Jon Kohler <jon@...anix.com> wrote:
>> 
>> In non-busypoll handle_rx paths, if peek_head_len returns 0, the RX
>> loop breaks, the RX wait queue is re-enabled, and vhost_net_signal_used
>> is called to flush done_idx and notify the guest if needed.
>> 
>> However, signaling the guest can take non-trivial time. During this
>> window, additional RX payloads may arrive on rx_ring without further
>> kicks. These new payloads will sit unprocessed until another kick
>> arrives, increasing latency. In high-rate UDP RX workloads, this was
>> observed to occur over 20k times per second.
>> 
>> To minimize this window and improve opportunities to process packets
>> promptly, immediately call peek_head_len after signaling. If new packets
>> are found, treat it as a busy poll interrupt and requeue handle_rx,
>> improving fairness to TX handlers and other pending CPU work. This also
>> helps suppress unnecessary thread wakeups, reducing waker CPU demand.
>> 
>> Signed-off-by: Jon Kohler <jon@...anix.com>
>> ---
>> drivers/vhost/net.c | 21 +++++++++++++++++++++
>> 1 file changed, 21 insertions(+)
>> 
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 35ded4330431..04cb5f1dc6e4 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -1015,6 +1015,27 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
>>        struct vhost_virtqueue *tvq = &tnvq->vq;
>>        int len = peek_head_len(rnvq, sk);
>> 
>> +       if (!len && rnvq->done_idx) {
>> +               /* When idle, flush signal first, which can take some
>> +                * time for ring management and guest notification.
>> +                * Afterwards, check one last time for work, as the ring
>> +                * may have received new work during the notification
>> +                * window.
>> +                */
>> +               vhost_net_signal_used(rnvq, *count);
>> +               *count = 0;
>> +               if (peek_head_len(rnvq, sk)) {
>> +                       /* More work came in during the notification
>> +                        * window. To be fair to the TX handler and other
>> +                        * potentially pending work items, pretend like
>> +                        * this was a busy poll interruption so that
>> +                        * the RX handler will be rescheduled and try
>> +                        * again.
>> +                        */
>> +                       *busyloop_intr = true;
>> +               }
>> +       }
> 
> I'm not sure I will get here.
> 
> Once vhost_net_rx_peek_head_len() returns 0, we exit the loop to:
> 
> if (unlikely(busyloop_intr))
>                vhost_poll_queue(&vq->poll);
>        else if (!sock_len)
>                vhost_net_enable_vq(net, vq);
> out:
>        vhost_net_signal_used(nvq, count);
> 
> Are you suggesting signalling before enabling vq actually?

See my other note I just sent - yes, that's exactly what I'm suggesting.

Signaling takes some time, so if we do it before our last peek for
work, we can pick up any additions that raced in on the ring and avoid
a trip to the scheduler, IPIs, etc.
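
Roughly, the interaction with the exit path you quoted then looks like
this (a paraphrased sketch of the same drivers/vhost/net.c code with
comments added for clarity, not verbatim source):

	/* vhost_net_rx_peek_head_len() has already flushed done_idx,
	 * signaled the guest, and re-peeked, per the hunk above.
	 */
	if (unlikely(busyloop_intr))
		/* new work raced in during the notification window:
		 * requeue handle_rx instead of waiting for another kick
		 */
		vhost_poll_queue(&vq->poll);
	else if (!sock_len)
		/* truly idle: re-enable the wait queue as before */
		vhost_net_enable_vq(net, vq);
out:
	/* count has already been reset to 0 by the flush above */
	vhost_net_signal_used(nvq, count);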

> 
> Thanks
> 
>> +
>>        if (!len && rvq->busyloop_timeout) {
>>                /* Flush batched heads first */
>>                vhost_net_signal_used(rnvq, *count);
>> --
>> 2.43.0
>> 
> 
