Message-ID: <CAJaqyWePMtM8vtgm8UnGAv+_XNTnVNFSNuoqzt_Cn-CpZg46mA@mail.gmail.com>
Date: Tue, 28 Oct 2025 15:37:09 +0100
From: Eugenio Perez Martin <eperezma@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Maxime Coquelin <mcoqueli@...hat.com>, Yongji Xie <xieyongji@...edance.com>, 
	virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org, 
	Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Dragos Tatulea DE <dtatulea@...dia.com>, jasowang@...hat.com
Subject: Re: [RFC 1/2] virtio_net: timeout control virtqueue commands

On Tue, Oct 28, 2025 at 3:10 PM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Wed, Oct 22, 2025 at 02:55:18PM +0200, Eugenio Perez Martin wrote:
> > On Wed, Oct 22, 2025 at 1:43 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > >
> > > On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> > > > Let me switch to MQ as I think it illustrates the point better.
> > > >
> > > > IIUC the workflow:
> > > > a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> > > > b) VDUSE CVQ sends ok to the virtio-net driver
> > > > c) VDUSE CVQ sends the command to the VDUSE device
> > > > > d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> > > > e) VDUSE CVQ sends ok to the virtio-net driver
> > > >
> > > > The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> > > > so it potentially uses the second rx queue. But, by the standard:
> > > >
> > > > The device MUST NOT queue packets on receive queues greater than
> > > > virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> > > > command in a used buffer.
> > > >
> > > > So the driver does not expect rx buffers on that queue at all. From
> > > > the driver's POV, the device is invalid, and it could mark it as
> > > > broken.
> > >
> > > > ok interesting. Note that if userspace processes vqs it should process
> > > cvq too. I don't know what to do in this case yet, I'm going on
> > > vacation, let me ponder this a bit.
> > >
> >
> > Sure.
>
> So let me ask you this, how are you going to handle device reset?
> Same issue, it seems to me.
>

Well, my proposal is to mark it as broken so it needs to be reset
manually, for example by unbinding and rebinding the driver in Linux.
The point is that the driver cannot trust the device anymore, as it is
in an invalid state. Suspending and resetting all the vqs may also be a
valid way to un-break it if the device supports it, but I think a race
is unavoidable there, and I'm not sure how to communicate it to
userspace for all kinds of devices. Incrementing rx errors could be a
first proposal.
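
To make the idea concrete, here is a minimal sketch of the driver-side
check, written as a self-contained toy model rather than actual
virtio_net code. Every name in it (toy_net_dev, toy_cvq_acked,
toy_rx_used, toy_replay_race) is hypothetical, and it assumes a 0-based
rx queue index, so an index >= the acked virtqueue_pairs is a violation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not kernel code: all names below are hypothetical. */
struct toy_net_dev {
	uint16_t acked_pairs;	/* last MQ_VQ_PAIRS_SET value acked via CVQ */
	bool broken;		/* set once the device violates that ack */
	uint64_t rx_errors;	/* counter that userspace could observe */
};

/* Driver side: the CVQ command came back in a used buffer with "ok". */
static void toy_cvq_acked(struct toy_net_dev *dev, uint16_t pairs)
{
	dev->acked_pairs = pairs;
}

/*
 * Driver side: a used rx buffer showed up on rx queue index rxq (0-based).
 * Returns true if the buffer may be processed. On the first violation the
 * error is counted and the device is marked broken; after that every
 * buffer is rejected until a (not modelled) manual reset.
 */
static bool toy_rx_used(struct toy_net_dev *dev, uint16_t rxq)
{
	if (dev->broken)
		return false;
	if (rxq >= dev->acked_pairs) {
		dev->rx_errors++;
		dev->broken = true;
		return false;
	}
	return true;
}

/* Replays steps a) to e) of the workflow; returns the rx error count. */
static uint64_t toy_replay_race(void)
{
	struct toy_net_dev dev = { .acked_pairs = 1 };

	toy_cvq_acked(&dev, 2);		/* a), b): SET 2 acked */
	assert(toy_rx_used(&dev, 1));	/* second rx queue is legal here */
	toy_cvq_acked(&dev, 1);		/* d), e): SET 1 acked early */
	assert(!toy_rx_used(&dev, 1));	/* device still uses the second queue */
	assert(dev.broken);
	return dev.rx_errors;
}
```

In this model the device stays broken until it is reset by hand, which
matches the unbind/bind flow; a suspend-and-reset path is not modelled.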

If we want to track it in VDUSE, we would need to implement NEEDS_RESET,
which leaves all the old drivers without a solution. That's why I think
it is better to solve all the problems at once in the driver.

