Message-ID: <CAJaqyWePMtM8vtgm8UnGAv+_XNTnVNFSNuoqzt_Cn-CpZg46mA@mail.gmail.com>
Date: Tue, 28 Oct 2025 15:37:09 +0100
From: Eugenio Perez Martin <eperezma@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Maxime Coquelin <mcoqueli@...hat.com>, Yongji Xie <xieyongji@...edance.com>,
virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Dragos Tatulea DE <dtatulea@...dia.com>, jasowang@...hat.com
Subject: Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
On Tue, Oct 28, 2025 at 3:10 PM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Wed, Oct 22, 2025 at 02:55:18PM +0200, Eugenio Perez Martin wrote:
> > On Wed, Oct 22, 2025 at 1:43 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > >
> > > On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> > > > Let me switch to MQ as I think it illustrates the point better.
> > > >
> > > > IIUC the workflow:
> > > > a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> > > > b) VDUSE CVQ sends ok to the virtio-net driver
> > > > c) VDUSE CVQ sends the command to the VDUSE device
> > > > d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> > > > e) VDUSE CVQ sends ok to the virtio-net driver
> > > >
> > > > The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> > > > so it potentially uses the second rx queue. But, by the standard:
> > > >
> > > > The device MUST NOT queue packets on receive queues greater than
> > > > virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> > > > command in a used buffer.
> > > >
> > > > So the driver does not expect rx buffers on that queue at all. From
> > > > the driver's POV, the device is invalid, and it could mark it as
> > > > broken.
> > >
> > > ok interesting. Note that if userspace processes vqs it should process
> > > cvq too. I don't know what to do in this case yet, I'm going on
> > > vacation, let me ponder this a bit.
> > >
> >
> > Sure.
>
> So let me ask you this, how are you going to handle device reset?
> Same issue, it seems to me.
>
Well, my proposal is to mark it as broken so it needs to be reset
manually, for example by unbinding and rebinding the driver in Linux.
The point is that the driver cannot trust the device anymore, as it is
in an invalid state. Suspending and resetting all the vqs may also be a
valid way to un-break it if the device supports it, but I think a race
is unavoidable there, and I'm not sure how to communicate it to
userspace for all kinds of devices. Incrementing rx errors could be a
first proposal.

If we want to track it in VDUSE we should implement NEEDS_RESET, and
that would leave all the old drivers without a solution. That's why I
think it is better to solve all the problems at once in the driver.
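
To make the race from steps a) to e) concrete, here is a toy userspace
model of it. It is only a sketch, not kernel or VDUSE code: every struct
and function name is hypothetical, and the reaction in toy_drv_rx()
(bump rx errors and mark the device broken on the first unexpected
buffer) is an assumption about what the driver side could look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: all names are hypothetical, not a kernel or VDUSE API. */

struct toy_dev {
	uint16_t applied_pairs;	/* what the userspace device really uses */
	uint16_t pending_pairs;	/* acked by the CVQ shim, not applied; 0 = none */
};

struct toy_drv {
	uint16_t curr_pairs;	/* what the driver believes after the ack */
	bool broken;
	uint64_t rx_errors;
};

/* Steps a/b and d/e: the shim acks at once, the device has not applied it. */
static void toy_cvq_set_pairs(struct toy_drv *drv, struct toy_dev *dev,
			      uint16_t pairs)
{
	assert(pairs >= 1);
	dev->pending_pairs = pairs;
	drv->curr_pairs = pairs;
}

/* Step c: the userspace device applies the last forwarded command. */
static void toy_dev_process(struct toy_dev *dev)
{
	if (dev->pending_pairs) {
		dev->applied_pairs = dev->pending_pairs;
		dev->pending_pairs = 0;
	}
}

/* May the device still place a buffer in rx queue rxq (0-based pair index)? */
static bool toy_dev_may_use_rxq(const struct toy_dev *dev, uint16_t rxq)
{
	return rxq < dev->applied_pairs;
}

/* Driver rx path: a buffer on a queue beyond curr_pairs is a spec violation. */
static void toy_drv_rx(struct toy_drv *drv, uint16_t rxq)
{
	if (rxq >= drv->curr_pairs) {
		drv->rx_errors++;
		drv->broken = true;
	}
}
```

Running set_pairs(2), process, set_pairs(1) and then delivering a buffer
on the second rx queue before the device processes the last command is
the window described above: the device still considers that queue valid
while the driver already does not.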