Message-ID: <20240326154628.GA9613@willie-the-truck>
Date: Tue, 26 Mar 2024 15:46:29 +0000
From: Will Deacon <will@...nel.org>
To: Keir Fraser <keirf@...gle.com>, gshan@...hat.com
Cc: "Michael S. Tsirkin" <mst@...hat.com>, virtualization@...ts.linux.dev,
linux-kernel@...r.kernel.org, jasowang@...hat.com,
xuanzhuo@...ux.alibaba.com, yihyu@...hat.com, shan.gavin@...il.com,
linux-arm-kernel@...ts.infradead.org,
Catalin Marinas <catalin.marinas@....com>, mochs@...dia.com
Subject: Re: [PATCH] virtio_ring: Fix the stale index in available ring

On Tue, Mar 26, 2024 at 11:43:13AM +0000, Will Deacon wrote:
> On Tue, Mar 26, 2024 at 09:38:55AM +0000, Keir Fraser wrote:
> > On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote:
> > > > Secondly, the debugging code is enhanced so that the available head for
> > > > (last_avail_idx - 1) is read twice and recorded. That means the available
> > > > head for one specific available index is read twice. I do see that the
> > > > available heads differ between the consecutive reads. More details are
> > > > shared below.
> > > >
> > > > From the guest side
> > > > ===================
> > > >
> > > > virtio_net virtio0: output.0:id 86 is not a head!
> > > > head to be released: 047 062 112
> > > >
> > > > avail_idx:
> > > > 000 49665
> > > > 001 49666 <--
> > > > :
> > > > 015 49664
> > >
> > > what are these #s 49665 and so on?
> > > and how large is the ring?
> > > I am guessing 49664 is the index, the ring size is 16, and
> > > 49664 % 16 == 0
> >
> > More than that, 49664 % 256 == 0
> >
> > So again there seems to be an error in the vicinity of roll-over of
> > the idx low byte, as I observed in the earlier log. Surely this is
> > more than coincidence?
>
> Yeah, I'd still really like to see the disassembly for both sides of the
> protocol here. Gavin, is that something you're able to provide? Worst
> case, the host and guest vmlinux objects would be a starting point.
>
> Personally, I'd be fairly surprised if this was a hardware issue.
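A minimal sketch of the index arithmetic behind the roll-over observation above. It assumes the split-ring layout, where avail->idx is a free-running 16-bit counter and the ring size num is a power of two, so the slot for an index is idx & (num - 1). The helper names are hypothetical, not vhost or virtio_ring functions. 49664 is 0xC200, so it maps to slot 0 of a 16-entry ring and also sits exactly on a roll-over of the low byte. The number of pending entries has to be computed modulo 2^16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: ring slot for a free-running 16-bit index.
 * Assumes num is a power of two, as for split virtqueues. */
static unsigned int ring_slot(uint16_t idx, unsigned int num)
{
	assert(num != 0 && (num & (num - 1)) == 0);
	return idx & (num - 1);
}

/* Hypothetical helper: entries published but not yet consumed.
 * The cast back to uint16_t keeps the result correct across the
 * 65535 -> 0 wrap of the index. */
static uint16_t ring_pending(uint16_t avail_idx, uint16_t last_avail_idx)
{
	return (uint16_t)(avail_idx - last_avail_idx);
}

/* Hypothetical helper: true when idx % 256 == 0, i.e. the low byte
 * of the index has just rolled over. */
static bool low_byte_rollover(uint16_t idx)
{
	return (idx & 0xff) == 0;
}
```

With these, ring_slot(49664, 16) is 0, low_byte_rollover(49664) is true, and ring_pending(3, 65534) is 5 because the counter wrapped.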

OK, this is a long shot after eyeballing the vhost code, but does the diff
below help at all? It looks like vhost_vq_avail_empty() can advance the value
saved in 'vq->avail_idx' without a read barrier. Since vhost_get_vq_desc()
only issues its own smp_rmb() when it refreshes that cached index itself, it
could then read a stale head from the available ring in polling mode.

Will

--->8
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..87bff710331a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2801,6 +2801,7 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 		return false;
 	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
 
+	smp_rmb();
 	return vq->avail_idx == vq->last_avail_idx;
 }
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
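A userspace sketch of the ordering argument, using C11 atomics in place of the kernel primitives. It is only an illustration. A relaxed load followed by atomic_thread_fence(memory_order_acquire) roughly stands in for the index read plus smp_rmb(). A release store of the index roughly stands in for the guest's write barrier before it updates avail->idx. All names (toy_avail, toy_vq, toy_avail_empty(), toy_get_head(), toy_publish()) are hypothetical and only loosely modelled on the vhost and virtio_ring code. toy_get_head() fences only when it refreshes the cached index itself. Any other path that refreshes the cache, here toy_avail_empty(), must therefore fence too. Without that fence, a later toy_get_head() could read a ring entry older than the index it trusted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM 16 /* hypothetical ring size, a power of two */

/* Shared with the producer: ring entries plus the published index. */
struct toy_avail {
	_Atomic uint16_t idx;
	_Atomic uint16_t ring[TOY_NUM];
};

/* Consumer-private state, loosely modelled on vhost's cached indices. */
struct toy_vq {
	struct toy_avail *avail;
	uint16_t avail_idx;      /* cached copy of avail->idx */
	uint16_t last_avail_idx; /* next entry to consume */
};

/* Analogue of vhost_vq_avail_empty() with the proposed fix: refresh the
 * cached index, then fence so later ring reads are ordered after it. */
static bool toy_avail_empty(struct toy_vq *vq)
{
	if (vq->avail_idx != vq->last_avail_idx)
		return false;
	vq->avail_idx = atomic_load_explicit(&vq->avail->idx, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire); /* stands in for smp_rmb() */
	return vq->avail_idx == vq->last_avail_idx;
}

/* Analogue of the head-fetch path: it only re-reads idx (and fences)
 * when the cache says the ring is empty, so it relies on whoever last
 * refreshed the cache having fenced. Returns -1 if nothing is available. */
static int toy_get_head(struct toy_vq *vq)
{
	if (vq->avail_idx == vq->last_avail_idx) {
		vq->avail_idx = atomic_load_explicit(&vq->avail->idx, memory_order_relaxed);
		if (vq->avail_idx == vq->last_avail_idx)
			return -1;
		atomic_thread_fence(memory_order_acquire);
	}
	uint16_t slot = vq->last_avail_idx & (TOY_NUM - 1);
	int head = atomic_load_explicit(&vq->avail->ring[slot], memory_order_relaxed);
	vq->last_avail_idx++;
	return head;
}

/* Producer: write the ring entry first, then publish the new index with
 * release ordering so the entry is visible before the index is. */
static void toy_publish(struct toy_avail *avail, uint16_t head)
{
	uint16_t idx = atomic_load_explicit(&avail->idx, memory_order_relaxed);
	atomic_store_explicit(&avail->ring[idx & (TOY_NUM - 1)], head, memory_order_relaxed);
	atomic_store_explicit(&avail->idx, (uint16_t)(idx + 1), memory_order_release);
}
```

Run single-threaded, the model only exercises the bookkeeping. After two toy_publish() calls, toy_avail_empty() returns false and caches avail_idx == 2. The next two toy_get_head() calls then take the fast path and skip their own fence. That is why the fence in toy_avail_empty() matters once producer and consumer run concurrently.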