Message-ID: <20240326154628.GA9613@willie-the-truck>
Date: Tue, 26 Mar 2024 15:46:29 +0000
From: Will Deacon <will@...nel.org>
To: Keir Fraser <keirf@...gle.com>, gshan@...hat.com
Cc: "Michael S. Tsirkin" <mst@...hat.com>, virtualization@...ts.linux.dev,
	linux-kernel@...r.kernel.org, jasowang@...hat.com,
	xuanzhuo@...ux.alibaba.com, yihyu@...hat.com, shan.gavin@...il.com,
	linux-arm-kernel@...ts.infradead.org,
	Catalin Marinas <catalin.marinas@....com>, mochs@...dia.com
Subject: Re: [PATCH] virtio_ring: Fix the stale index in available ring

On Tue, Mar 26, 2024 at 11:43:13AM +0000, Will Deacon wrote:
> On Tue, Mar 26, 2024 at 09:38:55AM +0000, Keir Fraser wrote:
> > On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote:
> > > > Secondly, the debugging code is enhanced so that the available head for
> > > > (last_avail_idx - 1) is read twice and recorded, i.e. the available
> > > > head for one specific available index is read twice. I do see that the
> > > > available heads differ between the two consecutive reads. More details
> > > > are shared below.
> > > > 
> > > > From the guest side
> > > > ===================
> > > > 
> > > > virtio_net virtio0: output.0:id 86 is not a head!
> > > > head to be released: 047 062 112
> > > > 
> > > > avail_idx:
> > > > 000  49665
> > > > 001  49666  <--
> > > >  :
> > > > 015  49664
> > > 
> > > what are these #s 49665 and so on?
> > > and how large is the ring?
> > > I am guessing 49664 is the index, the ring size is 16, and
> > > 49664 % 16 == 0
> > 
> > More than that, 49664 % 256 == 0
> > 
> > So again there seems to be an error in the vicinity of roll-over of
> > the idx low byte, as I observed in the earlier log. Surely this is
> > more than coincidence?
> 
> Yeah, I'd still really like to see the disassembly for both sides of the
> protocol here. Gavin, is that something you're able to provide? Worst
> case, the host and guest vmlinux objects would be a starting point.
> 
> Personally, I'd be fairly surprised if this was a hardware issue.
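
The index arithmetic in the quoted exchange can be checked with a small
sketch. The helper names below are hypothetical and are not taken from the
virtio or vhost sources; the ring size of 16 is Michael's guess from the
thread, not a confirmed value.

```c
#include <stdint.h>

/* Slot that a free-running 16-bit index maps to, assuming ring_size is a
 * power of two (hypothetical helper, for illustration only). */
static unsigned ring_slot(uint16_t idx, unsigned ring_size)
{
    return idx & (ring_size - 1);
}

/* Low byte of the 16-bit index. It rolls over to zero every 256
 * increments, which is the boundary Keir points at. */
static unsigned idx_low_byte(uint16_t idx)
{
    return idx & 0xffu;
}
```

Since 49664 is 0xC200, both ring_slot(49664, 16) and idx_low_byte(49664)
evaluate to 0, while the preceding index 49663 (0xC1FF) has a low byte of
0xff.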

Ok, long shot after eyeballing the vhost code, but does the diff below
help at all? It looks like vhost_vq_avail_empty() can advance the value
saved in 'vq->avail_idx' but without the read barrier, possibly confusing
vhost_get_vq_desc() in polling mode.

Will

--->8

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..87bff710331a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2801,6 +2801,7 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
                return false;
        vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
 
+       smp_rmb();
        return vq->avail_idx == vq->last_avail_idx;
 }
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
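
For readers unfamiliar with the ordering concern, here is a minimal
user-space sketch of the pattern, assuming C11 atomics as a stand-in for the
kernel barriers. The struct and function names (toy_ring, toy_publish,
toy_consume) are hypothetical and this is not the vhost code: the acquire
fence plays the role that smp_rmb() plays in the diff above, ordering the
read of the index before the read of the ring entry it makes visible.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16u   /* assumed power-of-two ring size */

struct toy_ring {
    uint16_t heads[RING_SIZE];   /* entries written by the producer */
    _Atomic uint16_t avail_idx;  /* free-running index, producer-owned */
    uint16_t last_avail_idx;     /* consumer-private progress marker */
};

/* Producer: fill the entry first, then publish the new index with
 * release ordering so the entry is visible before the index is. */
static void toy_publish(struct toy_ring *r, uint16_t head)
{
    uint16_t idx = atomic_load_explicit(&r->avail_idx, memory_order_relaxed);

    r->heads[idx % RING_SIZE] = head;
    atomic_store_explicit(&r->avail_idx, (uint16_t)(idx + 1),
                          memory_order_release);
}

/* Consumer: read the index, then fence before touching the entry.
 * Without the fence, the entry load could be satisfied before the
 * index load and return a stale head. */
static bool toy_consume(struct toy_ring *r, uint16_t *head)
{
    uint16_t idx = atomic_load_explicit(&r->avail_idx, memory_order_relaxed);

    if (idx == r->last_avail_idx)
        return false;            /* nothing new to consume */

    atomic_thread_fence(memory_order_acquire);  /* stand-in for smp_rmb() */
    *head = r->heads[r->last_avail_idx % RING_SIZE];
    r->last_avail_idx++;
    return true;
}
```

The sketch only illustrates why an index read that is later trusted by
another code path needs the barrier on every path that updates the cached
value; whether this is what actually happens between vhost_vq_avail_empty()
and vhost_get_vq_desc() is the open question in this thread.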

