Message-ID: <20240319033016-mutt-send-email-mst@kernel.org>
Date: Tue, 19 Mar 2024 03:36:31 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Will Deacon <will@...nel.org>
Cc: Gavin Shan <gshan@...hat.com>, virtualization@...ts.linux.dev,
	linux-kernel@...r.kernel.org, jasowang@...hat.com,
	xuanzhuo@...ux.alibaba.com, yihyu@...hat.com, shan.gavin@...il.com
Subject: Re: [PATCH] virtio_ring: Fix the stale index in available ring

On Mon, Mar 18, 2024 at 04:59:24PM +0000, Will Deacon wrote:
> On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote:
> > The issue was reported by Yihuang Yu, who ran a 'netperf' test on
> > NVidia's grace-grace and grace-hopper machines. The 'netperf'
> > client is started in the VM hosted by grace-hopper machine,
> > while the 'netperf' server is running on grace-grace machine.
> > 
> > The VM is started with virtio-net and vhost has been enabled.
> > We observe an error message spew from the VM and then a soft-lockup
> > report. The error message indicates the data associated with
> > the descriptor (index: 135) has been released, and the queue
> > is marked as broken. It eventually leads to the endless effort
> > to fetch free buffer (skb) in drivers/net/virtio_net.c::start_xmit()
> > and soft-lockup. The stale index 135 is fetched from the available
> > ring and published to the used ring by vhost, meaning we have
> > disordered writes to the available ring element and available index.
> > 
> >   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64              \
> >   -accel kvm -machine virt,gic-version=host                            \
> >      :                                                                 \
> >   -netdev tap,id=vnet0,vhost=on                                        \
> >   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0 \
> > 
> >   [   19.993158] virtio_net virtio1: output.0:id 135 is not a head!
> > 
> > Fix the issue by replacing virtio_wmb(vq->weak_barriers) with stronger
> > virtio_mb(false), equivalent to replacing the 'dmb' instruction with 'dsb' on
> > ARM64. It should work for other architectures, but performance loss is
> > expected.
> > 
> > Cc: stable@...r.kernel.org
> > Reported-by: Yihuang Yu <yihyu@...hat.com>
> > Signed-off-by: Gavin Shan <gshan@...hat.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 49299b1f9ec7..7d852811c912 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -687,9 +687,15 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> >  	vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
> >  
> > -	/* Descriptors and available array need to be set before we expose the
> > -	 * new available array entries. */
> > -	virtio_wmb(vq->weak_barriers);
> > +	/*
> > +	 * Descriptors and available array need to be set before we expose
> > +	 * the new available array entries. virtio_wmb() should be enough
> > +	 * to ensure the order theoretically. However, a stronger barrier
> > +	 * is needed by ARM64. Otherwise, the stale data can be observed
> > +	 * by the host (vhost). A stronger barrier should work for other
> > +	 * architectures, but performance loss is expected.
> > +	 */
> > +	virtio_mb(false);
> >  	vq->split.avail_idx_shadow++;
> >  	vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> >  						vq->split.avail_idx_shadow);
> 
> Replacing a DMB with a DSB is _very_ unlikely to be the correct solution
> here, especially when ordering accesses to coherent memory.
> 
> In practice, either the larger timing difference from the DSB or the fact
> that you're going from a Store->Store barrier to a full barrier is what
> makes things "work" for you. Have you tried, for example, a DMB SY
> (e.g. via __smp_mb())?
> 
> We definitely shouldn't take changes like this without a proper
> explanation of what is going on.
> 
> Will

Just making sure: on this system, how do smp_wmb() and wmb() differ?
smp_wmb() is normally for synchronizing with a kernel running on
another CPU, and we are doing something unusual in virtio when we
use it to synchronize with the host as opposed to the guest.
E.g. CONFIG_SMP is special-cased because of this:

#define virt_wmb() do { kcsan_wmb(); __smp_wmb(); } while (0)

Note __smp_wmb(), not smp_wmb(), which would be a NOP on UP.
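
To illustrate the distinction, here is a rough userspace sketch, not
kernel code. It assumes C11 atomics as stand-ins for the kernel
barriers; every model_* name and the MODEL_SMP switch are hypothetical
and exist only for this example. C11 has no pure Store->Store fence,
so a release fence (which is somewhat stronger) is used in its place.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_RING_SIZE 8u	/* power of two, like a split ring */

/* Hypothetical, simplified model of the available ring. */
struct model_avail {
	uint16_t ring[MODEL_RING_SIZE];
	_Atomic uint16_t idx;
};

/* Stand-in for __smp_wmb(): always emits a real fence. */
#define model___smp_wmb() atomic_thread_fence(memory_order_release)

/*
 * Stand-in for smp_wmb(): a real fence only when MODEL_SMP is defined,
 * otherwise just a compiler barrier (no instruction emitted).
 */
#ifdef MODEL_SMP
#define model_smp_wmb() model___smp_wmb()
#else
#define model_smp_wmb() atomic_signal_fence(memory_order_seq_cst)
#endif

/*
 * Stand-in for virt_wmb(): uses the __smp variant unconditionally,
 * because the other side (the host) may run on another CPU even when
 * the guest is built for UP.
 */
#define model_virt_wmb() model___smp_wmb()

/* Producer: write the ring entry, fence, then expose the new index. */
static inline uint16_t model_publish(struct model_avail *a,
				     uint16_t shadow, uint16_t head)
{
	a->ring[shadow & (MODEL_RING_SIZE - 1)] = head;
	model_virt_wmb();
	shadow++;
	atomic_store_explicit(&a->idx, shadow, memory_order_relaxed);
	return shadow;
}

/* Consumer: read the index first, then the entry it covers. */
static inline int model_peek(struct model_avail *a, uint16_t last_seen,
			     uint16_t *head)
{
	uint16_t idx = atomic_load_explicit(&a->idx, memory_order_acquire);

	if (idx == last_seen)
		return 0;	/* nothing new */
	*head = a->ring[last_seen & (MODEL_RING_SIZE - 1)];
	return 1;
}
```

If model_publish() used model_smp_wmb() and MODEL_SMP were not defined,
nothing would order the two stores for an observer on another CPU,
which is the case the __smp variant is meant to cover.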


-- 
MST

