linux-kernel - Re: [PATCH] virtio_ring: Fix the stale index in available ring

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20240327075049-mutt-send-email-mst@kernel.org>
Date: Wed, 27 Mar 2024 07:56:58 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Will Deacon <will@...nel.org>
Cc: Keir Fraser <keirf@...gle.com>, gshan@...hat.com,
	virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org,
	jasowang@...hat.com, xuanzhuo@...ux.alibaba.com, yihyu@...hat.com,
	shan.gavin@...il.com, linux-arm-kernel@...ts.infradead.org,
	Catalin Marinas <catalin.marinas@....com>, mochs@...dia.com
Subject: Re: [PATCH] virtio_ring: Fix the stale index in available ring

On Tue, Mar 26, 2024 at 03:46:29PM +0000, Will Deacon wrote:
> On Tue, Mar 26, 2024 at 11:43:13AM +0000, Will Deacon wrote:
> > On Tue, Mar 26, 2024 at 09:38:55AM +0000, Keir Fraser wrote:
> > > On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote:
> > > > > Secondly, the debugging code is enhanced so that the available head for
> > > > > (last_avail_idx - 1) is read for twice and recorded. It means the available
> > > > > head for one specific available index is read for twice. I do see the
> > > > > available heads are different from the consecutive reads. More details
> > > > > are shared as below.
> > > > > 
> > > > > From the guest side
> > > > > ===================
> > > > > 
> > > > > virtio_net virtio0: output.0:id 86 is not a head!
> > > > > head to be released: 047 062 112
> > > > > 
> > > > > avail_idx:
> > > > > 000  49665
> > > > > 001  49666  <--
> > > > >  :
> > > > > 015  49664
> > > > 
> > > > what are these #s 49665 and so on?
> > > > and how large is the ring?
> > > > I am guessing 49664 is the index ring size is 16 and
> > > > 49664 % 16 == 0
> > > 
> > > More than that, 49664 % 256 == 0
> > > 
> > > So again there seems to be an error in the vicinity of roll-over of
> > > the idx low byte, as I observed in the earlier log. Surely this is
> > > more than coincidence?
> > 
> > Yeah, I'd still really like to see the disassembly for both sides of the
> > protocol here. Gavin, is that something you're able to provide? Worst
> > case, the host and guest vmlinux objects would be a starting point.
> > 
> > Personally, I'd be fairly surprised if this was a hardware issue.
> 
> Ok, long shot after eyeballing the vhost code, but does the diff below
> help at all? It looks like vhost_vq_avail_empty() can advance the value
> saved in 'vq->avail_idx' but without the read barrier, possibly confusing
> vhost_get_vq_desc() in polling mode.
> 
> Will
> 
> --->8
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 045f666b4f12..87bff710331a 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2801,6 +2801,7 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>                 return false;
>         vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>  
> +       smp_rmb();
>         return vq->avail_idx == vq->last_avail_idx;
>  }
>  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

Oh wow you are right.

We have:

        if (vq->avail_idx == vq->last_avail_idx) {
                if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
                        vq_err(vq, "Failed to access avail idx at %p\n",
                                &vq->avail->idx);
                        return -EFAULT;
                }
                vq->avail_idx = vhost16_to_cpu(vq, avail_idx);

                if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) {
                        vq_err(vq, "Guest moved used index from %u to %u",
                                last_avail_idx, vq->avail_idx);
                        return -EFAULT;
                }

                /* If there's nothing new since last we looked, return
                 * invalid.
                 */
                if (vq->avail_idx == last_avail_idx)
                        return vq->num;

                /* Only get avail ring entries after they have been
                 * exposed by guest.
                 */
                smp_rmb();
        }


and so the rmb only happens if avail_idx is not advanced.

Actually there is a bunch of code duplication where we assign to
avail_idx, too.

Will thanks a lot for looking into this! I kept looking into
the virtio side for some reason, the fact that it did not
trigger with qemu should have been a big hint!


-- 
MST