[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZgKXr8IQ51xBECuP@google.com>
Date: Tue, 26 Mar 2024 09:38:55 +0000
From: Keir Fraser <keirf@...gle.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Gavin Shan <gshan@...hat.com>, Will Deacon <will@...nel.org>,
virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org,
jasowang@...hat.com, xuanzhuo@...ux.alibaba.com, yihyu@...hat.com,
shan.gavin@...il.com, linux-arm-kernel@...ts.infradead.org,
Catalin Marinas <catalin.marinas@....com>, mochs@...dia.com
Subject: Re: [PATCH] virtio_ring: Fix the stale index in available ring
On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote:
> On Mon, Mar 25, 2024 at 05:34:29PM +1000, Gavin Shan wrote:
> >
> > On 3/20/24 17:14, Michael S. Tsirkin wrote:
> > > On Wed, Mar 20, 2024 at 03:24:16PM +1000, Gavin Shan wrote:
> > > > On 3/20/24 10:49, Michael S. Tsirkin wrote:>
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 6f7e5010a673..79456706d0bd 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -685,7 +685,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > /* Put entry in available array (but don't update avail->idx until they
> > > > > * do sync). */
> > > > > avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> > > > > - vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
> > > > > + u16 headwithflag = head | (q->split.avail_idx_shadow & ~(vq->split.vring.num - 1));
> > > > > + vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, headwithflag);
> > > > > /* Descriptors and available array need to be set before we expose the
> > > > > * new available array entries. */
> > > > >
> >
> > Ok, Michael. I continued with my debugging code. It still looks like a
> > hardware bug on NVidia's grace-hopper. I really think NVidia needs to be
> > involved for the discussion, as suggested by you.
>
> Do you have a support contact at Nvidia to report this?
>
> > Firstly, I bind the vhost process and vCPU thread to CPU#71 and CPU#70.
> > Note that I have only one vCPU in my configuration.
>
> Interesting but is guest built with CONFIG_SMP set?
arm64 is always built CONFIG_SMP.
> > Secondly, the debugging code is enhanced so that the available head for
> > (last_avail_idx - 1) is read for twice and recorded. It means the available
> > head for one specific available index is read for twice. I do see the
> > available heads are different from the consecutive reads. More details
> > are shared as below.
> >
> > From the guest side
> > ===================
> >
> > virtio_net virtio0: output.0:id 86 is not a head!
> > head to be released: 047 062 112
> >
> > avail_idx:
> > 000 49665
> > 001 49666 <--
> > :
> > 015 49664
>
> what are these #s 49665 and so on?
> and how large is the ring?
> I am guessing 49664 is the index ring size is 16 and
> 49664 % 16 == 0
More than that, 49664 % 256 == 0
So again there seems to be an error in the vicinity of roll-over of
the idx low byte, as I observed in the earlier log. Surely this is
more than coincidence?
-- Keir
> > avail_head:
>
>
> is this the avail ring contents?
>
> > 000 062
> > 001 047 <--
> > :
> > 015 112
>
>
> What are these arrows pointing at, btw?
>
>
> > From the host side
> > ==================
> >
> > avail_idx
> > 000 49663
> > 001 49666 <---
> > :
> >
> > avail_head
> > 000 062 (062)
> > 001 047 (047) <---
> > :
> > 015 086 (112) // head 086 is returned from the first read,
> > // but head 112 is returned from the second read
> >
> > vhost_get_vq_desc: Inconsistent head in two read (86 -> 112) for avail_idx 49664
> >
> > Thanks,
> > Gavin
>
> OK thanks so this proves it is actually the avail ring value.
>
> --
> MST
>
Powered by blists - more mailing lists