Message-ID: <aIHydjBEnmkTt-P-@willie-the-truck>
Date: Thu, 24 Jul 2025 09:44:38 +0100
From: Will Deacon <will@...nel.org>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Stefano Garzarella <sgarzare@...hat.com>,
Breno Leitao <leitao@...ian.org>, jasowang@...hat.com,
eperezma@...hat.com, linux-arm-kernel@...ts.infradead.org,
kvm@...r.kernel.org, Stefan Hajnoczi <stefanha@...hat.com>,
netdev@...r.kernel.org
Subject: Re: vhost: linux-next: crash at vhost_dev_cleanup()

On Thu, Jul 24, 2025 at 04:22:15AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 24, 2025 at 10:14:36AM +0200, Stefano Garzarella wrote:
> > CCing Will

Thanks.

> > On Thu, 24 Jul 2025 at 09:48, Michael S. Tsirkin <mst@...hat.com> wrote:
> > >
> > > On Wed, Jul 23, 2025 at 08:04:42AM -0700, Breno Leitao wrote:
> > > > Hello,
> > > >
> > > > I've seen a crash in linux-next for a while on my arm64 server, and
> > > > I decided to report.
> > > >
> > > > While running stress-ng on linux-next, I see the crash below.
> > > >
> > > > This is happening in a kernel configured with some debug options (KASAN,
> > > > LOCKDEP and KMEMLEAK).
> > > >
> > > > Basically running stress-ng in a loop would crash the host in 15-20
> > > > minutes:
> > > > # while (true); do stress-ng -r 10 -t 10; done
> > > >
> > > > From the early warning "virt_to_phys used for non-linear address",
> >
> > mmm, we recently added nonlinear SKBs support in vhost-vsock [1],
> > @Will can this issue be related?
>
> Good point.
>
> Breno, if bisecting is too much trouble, would you mind testing the commits
> c76f3c4364fe523cd2782269eab92529c86217aa
> and
> c7991b44d7b44f9270dec63acd0b2965d29aab43
> and telling us if this reproduces?

That's definitely worth doing, but we should be careful not to confuse
the "non-linear address" from the warning (which refers to a virtual
address that lies outside of the linear mapping of memory, e.g. in the
vmalloc space) with "non-linear SKBs", which are SKBs with fragment
pages.

Breno -- when you say you've been seeing this "for a while", what's the
earliest kernel you know you saw it on?

> > > > I suppose corrupted data is at vq->nheads.
> > > >
> > > > Here is the decoded stack against 9798752 ("Add linux-next specific
> > > > files for 20250721")
> > > >
> > > >
> > > > [ 620.685144] [ T250731] VFIO - User Level meta-driver version: 0.3
> > > > [ 622.394448] [ T250254] ------------[ cut here ]------------
> > > > [ 622.413492] [ T250254] virt_to_phys used for non-linear address: 000000006e69fe64 (0xcfcecdcccbcac9c8)

So here's the bad (non-linear) pointer. Do you know if 0xcfcecdcccbcac9c8
correlates with the packet data that stress-ng is generating? I wonder if
we're somehow overflowing vq->iov[].
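
To make the "looks like payload" suspicion concrete, here is a minimal
sketch that decodes the reported value the way it would sit in memory. It
assumes a little-endian arm64 host, and the helper names are hypothetical
(they are not part of any kernel tooling). The last check assumes a
48-bit VA configuration, where a kernel pointer would have its top 16
bits set.

```python
# Hypothetical helpers for eyeballing a suspicious pointer value.
# Assumption: the host is little-endian, so the least significant
# byte of the 64-bit value is the first byte in memory.

def le_bytes(value: int, width: int = 8) -> list[int]:
    """Return the bytes of `value` in little-endian (in-memory) order."""
    return list(value.to_bytes(width, "little"))


def is_incrementing(seq: list[int]) -> bool:
    """True if every byte is the previous byte plus one (mod 256)."""
    return all((b - a) & 0xFF == 1 for a, b in zip(seq, seq[1:]))


bad_ptr = 0xCFCECDCCCBCAC9C8  # value printed by the virt_to_phys warning

mem = le_bytes(bad_ptr)
print(" ".join(f"{b:02x}" for b in mem))  # bytes as they would appear in memory
print("incrementing byte pattern:", is_incrementing(mem))

# Assumption: 48-bit VA, so a kernel address has 0xffff in bits 63:48.
print("top 16 bits:", hex(bad_ptr >> 48))
```

In memory order the bytes are c8 c9 ca ... cf, i.e. a run of consecutive
values, and the top 16 bits are 0xcfce rather than 0xffff. That is
consistent with a fill pattern having been written over the pointer,
although it does not say where the pattern came from.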

Will