Message-ID: <20220815205330.m54g7vcs77r6owd6@awork3.anarazel.de>
Date: Mon, 15 Aug 2022 13:53:30 -0700
From: Andres Freund <andres@...razel.de>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
Jason Wang <jasowang@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jens Axboe <axboe@...nel.dk>,
James Bottomley <James.Bottomley@...senpartnership.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Guenter Roeck <linux@...ck-us.net>,
linux-kernel@...r.kernel.org, Greg KH <gregkh@...uxfoundation.org>,
c@...hat.com
Subject: Re: upstream kernel crashes

Hi,

On 2022-08-15 16:21:51 -0400, Michael S. Tsirkin wrote:
> On Mon, Aug 15, 2022 at 10:46:17AM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2022-08-15 12:50:52 -0400, Michael S. Tsirkin wrote:
> > > On Mon, Aug 15, 2022 at 09:45:03AM -0700, Andres Freund wrote:
> > > > Hi,
> > > >
> > > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > > OK so this gives us a quick revert as a solution for now.
> > > > > Next, I would appreciate it if you just try this simple hack.
> > > > > If it crashes we either have a long standing problem in virtio
> > > > > code or more likely a gcp bug where it can't handle smaller
> > > > > > rings than what device requests.
> > > > > Thanks!
> > > >
> > > > I applied the below and the problem persists.
> > > >
> > > > [...]
> > >
> > > Okay!
> >
> > Just checking - I applied and tested this atop 6.0-rc1, correct? Or did you
> > want me to test it with the 762faee5a267 reverted? I guess if what you're
> > trying to test is a smaller queue than what's requested, you'd want to do so
> > without the problematic patch applied...
> >
> >
> > Either way, I did this, and there are no issues that I could observe. No
> > oopses, no broken networking. But:
> >
> > To make sure it does something I added a debugging printk - which doesn't show
> > up. I assume this is at a point at least earlyprintk should work (which I see
> > getting enabled via serial)?
> >
> Sorry if I was unclear. I wanted to know whether the change somehow
> exposes a driver bug or a GCP bug. So what I wanted to do is to test
> this patch on top of *5.19*, not on top of the revert.

Right, the 5.19 part was clear, just the earlier test:

> > > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > > OK so this gives us a quick revert as a solution for now.
> > > > > Next, I would appreciate it if you just try this simple hack.
> > > > > If it crashes we either have a long standing problem in virtio
> > > > > code or more likely a gcp bug where it can't handle smaller
> > > > > Thanks!
I wasn't sure about.

After I didn't see any effect on 5.19 + your patch, I grew a bit suspicious
and added the printks.

> Yes I think printk should work here.

The reason the debug patch didn't change anything, and that my debug printk
didn't show, is that gcp uses the legacy paths...

If there were a bug in the legacy path, it'd explain why the problem only
shows on gcp, and not in other situations.

I'll queue testing the legacy path with the equivalent change.
- Andres
Greetings,
Andres Freund