linux-kernel - Re: upstream kernel crashes

Message-ID: <1c057afa-92df-ee3c-5978-3731d3db9345@kernel.dk>
Date:   Sun, 14 Aug 2022 19:04:22 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Andres Freund <andres@...razel.de>,
        James Bottomley <James.Bottomley@...senpartnership.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>
Cc:     Guenter Roeck <linux@...ck-us.net>, linux-kernel@...r.kernel.org,
        Greg KH <gregkh@...uxfoundation.org>
Subject: Re: upstream kernel crashes

On 8/14/22 4:47 PM, Linus Torvalds wrote:
> On Sun, Aug 14, 2022 at 3:37 PM Andres Freund <andres@...razel.de> wrote:
>>
>> That range had different symptoms, I think (networking not working, but not
>> oopsing). I hit similar issues when I tried to reproduce the issue
>> interactively, to produce more details, and unwisely did git pull instead of
>> checking out the precise revision, ending up with aea23e7c464b. That's when
>> symptoms look similar to the above.  So it'd be 69dac8e431af..aea23e7c464b
>> that I'd be more suspicious of in the context of this thread.
> 
> Ok.
> 
>> Which would make me look at the following first:
>> e140f731f980 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>> abe7a481aac9 Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block
>> 1da8cf961bb1 Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block
> 
> All right, that makes sense. The reported oopses seem to be about block
> requests. Some of them were scsi in particular.
> 
> Let's bring in Jens and the SCSI people. Maybe that host reference
> counting? There's quite a lot of "move freeing around" in that late
> scsi pull, even if it was touted as "mostly small bug fixes and
> trivial updates".
> 
> Here's the two threads:
> 
>   https://lore.kernel.org/all/20220814212610.GA3690074@roeck-us.net/
>   https://lore.kernel.org/all/20220814043906.xkmhmnp23bqjzz4s@awork3.anarazel.de/
> 
> but I guess I'll do an -rc1 regardless of this, because I need to
> close the merge window.

I took a quick look and added more SCSI bits to my vm images, but
haven't been able to hit it. But if this is happening after the
above-mentioned merges, it does seem more SCSI related. The block side
is only really an error handling fix on that front, the rest is just
nvme. Seems unlikely that'd be the culprit.

Sounds like Andres is already bisecting this, so I guess we'll be wiser
soon enough.

-- 
Jens Axboe