linux-kernel - Re: Linux 5.13-rc6 regression to 5.12.x: kernel OOM and panic during kernel boot in low memory Xen VM's (256MB assigned memory).

Message-ID: <0b12f27b-1109-b621-c969-10814b2c1c2f@eikelenboom.it>
Date:   Thu, 17 Jun 2021 20:02:27 +0200
From:   Sander Eikelenboom <linux@...elenboom.it>
To:     Rasmus Villemoes <linux@...musvillemoes.dk>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>
Subject: Re: Linux 5.13-rc6 regression to 5.12.x: kernel OOM and panic during
 kernel boot in low memory Xen VM's (256MB assigned memory).

On 17/06/2021 17:37, Rasmus Villemoes wrote:
> On 17/06/2021 17.01, Linus Torvalds wrote:
>> On Thu, Jun 17, 2021 at 2:26 AM Sander Eikelenboom <linux@...elenboom.it> wrote:
>>>
>>> I just tried to upgrade and test the Linux kernel, going from the 5.12 kernel series to 5.13-rc6 on my home server with Xen, but ran into some trouble.
>>>
>>> Some VMs (with more than 256MB of memory assigned) boot fine, but the smaller (memory-wise) PVH ones crash during kernel boot due to OOM.
>>> Booting VMs with the 5.12(.9) kernel still works fine, even when dom0 is running 5.13-rc6 (but dom0 has more memory assigned, so that is not unexpected).
>>
>> Adding Rasmus to the cc, because this looks kind of like the async
>> rootfs population thing that caused some other OOM issues too.
> 
> Yes, that looks like the same issue.
> 
>> Rasmus? Original report here:
>>
>>     https://lore.kernel.org/lkml/ee8bf04c-6e55-1d9b-7bdb-25e6108e8e1e@eikelenboom.it/
>>
>> I do find it odd that we'd be running out of memory so early.
> 
> Indeed. It would be nice to know if these also reproduce with
> initramfs_async=0 on the command line.
> 
> But what is even more curious is that in the other report
> (https://lore.kernel.org/lkml/20210607144419.GA23706@xsang-OptiPlex-9020/),
> it seemed to trigger with _more_ memory - though I may be misreading
> what Oliver was telling me:
> 
>> Please note that we use 'vmalloc=512M' for both the parent and this commit.
>> Since it's OK on the parent but OOMs on this commit, we want to send this report
>> to show the potential problem of the commit in some cases.
>>
>> We also tested by changing to 'vmalloc=128M', and it succeeds.
> 
> Those tests were done in a VM with 16G of memory, and he also wrote:
> 
>> We also tried to follow exactly the above steps to test on
>> a local machine (8G memory), but could not reproduce it.
> 
> Are there some special rules for what memory pools PID1 versus the
> kworker threads can dip into?
> 
> 
> Side note: I also had a ppc64 report with different symptoms (the
> initramfs was corrupted), but that turned out to also reproduce with
> e7cb072eb98 reverted, so that is likely unrelated. But just FTR that
> thread is here:
> https://lore.kernel.org/lkml/CA+QYu4qxf2CYe2gC6EYnOHXPKS-+cEXL=MnUvqRFaN7W1i6ahQ@mail.gmail.com/
> 
> Rasmus
> 
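
Oliver's report quoted above compares vmalloc=512M (OOM) with vmalloc=128M (boots). A minimal sketch of what those two values amount to in bytes, assuming the size suffixes behave like the kernel's memparse() helper with binary K/M/G/T multipliers; parse_size() is a hypothetical user-space helper, not a kernel API, and it only accepts a decimal number:

```python
# Simplified user-space imitation of memparse()-style size strings.
# Assumption: only K/M/G/T suffixes (case-insensitive), each a power of 1024.
_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text: str) -> int:
    """Convert a size such as '512M' into a byte count."""
    text = text.strip()
    if text and text[-1].upper() in _SUFFIXES:
        return int(text[:-1], 10) * _SUFFIXES[text[-1].upper()]
    return int(text, 10)


for arg in ("vmalloc=512M", "vmalloc=128M"):
    key, _, value = arg.partition("=")
    size = parse_size(value)
    print(f"{key}: {size} bytes ({size >> 20} MiB)")
```

The failing setting is four times the size of the one that boots.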

I chose to finish the bisection attempt first; not so surprisingly, it ends up with:
e7cb072eb988e46295512617c39d004f9e1c26f8 is the first bad commit

So at least that link is confirmed.
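
The bisection is a binary search over the ordered commits between the last known-good and the first known-bad kernel. A minimal sketch of the idea, assuming a linear history and a hypothetical is_bad(commit) predicate that stands in for "build, boot the 256MB guest, check for the OOM"; real git bisect additionally handles merges and skipped commits:

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, the last one is known
    bad, and once a commit is bad every later commit is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo]


# Hypothetical history with placeholder names; the regression appears at "c5".
history = [f"c{i}" for i in range(10)]
print(first_bad(history, lambda c: int(c[1:]) >= 5))
```

Each is_bad() call corresponds to one build-and-boot cycle, so a range of n commits needs only about log2(n) test boots.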

I also tried booting with "initramfs_async=0", and with that the guest boots with the 5.13-rc6-ish kernel, which fails without it.
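
Whether a guest actually booted with this workaround can be checked from its kernel command line (for instance the contents of /proc/cmdline). A minimal sketch, assuming asynchronous unpacking is enabled by default in this kernel (consistent with the guest failing without the parameter) and that the value of initramfs_async= is interpreted roughly like the kernel's kstrtobool() (y/Y/1/on mean true, n/N/0/off mean false). The helper names are hypothetical, and quoted arguments containing spaces are not handled:

```python
from typing import Optional


def strtobool(value: str) -> Optional[bool]:
    """Rough approximation of kstrtobool(): only the first one or two characters matter."""
    if not value:
        return None
    c = value[0]
    if c in "yY1":
        return True
    if c in "nN0":
        return False
    if c in "oO" and len(value) > 1:
        if value[1] in "nN":
            return True
        if value[1] in "fF":
            return False
    return None  # unrecognised value


def initramfs_async_enabled(cmdline: str, default: bool = True) -> bool:
    """Return whether async initramfs unpacking would be used for this command line."""
    enabled = default
    for token in cmdline.split():  # naive split: no quote handling
        key, sep, value = token.partition("=")
        if key == "initramfs_async" and sep:
            parsed = strtobool(value)
            if parsed is not None:  # unrecognised values leave the setting unchanged
                enabled = parsed  # a later occurrence overrides an earlier one
    return enabled


print(initramfs_async_enabled("root=/dev/xvda1 ro initramfs_async=0"))
```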

--
Sander