linux-kernel - Re: Linux 5.13-rc6 regression to 5.12.x: kernel OOM and panic during kernel boot in low memory Xen VM's (256MB assigned memory).

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <9108c22e-3521-9e24-6124-7776d947b788@rasmusvillemoes.dk>
Date:   Thu, 17 Jun 2021 17:37:51 +0200
From:   Rasmus Villemoes <linux@...musvillemoes.dk>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Sander Eikelenboom <linux@...elenboom.it>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>
Subject: Re: Linux 5.13-rc6 regression to 5.12.x: kernel OOM and panic during
 kernel boot in low memory Xen VM's (256MB assigned memory).

On 17/06/2021 17.01, Linus Torvalds wrote:
> On Thu, Jun 17, 2021 at 2:26 AM Sander Eikelenboom <linux@...elenboom.it> wrote:
>>
>> I just tried to upgrade and test the linux kernel going from the 5.12 kernel series to 5.13-rc6 on my homeserver with Xen, but ran in some trouble.
>>
>> Some VM's boot fine (with more than 256MB memory assigned), but the smaller (memory wise) PVH ones crash during kernel boot due to OOM.
>> Booting VM's with 5.12(.9) kernel still works fine, also when dom0 is running 5.13-rc6 (but it has more memory assigned, so that is not unexpected).
> 
> Adding Rasmus to the cc, because this looks kind of like the async
> roofs population thing that caused some other oom issues too.

Yes, that looks like the same issue.

> Rasmus? Original report here:
> 
>    https://lore.kernel.org/lkml/ee8bf04c-6e55-1d9b-7bdb-25e6108e8e1e@eikelenboom.it/
> 
> I do find it odd that we'd be running out of memory so early..

Indeed. It would be nice to know if these also reproduce with
initramfs_async=0 on the command line.

But what is even more curious is that in the other report
(https://lore.kernel.org/lkml/20210607144419.GA23706@xsang-OptiPlex-9020/),
it seemed to trigger with _more_ memory - though I may be misreading
what Oliver was telling me:

> please be noted that we use 'vmalloc=512M' for both parent and this
commit.
> since it's ok on parent but oom on this commit, we want to send this
report
> to show the potential problem of the commit on some cases.
>
> we also tested by changing to use 'vmalloc=128M', it will succeed.

Those tests were done in a VM with 16G memory, and then he also wrote

> we also tried to follow exactly above steps to test on
> some local machine (8G memory), but cannot reproduce.

Are there some special rules for what memory pools PID1 versus the
kworker threads can dip into?


Side note: I also had a ppc64 report with different symptoms (the
initramfs was corrupted), but that turned out to also reproduce with
e7cb072eb98 reverted, so that is likely unrelated. But just FTR that
thread is here:
https://lore.kernel.org/lkml/CA+QYu4qxf2CYe2gC6EYnOHXPKS-+cEXL=MnUvqRFaN7W1i6ahQ@mail.gmail.com/

Rasmus