Message-ID: <0f1d7db5-0ca1-4218-95e3-eb3256a5ad65@kernel.org>
Date: Wed, 19 Nov 2025 13:01:14 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Catalin Marinas <catalin.marinas@....com>
Cc: "Longia, Amandeep Kaur" <AmandeepKaur.Longia@....com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
akpm@...ux-foundation.org, lance.yang@...ux.dev, will@...nel.org,
b-padhi@...com, aishwarya.tcv@....com, broonie@...nel.org,
Raghavendra.KodsaraThimmappa@....com, Jan Polensky <japo@...ux.ibm.com>
Subject: Re: [BUG] General Protection Faults During Git Clone and Kernel Build on Latest Kernel

On 19.11.25 12:58, Catalin Marinas wrote:
> On Mon, Nov 17, 2025 at 01:24:43PM +0100, David Hildenbrand (Red Hat) wrote:
>> On 17.11.25 13:08, Longia, Amandeep Kaur wrote:
>>> Hi all,
>>>
>>> We've encountered critical issues while running our CI pipeline on the
>>> latest kernel (v6.18-rc6), which involves cloning multiple repositories
>>> for testing and building the kernel. During this process, we observed
>>> two major issues:
>>
>> Hi,
>>
>> I observed something similar while testing on Friday between rc4 (good) and
>> rc5+ (bad).
>>
>> I'm sure this is the known issue of adfb6609c6809e107ded9a1cd46f519c882e64ea
>> we discussed already here [1].
>>
>>
>> @Jan, can you send the fix out today? Otherwise I can take care of this so
>> we get this fixed asap.
>>
>> [1] https://lkml.kernel.org/r/20251109003613.1461433-1-japo@linux.ibm.com
>
> In the worst case, I think Andrew can just revert commit adfb6609c680
> ("mm/huge_memory: initialise the tags of the huge zero folio"), we can
> fix it properly with a cc stable afterwards.
>
> BTW, another quick fix (pretty much what arm64 does when MTE is off):
Now fixed upstream:
commit 5bebe8de19264946d398ead4e6c20c229454a552
Author: Linus Torvalds <torvalds@...ux-foundation.org>
Date:   Tue Nov 18 08:21:27 2025 -0800

    mm/huge_memory: Fix initialization of huge zero folio

    The recent fix to properly initialize the tags of the huge zero folio
    had an unfortunate not-so-subtle side effect: it caused the actual
    *contents* of the huge zero folio to not be initialized at all when the
    hardware didn't support the memory tagging.

    The reason was the unfortunate semantics of tag_clear_highpage(): on
    hardware that didn't do the tagging, it would silently just not do
    anything at all. And since this is done only on arm64 with MTE support,
    that basically meant most hardware.

    It wasn't necessarily immediately obvious since the huge zero page isn't
    necessarily very heavily used - or because it might already be zero
    because all-zeroes is the most common pattern. But it ends up causing
    random odd user space failures when you do hit it.

    The unfortunate semantics have been around for a while, but became a
    real bug only when we started actively using __GFP_ZEROTAGS in the
    generic get_huge_zero_folio() function - before that, it had only ever
    been used in code that checked that the hardware supported it.

    Fix this by simply changing the semantics of tag_clear_highpage() to
    return whether it actually successfully did something or not. While at
    it, also make it initialize multiple pages in one go, since that's
    actually what the only caller wants it to do and it simplifies the whole
    logic.

    Fixes: adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero folio")
    Link: https://lore.kernel.org/all/20251117082023.90176-1-00107082@163.com/
    Reviewed-by: David Hildenbrand (Red Hat) <david@...nel.org>
    Reported-and-tested-by: David Wang <00107082@....com>
    Reported-and-tested-by: Carlos Llamas <cmllamas@...gle.com>
    Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
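
For anyone skimming the thread, the semantic change described in the commit message can be modelled in userspace roughly as follows. This is only a sketch, not the kernel code: hw_has_tagging, tag_clear_pages_old(), tag_clear_pages_new(), prep_pages_old() and prep_pages_new() are hypothetical names made up for illustration, and a plain memset() stands in for the tagged clear, on the assumption that clearing tags on supporting hardware also zeroes the page data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical stand-in for "the CPU supports memory tagging". */
static bool hw_has_tagging;

/* Old semantics (modelled): silently a no-op without tagging hardware. */
static void tag_clear_pages_old(unsigned char *mem, size_t npages)
{
	if (!hw_has_tagging)
		return;	/* nothing is zeroed and the caller is not told */
	/* Assumption: the tagged clear also zeroes the data. */
	memset(mem, 0, npages * PAGE_SIZE);
}

/* New semantics (modelled): report whether the pages were cleared. */
static bool tag_clear_pages_new(unsigned char *mem, size_t npages)
{
	if (!hw_has_tagging)
		return false;
	memset(mem, 0, npages * PAGE_SIZE);
	return true;
}

/* Buggy caller: assumes the helper always initialised the contents. */
static void prep_pages_old(unsigned char *mem, size_t npages, bool zero_tags)
{
	bool init = true;

	if (zero_tags) {
		tag_clear_pages_old(mem, npages);
		init = false;	/* wrong when the helper did nothing */
	}
	if (init)
		memset(mem, 0, npages * PAGE_SIZE);
}

/* Fixed caller: falls back to plain zeroing if the helper did nothing. */
static void prep_pages_new(unsigned char *mem, size_t npages, bool zero_tags)
{
	bool init = true;

	if (zero_tags)
		init = !tag_clear_pages_new(mem, npages);
	if (init)
		memset(mem, 0, npages * PAGE_SIZE);
}

/* Helper for checking the result: true if every byte is zero. */
static bool all_zero(const unsigned char *mem, size_t len)
{
	assert(mem != NULL);
	for (size_t i = 0; i < len; i++)
		if (mem[i] != 0)
			return false;
	return true;
}
```

With hw_has_tagging set to false, prep_pages_old() leaves whatever was in the buffer untouched, which is the uninitialised "zero" folio described above, while prep_pages_new() falls back to the ordinary memset() path.
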
--
Cheers
David