Message-ID: <3cd2240b-ec2c-45d0-b73b-b66c83e75b9f@linuxfoundation.org>
Date: Thu, 4 Dec 2025 16:20:15 -0700
From: Shuah Khan <skhan@...uxfoundation.org>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: akpm@...ux-foundation.org, Alexander Deucher <Alexander.Deucher@....com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
amd-gfx@...ts.freedesktop.org, dri-devel <dri-devel@...ts.freedesktop.org>,
Guenter Roeck <linux@...ck-us.net>,
Linux Memory Management List <linux-mm@...ck.org>,
Shuah Khan <skhan@...uxfoundation.org>
Subject: Re: Linux 6.18 amdgpu build error
On 12/4/25 12:45, David Hildenbrand (Red Hat) wrote:
> On 12/4/25 20:36, Linus Torvalds wrote:
>> On Thu, 4 Dec 2025 at 09:40, Shuah Khan <skhan@...uxfoundation.org> wrote:
>>>
>>> This commit has an impact on all architectures; it is not a narrowly
>>> scoped, powerpc-only thing - it enables HAVE_GIGANTIC_FOLIOS on x86_64
>>> and changes the common code that determines MAX_FOLIO_ORDER in
>>> include/linux/mm.h.
>>
>> So I suspect your bisection might not have worked out, and there might
>> be two different things going on.
>>
>> In particular, hugepages were broken in 6.18-rc6 due to commit
>> adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero
>> folio").
>>
>> That was then fixed for rc7 (and obviously final 6.18) by commit
>> 5bebe8de19264 ("mm/huge_memory: Fix initialization of huge zero
>> folio"), but the breakage up until that time was a bit random.
>>
Both my systems were running rc6 - I was stuck in a state
where I was able to rebase to rc7 and then 6.18, but could
never build either one.
>> End result: if you ever ended up bisecting into that broken range
>> between those two commits, you would get failures on some loads (but
>> not reliably), and your bisection would end up pointing to some random
>> thing.
>>
>> But as mentioned, that particular problem would have been fixed in rc7
>> and in final 6.18, so any issues you saw with the final build would
>> have been due to something else.
>>
>> Can I ask you to try to re-do the bisection, but with that commit
>> 5bebe8de19264 applied by hand - if it wasn't already there - every
>> time you build a kernel that has adfb6609c680?
When I suspected rc6 to be the problem, I booted rc5 and compiled 6.18
after reverting 39231e8d6ba, a revert I chose based on the config file
changes between rc5 and rc6.
>
> Right, that's what I also proposed in [1].
>
> I cannot make sense of how 39231e8d6ba could possibly trigger it given that it only affects the value of MAX_FOLIO_ORDER --- which is primarily used for safety checks and snapshot_page(), nothing that could explain changed application behavior, really.
>
> But while Shuah is retesting, I'll go have yet another look.
I retested 6.18 on both systems, making sure I have 5bebe8de19264
and 39231e8d6ba in there. I also cloned linux-next and built it on both.
I didn't see any problems on 6.18. Having said that, it might make
sense to hold off on including 39231e8d6ba in 6.18 so there is more
time to test beyond 2 rc cycles. That is for you all to decide.
thanks,
-- Shuah