[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2ab2840.a925.19a97629b79.Coremail.00107082@163.com>
Date: Tue, 18 Nov 2025 22:33:33 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>
Cc: "Linus Torvalds" <torvalds@...ux-foundation.org>,
catalin.marinas@....com, lance.yang@...ux.dev, b-padhi@...com,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
"Jan Polensky" <japo@...ux.ibm.com>
Subject: Re: Linux 6.18-rc6
At 2025-11-18 22:12:08, "David Hildenbrand (Red Hat)" <david@...nel.org> wrote:
>On 18.11.25 14:55, David Wang wrote:
>>
>>
>> At 2025-11-18 12:13:15, "David Wang" <00107082@....com> wrote:
>>>
>>>
>>> At 2025-11-18 09:10:50, "Linus Torvalds" <torvalds@...ux-foundation.org> wrote:
>>>> On Mon, 17 Nov 2025 at 11:17, David Hildenbrand (Red Hat)
>>>> <david@...nel.org> wrote:
>>>>>
>>>>> So, I briefly tried on x86 with KASAN and the one-liner. I was assuming
>>>>> that KASAN would complain because we are clearing the page before doing
>>>>> the kasan_unpoison_pages() (IOW, writing to a KASAN-poisoned page).
>>>>>
>>>>> It didn't trigger, and I assume it is because clear_highpage() on x86
>>>>> will not be instrumented by KASAN (my theory).
>>>>>
>>>>> The comment in kernel_init_pages() indicates that s390x uses memset()
>>>>> for that purpose and I would assume that that one would be instrumented.
>>>>
>>>> So I have thought about this some more, and I am not entirely happy
>>>> about any of this, but I think the way forward is to
>>>>
>>>> (a) make tag_clear_highpage() just do multiple pages in one go (and
>>>> rename it as tag_clear_highpage*s*() in the process)
>>>>
>>>> (b) make it have an actually return value to indicate whether it
>>>> initialized things
>>>>
>>>> which means that the post_alloc_hook() code just becomes
>>>>
>>>> if (zero_tags)
>>>> init = tag_clear_highpages(page, 1 << order);
>>>>
>>>> and then the generic fallback becomes just
>>>>
>>>> static inline bool tag_clear_highpages(struct page *page, int numpages)
>>>> {
>>>> return false;
>>>> }
>>>>
>>>> which makes this all a complete no-op for architectures that don't do
>>>> this memory tagging.
>>>>
>>>> And the one architecture that *does* do it - arm64 - actually
>>>> simplifies too, because now instead of being called in a loop - and
>>>> having that
>>>>
>>>> if (!system_supports_mte()) {
>>>> clear_highpage(page);
>>>> return;
>>>> }
>>>>
>>>> in every iteration of the loop, it now just gets called *once*, and it
>>>> instead just does
>>>>
>>>> if (!system_supports_mte())
>>>> return false;
>>>>
>>>> and then it does the *clearing* in a loop instead.
>>>>
>>>> End result: that all looks much saner to me, and should avoid all the
>>>> issues with KASAN (well, arm64 currently clearly depends on
>>>> mte_zero_clear_page_tags() being assembly code that doesn't trigger
>>>> KASAN anyway).
>>>>
>>>> But maybe it looks saner to me just because I've written that code now.
>>>>
>>>> Anyway, here's my suggested patch. I still prefer this over having
>>>> more config variables and #ifdef's. I'd much rather have code that
>>>> just does the right thing and becomes null and void when it's
>>>> effecitlvely disabled by not having hardware support.
>>>>
>>>> Comments?
>>>>
>>>> This is all entirely untested, but I did build it on both x86-64 and
>>>
>>>> arm64. So it must be perfect. Right?
>>>
>>>
>>> I tried this patch, my prometheus service crash with:
>>> fatal error: acquireSudog: found s.elem != nil in cache
>>> seems some memory is still not properly zeroed. (I guess)
>>> But this time, my old go compiler works fine.
>>
>>
>> Update: with this patch, my go programs still crash, It was just that
>> the first time I test the patch, old go compiler happened to work. When I reboot, my
>> go program start to crash again. The crash seems random, but on my system,
>> go program crashes with *very* high probability.
>>
>> (And I applied the patch based on 6.18-rc6.)
>
>Can you try with
>
> init = !tag_clear_highpages(page, 1 << order);
>
>instead of
>
> init = tag_clear_highpages(page, 1 << order);
>
>
>So when the function returns "false" (we did not clear), we will have to
>initialize.
Oh, yes, this make sense.
With this fix, my crashes are gone again. (I tested it with several round of reboot.)
Thanks
David Wang
>
>--
>Cheers
>
>David
Powered by blists - more mailing lists