lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e53cb624-f83d-4898-b3f3-b20b702cf00c@kernel.org>
Date: Tue, 18 Nov 2025 15:12:08 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: David Wang <00107082@....com>,
 Linus Torvalds <torvalds@...ux-foundation.org>
Cc: catalin.marinas@....com, lance.yang@...ux.dev, b-padhi@...com,
 akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
 Jan Polensky <japo@...ux.ibm.com>
Subject: Re: Linux 6.18-rc6

On 18.11.25 14:55, David Wang wrote:
> 
> 
> At 2025-11-18 12:13:15, "David Wang" <00107082@....com> wrote:
>>
>>
>> At 2025-11-18 09:10:50, "Linus Torvalds" <torvalds@...ux-foundation.org> wrote:
>>> On Mon, 17 Nov 2025 at 11:17, David Hildenbrand (Red Hat)
>>> <david@...nel.org> wrote:
>>>>
>>>> So, I briefly tried on x86 with KASAN and the one-liner. I was assuming
>>>> that KASAN would complain because we are clearing the page before doing
>>>> the kasan_unpoison_pages() (IOW, writing to a KASAN-poisoned page).
>>>>
>>>> It didn't trigger, and I assume it is because clear_highpage() on x86
>>>> will not be instrumented by KASAN (my theory).
>>>>
>>>> The comment in kernel_init_pages() indicates that s390x uses memset()
>>>> for that purpose and I would assume that that one would be instrumented.
>>>
>>> So I have thought about this some more, and I am not entirely happy
>>> about any of this, but I think the way forward is to
>>>
>>> (a) make tag_clear_highpage() just do multiple pages in one go (and
>>> rename it as tag_clear_highpage*s*() in the process)
>>>
>>> (b) make it have an actually return value to indicate whether it
>>> initialized things
>>>
>>> which means that the post_alloc_hook() code just becomes
>>>
>>>         if (zero_tags)
>>>                 init = tag_clear_highpages(page, 1 << order);
>>>
>>> and then the generic fallback becomes just
>>>
>>>   static inline bool tag_clear_highpages(struct page *page, int numpages)
>>>   {
>>>          return false;
>>>   }
>>>
>>> which makes this all a complete no-op for architectures that don't do
>>> this memory tagging.
>>>
>>> And the one architecture that *does* do it - arm64 - actually
>>> simplifies too, because now instead of being called in a loop - and
>>> having that
>>>
>>>         if (!system_supports_mte()) {
>>>                  clear_highpage(page);
>>>                  return;
>>>         }
>>>
>>> in every iteration of the loop, it now just gets called *once*, and it
>>> instead just does
>>>
>>>         if (!system_supports_mte())
>>>                 return false;
>>>
>>> and then it does the *clearing* in a loop instead.
>>>
>>> End result: that all looks much saner to me, and should avoid all the
>>> issues with KASAN (well, arm64 currently clearly depends on
>>> mte_zero_clear_page_tags() being assembly code that doesn't trigger
>>> KASAN anyway).
>>>
>>> But maybe it looks saner to me just because I've written that code now.
>>>
>>> Anyway, here's my suggested patch. I still prefer this over having
>>> more config variables and #ifdef's. I'd much rather have code that
>>> just does the right thing and becomes null and void when it's
>>> effecitlvely disabled by not having hardware support.
>>>
>>> Comments?
>>>
>>> This is all entirely untested, but I did build it on both x86-64 and
>>
>>> arm64. So it must be perfect. Right?
>>
>>
>> I tried this patch, my prometheus service crash with:
>>         fatal error: acquireSudog: found s.elem != nil in cache
>> seems some memory is still not properly zeroed. (I guess)
>> But this time, my old go compiler works fine.
> 
> 
> Update: with this patch, my go programs still crash, It was just  that
> the first time I test the patch, old go compiler happened to work. When I reboot, my
> go program start to crash again.  The crash seems random, but on my system,
> go program crashes with *very* high probability.
> 
> (And I applied the patch based on 6.18-rc6.)

Can you try with

	init = !tag_clear_highpages(page, 1 << order);

instead of

	init = tag_clear_highpages(page, 1 << order);


So when the function returns "false" (we did not clear), we will have to 
initialize.

-- 
Cheers

David

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ