lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEvNRgFw_P-GZ=Z4irGheM4O-RnXkd_bdAd0AT7fUuwXEnSqHQ@mail.gmail.com>
Date: Mon, 9 Feb 2026 13:31:03 -0800
From: Ackerley Tng <ackerleytng@...gle.com>
To: "David Hildenbrand (Arm)" <david@...nel.org>, Deepanshu Kartikey <kartikey406@...il.com>
Cc: akpm@...ux-foundation.org, lorenzo.stoakes@...cle.com, 
	baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com, npache@...hat.com, 
	ryan.roberts@....com, dev.jain@....com, baohua@...nel.org, seanjc@...gle.com, 
	pbonzini@...hat.com, michael.roth@....com, vannapurve@...gle.com, 
	ziy@...dia.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	syzbot+33a04338019ac7e43a44@...kaller.appspotmail.com
Subject: Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()

"David Hildenbrand (Arm)" <david@...nel.org> writes:

> On 2/9/26 20:45, David Hildenbrand (Arm) wrote:
>> On 2/9/26 19:22, Ackerley Tng wrote:
>>> Deepanshu Kartikey <kartikey406@...il.com> writes:
>>>
>>>> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm)
>>>> <david@...nel.org> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Thanks for the suggestion. I looked into the get_write_access() path.
>>>>
>>>> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
>>>> calling get_write_access(), so i_writecount stays 0. That's why
>>>> file_thp_enabled() sees them as read-only files.
>>>>
>>>> We could add get_write_access() after alloc_file_pseudo() in both, but
>>>> I think that would be a hack rather than a proper fix:
>>>>
>>>> - i_writecount has a specific semantic: tracking how many fds have the
>>>> file open for writing. We'd be bumping it just to influence
>>>> file_thp_enabled() behavior.
>>>>
>>>
>>> I agree re-using i_writecount feels odd since it is abusing the idea of
>>> being written to. I might have misunderstood the full context of
>>> i_writecount though.
>>
>> i_writecount means "the file is open with write access" IIUC. So one can
>> mmap(PROT_WRITE) it etc.
>>
>> And that's kind of the thing: the virtual file is open with write
>> access. That's why I am still wondering whether mimicking that is
>> actually the right fix.
>>
>>>
>>>> - It doesn't express the actual intent. The real issue is that
>>>> CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>>>> backed files.
>>>>
>>>> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
>>>> the cleaner approach. It is explicit, has no side effects, and is easy
>>>> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>>>>
>>>
>>> I was considering other address space flags and I think the best might
>>> be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
>>> __vma_thp_allowable_orders() check the maximum allowed order for the
>>> address space.
>>
>> The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these
>> folio order checks.

Ah that's true.

>> Changing it would degrade filesystems that do not
>> support large folios yet. IOW, it would be similar to ripping out
>> CONFIG_READ_ONLY_THP_FOR_FS. Which we plan for one of the next releases :)
>>
>>>
>>> khugepaged is about consolidating memory to huge pages, so if the
>>> address space doesn't allow a larger folio order, then khugepaged should
>>> not operate on that memory.
>>>
>>> The other options are
>>>
>>> + AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
>>>    but IIUC evictability is more closely related to swapping and
>>>    khugepaged might operate on swappable memory?
>> Right, it does not really make sense
>>
>>> + AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
>>>    to block migration. khugepaged kind of migrates the memory contents
>>>    too, but someday we want guest_memfd to support migration, and at that
>>>    time we would still want to block khugepaged, so I don't think we want
>>>    to reuse a flag that couples khugepaged to migration.
>>
>> It could be used at least for the time being and to fix the issue.
>
> mapping_inaccessible(mapping) indeed looks like the easiest fix, given that
> shmem "somehow" works, lol.
>

I could also check shmem, but I'm not sure which conditions to set up
shmem for, since shmem could be used in so many ways. Any suggestions?
Off the top of my head, shmem lots of special-casing in the khugepaged
flow...

> BUT, something just occurred to me.
>
> We added the mc-handling in
>
> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
> Author: Jiaqi Yan <jiaqiyan@...gle.com>
> Date:   Wed Mar 29 08:11:19 2023 -0700
>
>      mm/khugepaged: recover from poisoned anonymous memory
>
> ..
>
> So I assume kernels before that would crash when collapsing?
>
> Looking at 5.15.199, it does not contain 98c76c9f1e [1].
>
> So I suspect we need a fix+stable backport.
>
> Who volunteers to try a secretmem reproducer on a stable kernel? :)
>

I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
special-casing secretmem like you suggested below?

>
> The following is a bit nasty as well but should do the trick until we rip
> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 03886d4ccecc..4ac1cb36b861 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -40,6 +40,7 @@
>   #include <linux/pgalloc.h>
>   #include <linux/pgalloc_tag.h>
>   #include <linux/pagewalk.h>
> +#include <linux/secretmem.h>
>
>   #include <asm/tlb.h>
>   #include "internal.h"
> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>
>          inode = file_inode(vma->vm_file);
>
> +       if (mapping_inaccessible(inode->i_mapping) ||
> +           secretmem_mapping(inode->i_mapping))
> +               return false;
> +

Regarding the degradation of filesystems that don't support large folios
yet: Do you mean having the collapse function respect AS_FOLIO_ORDER_MAX
would disable collapsing for filesystems that actually want pages to be
collapsed, but don't update max folio order and hence appear to not
support large folios yet?

What about a check like this instead

	if (!mapping_large_folio_support())
        	return false;

And then when CONFIG_READ_ONLY_THP_FOR_FS is removed, part of that work
would involve getting filesystems to update AS_FOLIO_ORDER_MAX?

>          return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>   }
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199
>
> --
> Cheers,
>
> David

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ