Message-ID: <4ed1b111-f2f1-4f89-9308-fdd9d706ca37@kernel.org>
Date: Mon, 9 Feb 2026 20:45:02 +0100
From: "David Hildenbrand (Arm)" <david@...nel.org>
To: Ackerley Tng <ackerleytng@...gle.com>,
Deepanshu Kartikey <kartikey406@...il.com>
Cc: akpm@...ux-foundation.org, lorenzo.stoakes@...cle.com,
baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com, npache@...hat.com,
ryan.roberts@....com, dev.jain@....com, baohua@...nel.org,
seanjc@...gle.com, pbonzini@...hat.com, michael.roth@....com,
vannapurve@...gle.com, ziy@...dia.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
syzbot+33a04338019ac7e43a44@...kaller.appspotmail.com
Subject: Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in
file_thp_enabled()
On 2/9/26 19:22, Ackerley Tng wrote:
> Deepanshu Kartikey <kartikey406@...il.com> writes:
>
>> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm) <david@...nel.org> wrote:
>>>
>>>
>>> On second thought, why do we pass the
>>>
>>> !inode_is_open_for_write(inode)
>>>
>>> in file_thp_enabled()?
>>>
>>> Isn't that the main problem for these memfd things?
>>>
>>> Maybe a get_write_access() is missing somewhere?
>>>
>>
>> Hi David,
>>
>> Thanks for the suggestion. I looked into the get_write_access() path.
>>
>> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
>> calling get_write_access(), so i_writecount stays 0. That's why
>> file_thp_enabled() sees them as read-only files.
>>
>> We could add get_write_access() after alloc_file_pseudo() in both, but
>> I think that would be a hack rather than a proper fix:
>>
>> - i_writecount has a specific semantic: tracking how many fds have the
>> file open for writing. We'd be bumping it just to influence
>> file_thp_enabled() behavior.
>>
>
> I agree re-using i_writecount feels odd since it is abusing the idea of
> being written to. I might have misunderstood the full context of
> i_writecount though.
i_writecount means "the file is open with write access" IIUC. So one can
mmap(PROT_WRITE) it etc.
And that's kind of the thing: the virtual file is open with write
access. That's why I am still wondering whether mimicking that is
actually the right fix.
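
To make the i_writecount point concrete, here is a small userspace sketch (not kernel code). The model_* names and the struct are hypothetical stand-ins. The counter logic follows what get_write_access() and inode_is_open_for_write() do, and the last helper covers only the i_writecount part of file_thp_enabled(); the other conditions are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct inode; only the counter matters here. */
struct model_inode {
	int i_writecount;	/* > 0: open for write, < 0: writes denied */
};

/* Models get_write_access(): take a write reference unless writes are denied. */
static int model_get_write_access(struct model_inode *inode)
{
	if (inode->i_writecount < 0)
		return -1;	/* the kernel returns -ETXTBSY here */
	inode->i_writecount++;
	return 0;
}

/* Models put_write_access(): drop a previously taken write reference. */
static void model_put_write_access(struct model_inode *inode)
{
	assert(inode->i_writecount > 0);
	inode->i_writecount--;
}

/* Models inode_is_open_for_write(). */
static bool model_inode_is_open_for_write(const struct model_inode *inode)
{
	return inode->i_writecount > 0;
}

/* Only the i_writecount condition of file_thp_enabled(); the rest is omitted. */
static bool model_read_only_thp_candidate(const struct model_inode *inode)
{
	return !model_inode_is_open_for_write(inode);
}
```

A file created without the get_write_access() step starts with a count of 0, so it looks read-only to this check even when it can be mapped writable.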
>
>> - It doesn't express the actual intent. The real issue is that
>> CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>> backed files.
>>
>> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
>> the cleaner approach. It is explicit, has no side effects, and is easy
>> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>>
>
> I was considering other address space flags and I think the best might
> be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
> __vma_thp_allowable_orders() check the maximum allowed order for the
> address space.
The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these
folio order checks. Changing that would degrade filesystems that do not
support large folios yet. IOW, it would be similar to ripping out
CONFIG_READ_ONLY_THP_FOR_FS, which we plan to do in one of the next releases :)
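
As a rough illustration of that trade-off, the userspace sketch below compares the two behaviours. Everything in it is hypothetical (the struct, the model_* helpers, and MODEL_PMD_ORDER, which assumes 4 KiB base pages and 2 MiB THPs); it is not how khugepaged is structured.

```c
#include <stdbool.h>

/* Assumption: PMD-sized THP is order 9 (2 MiB with 4 KiB base pages). */
#define MODEL_PMD_ORDER 9

/* Hypothetical stand-in for the parts of a mapping that matter here. */
struct model_fs_mapping {
	unsigned int max_folio_order;	/* 0: no large folio support */
	bool open_for_write;
};

/* Current behaviour: read-only THP collapse ignores the folio order limit. */
static bool model_collapse_today(const struct model_fs_mapping *m)
{
	return !m->open_for_write;
}

/* Behaviour with an order check: the mapping must also allow PMD order. */
static bool model_collapse_with_order_check(const struct model_fs_mapping *m)
{
	return !m->open_for_write && m->max_folio_order >= MODEL_PMD_ORDER;
}
```

In this model, a read-only file on a filesystem without large folio support is collapsible today but would no longer be once the order check is enforced.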
>
> khugepaged is about consolidating memory to huge pages, so if the
> address space doesn't allow a larger folio order, then khugepaged should
> not operate on that memory.
>
> The other options are
>
> + AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
> but IIUC evictability is more closely related to swapping and
> khugepaged might operate on swappable memory?
Right, it does not really make sense.
> + AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
> to block migration. khugepaged kind of migrates the memory contents
> too, but someday we want guest_memfd to support migration, and at that
> time we would still want to block khugepaged, so I don't think we want
> to reuse a flag that couples khugepaged to migration.
It could be used, at least for the time being, to fix the issue.
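
A userspace sketch of what such a stopgap check could look like, under stated assumptions: the sketch_* types, the flag's bit position, and the is_regular field are made up for illustration, and this is not an actual patch. Since AS_INACCESSIBLE is only set by guest_memfd, this alone would not cover secretmem.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the real AS_INACCESSIBLE lives in mapping->flags. */
#define SKETCH_AS_INACCESSIBLE (1u << 0)

struct sketch_mapping {
	unsigned int flags;
};

struct sketch_inode {
	int i_writecount;		/* > 0 means open for write */
	bool is_regular;		/* stands in for S_ISREG(inode->i_mode) */
	struct sketch_mapping mapping;
};

/* Models file_thp_enabled() with an added early bail-out for the flag. */
static bool sketch_file_thp_enabled(const struct sketch_inode *inode)
{
	/* Stopgap: never treat inaccessible mappings as read-only THP candidates. */
	if (inode->mapping.flags & SKETCH_AS_INACCESSIBLE)
		return false;

	return inode->i_writecount <= 0 && inode->is_regular;
}
```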
--
Cheers,
David