linux-kernel - Re: [PATCH 2/2] mm: convert do_set

Message-ID: <A243EBEA-22E7-4F57-9293-177500463B38@nvidia.com>
Date: Wed, 07 May 2025 19:46:58 -0400
From: Zi Yan <ziy@...dia.com>
To: David Hildenbrand <david@...hat.com>
Cc: Matthew Wilcox <willy@...radead.org>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
 hannes@...xchg.org, lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com,
 npache@...hat.com, ryan.roberts@....com, dev.jain@....com, vbabka@...e.cz,
 rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] mm: convert do_set_pmd() to take a folio

On 7 May 2025, at 17:24, David Hildenbrand wrote:

> On 07.05.25 14:10, Matthew Wilcox wrote:
>> On Wed, May 07, 2025 at 05:26:13PM +0800, Baolin Wang wrote:
>>> In do_set_pmd(), we always use the folio->page to build PMD mappings for
>>> the entire folio. Since all callers of do_set_pmd() already hold a stable
>>> folio, converting do_set_pmd() to take a folio is safe and more straightforward.
>>
>> What testing did you do of this?
>>
>>> -vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>>> +vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio)
>>>   {
>>> -	struct folio *folio = page_folio(page);
>>>   	struct vm_area_struct *vma = vmf->vma;
>>>   	bool write = vmf->flags & FAULT_FLAG_WRITE;
>>>   	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>>>   	pmd_t entry;
>>>   	vm_fault_t ret = VM_FAULT_FALLBACK;
>>> +	struct page *page;
>>
>> Because I see nowhere in this patch that you initialise 'page'.
>>
>> And that's really the important part.  You seem to be assuming that a
>> folio will never be larger than PMD size, and I'm not comfortable with
>> that assumption.  It's a limitation I put in place a few years ago so we
>> didn't have to find and fix all those assumptions immediately, but I
>> imagine that some day we'll want to have larger folios.
>>
>> So unless you can derive _which_ page in the folio we want to map from
>> the vmf, NACK this patch.
>
> Agreed. Probably folio + idx is our best bet.
>
> Which raises an interesting question: I assume in the future, when we have a 4 MiB folio on x86-64 that is *misaligned* in VA space regarding PMDs (e.g., aligned to 1 MiB but not 2 MiB), we could still allow to use a PMD for the middle part.

It might not be possible if the folio comes from the buddy allocator, because of
how the buddy allocator merges a PFN with its buddy (see __find_buddy_pfn() in
mm/internal.h). A 4MB folio will always consist of two 2MB-aligned parts. In
addition, VA and PA need to have the same lower 9+12 bits for a PMD mapping. So
PMD mappings for a 4MB folio would always be two PMDs. Let me know if I missed
anything.

Of course, if the folio comes from alloc_contig_range() or we add support for
in-place folio promotion, the situation you are talking about would be possible.
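
To make the alignment argument concrete, here is a rough userspace sketch (not
kernel code). find_buddy_pfn() mirrors the XOR used by __find_buddy_pfn();
count_pmd_mappings() is a hypothetical helper written only for this
illustration, and the macro values assume x86-64 with 4KB base pages and 2MB
PMDs.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                        /* assumption: 4KB base pages */
#define PMD_SHIFT  (PAGE_SHIFT + 9)          /* the "9+12" bits, i.e. 2MB */
#define PMD_SIZE   (UINT64_C(1) << PMD_SHIFT)
#define PMD_MASK   (~(PMD_SIZE - 1))

/* Same XOR as __find_buddy_pfn(): flip bit 'order' of the PFN. */
uint64_t find_buddy_pfn(uint64_t pfn, unsigned int order)
{
	return pfn ^ (UINT64_C(1) << order);
}

/*
 * Hypothetical helper: how many PMD mappings fit a physically contiguous
 * folio of 'size' bytes at physical address 'pa' mapped at virtual
 * address 'va'.
 */
unsigned int count_pmd_mappings(uint64_t va, uint64_t pa, uint64_t size)
{
	uint64_t start, end;

	/* VA and PA must agree in their lower 21 bits. */
	if ((va & ~PMD_MASK) != (pa & ~PMD_MASK))
		return 0;

	start = (va + PMD_SIZE - 1) & PMD_MASK;  /* first PMD boundary >= va */
	end = (va + size) & PMD_MASK;            /* last PMD boundary <= end */

	return end > start ? (unsigned int)((end - start) >> PMD_SHIFT) : 0;
}
```

Under these assumptions, two order-9 buddies at PFN 0x400 and 0x600 merge into
an order-10 block starting at PFN 0x400, i.e. a 4MB-aligned PA. With a
2MB-aligned VA that gives two PMDs; with a VA that is only 1MB-aligned it gives
none. Only a folio whose PA is also offset by 1MB (the alloc_contig_range()
case) would leave a single PMD for the middle part.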

>
> So idx must not necessarily be aligned to PMDs in the future.
>
> For now, we could sanity-check that idx is always 0.
>
> But the rmap sanity checks in folio_add_file_rmap_pmd() will already catch that for us.



--
Best Regards,
Yan, Zi