Message-ID: <a5056791-0a3e-40f6-bb83-7f39ef76b346@redhat.com>
Date: Thu, 8 May 2025 09:36:02 +0200
From: David Hildenbrand <david@...hat.com>
To: Zi Yan <ziy@...dia.com>
Cc: Matthew Wilcox <willy@...radead.org>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
 hannes@...xchg.org, lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com,
 npache@...hat.com, ryan.roberts@....com, dev.jain@....com, vbabka@...e.cz,
 rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] mm: convert do_set_pmd() to take a folio

On 08.05.25 01:46, Zi Yan wrote:
> On 7 May 2025, at 17:24, David Hildenbrand wrote:
> 
>> On 07.05.25 14:10, Matthew Wilcox wrote:
>>> On Wed, May 07, 2025 at 05:26:13PM +0800, Baolin Wang wrote:
>>>> In do_set_pmd(), we always use folio->page to build PMD mappings for
>>>> the entire folio. Since all callers of do_set_pmd() already hold a stable
>>>> folio, converting do_set_pmd() to take a folio is safe and more straightforward.
>>>
>>> What testing did you do of this?
>>>
>>>> -vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>>>> +vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio)
>>>>    {
>>>> -	struct folio *folio = page_folio(page);
>>>>    	struct vm_area_struct *vma = vmf->vma;
>>>>    	bool write = vmf->flags & FAULT_FLAG_WRITE;
>>>>    	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>>>>    	pmd_t entry;
>>>>    	vm_fault_t ret = VM_FAULT_FALLBACK;
>>>> +	struct page *page;
>>>
>>> Because I see nowhere in this patch that you initialise 'page'.
>>>
>>> And that's really the important part.  You seem to be assuming that a
>>> folio will never be larger than PMD size, and I'm not comfortable with
>>> that assumption.  It's a limitation I put in place a few years ago so we
>>> didn't have to find and fix all those assumptions immediately, but I
>>> imagine that some day we'll want to have larger folios.
>>>
>>> So unless you can derive _which_ page in the folio we want to map from
>>> the vmf, NACK this patch.
>>
>> Agreed. Probably folio + idx is our best bet.
>>
>> Which raises an interesting question: I assume that in the future, when we have a 4 MiB folio on x86-64 that is *misaligned* in VA space regarding PMDs (e.g., aligned to 1 MiB but not to 2 MiB), we could still allow using a PMD for the middle part.
> 
> It might not be possible if the folio comes from the buddy allocator, due to how
> the buddy allocator merges a PFN with its buddy (see __find_buddy_pfn() in mm/internal.h).
> A 4MB folio will always consist of two 2MB-aligned parts. In addition, VA and PA need
> to have the same lower 9+12 bits for a PMD mapping. So PMD mappings for
> a 4MB folio would always be two PMDs. Let me know if I missed anything.

The PA side is clear. But is misalignment in VA space impossible on all 
architectures? I certainly remember it being impossible on x86-64 and 
s390x (the remaining PMD-entry bits are used for something else).
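To spell out the constraint Zi Yan describes, a minimal sketch (a
hypothetical helper, not existing kernel code): a single PMD can cover a
2 MiB window only if VA and PA agree in the low PMD_SHIFT (9 + 12 on
x86-64) bits, so that both become PMD-aligned at the same addresses:

/*
 * Hypothetical check, assuming 2 MiB PMDs: VA and PA must be
 * congruent modulo PMD_SIZE for any PMD mapping to be possible.
 */
static inline bool va_pa_pmd_congruent(unsigned long vaddr, phys_addr_t paddr)
{
	return ((vaddr ^ paddr) & (PMD_SIZE - 1)) == 0;
}

With a buddy-allocated 4 MiB folio the PA halves are 2 MiB-aligned, so a
VA that is only 1 MiB-aligned fails this check at every offset of the
mapping; not even the middle 2 MiB could use a PMD.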
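And for the folio + idx direction, a rough, untested sketch of how the
page to map could be derived from the fault instead of assuming
folio->page (pmd_chunk_page is a made-up name, and this assumes the
pagecache case where vmf->pgoff is meaningful):

static struct page *pmd_chunk_page(struct vm_fault *vmf, struct folio *folio)
{
	/* page the fault actually hit within this folio */
	struct page *page = folio_file_page(folio, vmf->pgoff);

	/* step back to the page that sits at the PMD-aligned address */
	page -= (vmf->address & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;

	/* the whole PMD-sized chunk must lie inside the folio */
	if (page < &folio->page ||
	    page + HPAGE_PMD_NR > &folio->page + folio_nr_pages(folio))
		return NULL;

	return page;
}

Something along those lines would let a larger-than-PMD folio still map
its suitably aligned 2 MiB chunks, rather than requiring the folio to be
exactly PMD order.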

-- 
Cheers,

David / dhildenb

