Message-ID: <72f26f80-38ff-4686-a9c4-26071d6fd4df@redhat.com>
Date: Fri, 27 Jun 2025 19:02:16 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka
<vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Mike Rapoport <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...e.com>, Zi Yan <ziy@...dia.com>,
Matthew Brost <matthew.brost@...el.com>,
Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
Ying Huang <ying.huang@...ux.alibaba.com>,
Alistair Popple <apopple@...dia.com>, Pedro Falcato <pfalcato@...e.de>,
Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>
Subject: Re: [PATCH v1 2/4] mm: smaller folio_pte_batch() improvements
On 27.06.25 18:51, Lorenzo Stoakes wrote:
> On Fri, Jun 27, 2025 at 01:55:08PM +0200, David Hildenbrand wrote:
>> Let's clean up a bit:
>>
>> (1) No need for start_ptep vs. ptep anymore, we can simply use ptep
>>
>> (2) Let's switch to "unsigned int" for everything
>>
>> (3) We can simplify the code by leaving the pte unchanged after the
>> pte_same() check.
>>
>> (4) Clarify that we should never exceed a single VMA; it indicates a
>> problem in the caller.
>>
>> No functional change intended.
>>
>> Signed-off-by: David Hildenbrand <david@...hat.com>
>> ---
>> mm/internal.h | 37 +++++++++++++++----------------------
>> 1 file changed, 15 insertions(+), 22 deletions(-)
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 9690c75063881..ca6590c6d9eab 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -221,7 +221,7 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
>> * folio_pte_batch - detect a PTE batch for a large folio
>> * @folio: The large folio to detect a PTE batch for.
>> * @addr: The user virtual address the first page is mapped at.
>> - * @start_ptep: Page table pointer for the first entry.
>> + * @ptep: Page table pointer for the first entry.
>> * @pte: Page table entry for the first page.
>> * @max_nr: The maximum number of table entries to consider.
>> * @flags: Flags to modify the PTE batch semantics.
>> @@ -233,24 +233,24 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
>> * first one is dirty.
>> *
>> * Detect a PTE batch: consecutive (present) PTEs that map consecutive
>> - * pages of the same large folio.
>> + * pages of the same large folio in a single VMA and a single page table.
>> *
>> * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
>> * the accessed bit, writable bit, dirty bit (unless FPB_HONOR_DIRTY is set) and
>> * soft-dirty bit (unless FPB_HONOR_SOFT_DIRTY is set).
>> *
>> - * start_ptep must map any page of the folio. max_nr must be at least one and
>> - * must be limited by the caller so scanning cannot exceed a single page table.
>> + * @ptep must map any page of the folio. max_nr must be at least one and
>> + * must be limited by the caller so scanning cannot exceed a single VMA and
>> + * a single page table.
>> *
>> * Return: the number of table entries in the batch.
>> */
>> -static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>> - pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
>> +static inline unsigned int folio_pte_batch(struct folio *folio, unsigned long addr,
>
> Do we need to worry about propagating this type change?
>
> mremap_folio_pte_batch() and zap_present_ptes() return the value as an int for example.
>
> I mean I doubt we're going to be seeing an overflow here :) but maybe worth
> propagating this everywhere.
Yeah, I'm planning on cleaning some of that up separately. As you
say, this shouldn't cause harm.
Really, it could only cause harm if someone were doing
"- folio_pte_batch()" and then assigning the result to a "long".
>
>> + pte_t *ptep, pte_t pte, unsigned int max_nr, fpb_t flags,
>> bool *any_writable, bool *any_young, bool *any_dirty)
>> {
>> - pte_t expected_pte, *ptep;
>> - bool writable, young, dirty;
>> - int nr, cur_nr;
>> + unsigned int nr, cur_nr;
>> + pte_t expected_pte;
>>
>> if (any_writable)
>> *any_writable = false;
>> @@ -267,29 +267,22 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>> max_nr = min_t(unsigned long, max_nr,
>> folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
>>
>> - nr = pte_batch_hint(start_ptep, pte);
>> + nr = pte_batch_hint(ptep, pte);
>> expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
>> - ptep = start_ptep + nr;
>> + ptep = ptep + nr;
>>
>> while (nr < max_nr) {
>> pte = ptep_get(ptep);
>> - if (any_writable)
>> - writable = !!pte_write(pte);
>> - if (any_young)
>> - young = !!pte_young(pte);
>> - if (any_dirty)
>> - dirty = !!pte_dirty(pte);
>> - pte = __pte_batch_clear_ignored(pte, flags);
>>
>> - if (!pte_same(pte, expected_pte))
>> + if (!pte_same(__pte_batch_clear_ignored(pte, flags), expected_pte))
>
> Doing this here will change the output of any_writable, any_young:
>
> static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
> {
> ...
> return pte_wrprotect(pte_mkold(pte));
> }
>
> So we probably need to get these values earlier?
Note that __pte_batch_clear_ignored() leaves the pte unchanged and only
returns the modified pte (like pte_mkold() and friends).
So what we read through ptep_get() is still what we have after the
pte_same() check.
--
Cheers,
David / dhildenb