linux-kernel - Re: [PATCH v4 3/4] mm: Optimize mprotect() by PTE-batching

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a2165422-0d78-4549-bc34-a8bbcdb6513f@arm.com>
Date: Tue, 1 Jul 2025 10:53:08 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Dev Jain <dev.jain@....com>, akpm@...ux-foundation.org, david@...hat.com,
 willy@...radead.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 catalin.marinas@....com, will@...nel.org, Liam.Howlett@...cle.com,
 vbabka@...e.cz, jannh@...gle.com, anshuman.khandual@....com,
 peterx@...hat.com, joey.gouly@....com, ioworker0@...il.com,
 baohua@...nel.org, kevin.brodsky@....com, quic_zhenhuah@...cinc.com,
 christophe.leroy@...roup.eu, yangyicong@...ilicon.com,
 linux-arm-kernel@...ts.infradead.org, hughd@...gle.com,
 yang@...amperecomputing.com, ziy@...dia.com
Subject: Re: [PATCH v4 3/4] mm: Optimize mprotect() by PTE-batching

On 01/07/2025 09:51, Lorenzo Stoakes wrote:
> On Tue, Jul 01, 2025 at 09:30:51AM +0100, Ryan Roberts wrote:
>>>> In an ideal world we would flatten and just have mprotect_folio_pte_batch()
>>>> return the batch size considering all the relevant PTE bits AND the
>>>> AnonExclusive bit on the pages. IIRC one of Dev's earlier versions modified the
>>>> core folio_pte_batch() function to also look at the AnonExclusive bit, but I
>>>> really disliked changing that core function (I think others did too?).
>>>
>>> Yeah let's not change the core function.
>>>
>>> My suggestion is to have mprotect_folio_pte_batch() do this.
>>>
>>>>
>>>> So barring that approach, we are really only left with the batch and sub-batch
>>>> approach - although, yes, it could be abstracted more. We could maintain a
>>>> context struct that persists across all calls to mprotect_folio_pte_batch() and
>>>> it can use that to keep it's state to remember if we are in the middle of a
>>>> sub-batch and decide either to call folio_pte_batch() to get a new batch, or
>>>> call anon_exclusive_batch() to get the next sub-batch within the current batch.
>>>> But that started to feel overly abstracted to me.
>>>
>>> Having this nested batch/sub-batch loop really feels worse. You just get lost in
>>> the complexity here very easily.
>>>
>>> But i"m also not sure we need to maintain _that_ much state?
>>>
>>> We're already looping over all of the PTEs here, so abstracting _the entire
>>> loop_ and all the sub-batch stuff to another function, that is
>>> mprotect_folio_pte_batch() I think sensibly, so it handles this for you makes a
>>> ton of sense.
>>
>> So effectively turn mprotect_folio_pte_batch() into an iterator; have a struct
>> and a funtion to init the struct for the the number of ptes we want to iterate
>> over, then a per iteration function that progressively returns batches?
> 
> Is that really necessary though?
> 
> Idea is that mprotect_folio_pte_batch() returns the nr_ptes _taking into account
> the PAE stuff_.

The issue is the efficiency. Assuming we want to keep the PTE scan contained
within the core folio_pte_batch() function and we _don't_ want to add PAE
awareness to that function, then we have 2 separate, independent loops; one for
PTE scanning and the other for PAE scanning. If the first loop scans through ans
returns 512, but then the PAE scan returns 1, we return 1. If we don't remember
for the next time that we already determined we have a PTE batch of 512 (now
511) then we will rescan the 511 PTEs and potentially return 1 again due to PAE.
Then 510, then 509...

That feels inefficient to me. So I'd much rather just remember that we have a
batch of 512, then split into sub batches as needed for PAE compliance. Then we
only scan each PTE once and each struct page once.

But to achieve this, we either need to merge the 2 loops or we need to carry
state across function calls (i.e. like an iterator).

> 
> Would this break anything?

It's not about breaking anything, it's about scanning efficiently. Perhaps you
don't think it's worth worrying about in practice?

> 
> We might need to pass a flag to say 'don't account for this' for prot numa case.

Yep, another bool ;-)

> 
>>
>> Then we just have a simple loop here that gets the next batch and processes it?