[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e54949b6-3635-4578-b98f-08c029d7796c@lucifer.local>
Date: Tue, 1 Jul 2025 09:34:32 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Dev Jain <dev.jain@....com>, akpm@...ux-foundation.org, david@...hat.com,
willy@...radead.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
catalin.marinas@....com, will@...nel.org, Liam.Howlett@...cle.com,
vbabka@...e.cz, jannh@...gle.com, anshuman.khandual@....com,
peterx@...hat.com, joey.gouly@....com, ioworker0@...il.com,
baohua@...nel.org, kevin.brodsky@....com, quic_zhenhuah@...cinc.com,
christophe.leroy@...roup.eu, yangyicong@...ilicon.com,
linux-arm-kernel@...ts.infradead.org, hughd@...gle.com,
yang@...amperecomputing.com, ziy@...dia.com
Subject: Re: [PATCH v4 2/4] mm: Add batched versions of
ptep_modify_prot_start/commit
On Tue, Jul 01, 2025 at 09:23:05AM +0100, Ryan Roberts wrote:
> >>>>> +#ifndef modify_prot_commit_ptes
> >>>>> +static inline void modify_prot_commit_ptes(struct vm_area_struct *vma,
> >>>>> unsigned long addr,
> >>>>> + pte_t *ptep, pte_t old_pte, pte_t pte, unsigned int nr)
> >>>>> +{
> >>>>> + int i;
> >>>>> +
> >>>>> + for (i = 0; i < nr; ++i) {
> >>>>> + ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
> >>>>> + ptep++;
> >>>> Weird place to put this increment, maybe just stick it in the for loop.
> >>>>
> >>>>> + addr += PAGE_SIZE;
> >>>> Same comment here.
> >>>
> >>> Sure.
> >>>
> >>>>
> >>>>> + old_pte = pte_next_pfn(old_pte);
> >>>> Could be:
> >>>>
> >>>> old_pte = pte;
> >>>>
> >>>> No?
> >>>
> >>> We will need to update old_pte also since that
> >>> is used by powerpc in radix__ptep_modify_prot_commit().
> >>
> >> I think perhaps Lorenzo has the model in his head where old_pte is the previous
> >> pte in the batch. That's not the case. old_pte is the value of the pte in the
> >> current position of the batch before any changes were made. pte is the new value
> >> for the pte. So we need to expliticly advance the PFN in both old_pte and pte
> >> each iteration round the loop.
> >
> > Yeah, you're right, apologies, I'd misinterpreted.
> >
> > I really, really, really hate how all this is implemented. This is obviously an
> > mprotect() and legacy thing but it's almost designed for confusion. Not the
> > fault of this series, and todo++ on improving mprotect as a whole (been on my
> > list for a while...)
>
> Agreed. I struggled for a long time with some of the pgtable helper abstractions
> to the arch and all the assumptions they make. But ultimately all Dev is trying
> to do here is make some incremental improvements, following the established
> patterns. Hopefully you agree that cleanups on a larger scale should be reserved
> for a systematic, focussed series.
Totally agree, when I mention my distaste for existing logic, see those as
asides, I'm not asking the series to be blocked until that's fixed :)
I'm happy for us to take Dev's changes obviously once review issues are
resolved.
I think my suggestion below helps get us to a good compromise (modulo
mm's beautifully overloaded/confusing use of terminology :>)
>
> >
> > So we're ultimately updating ptep (this thing that we update, of course, is
> > buried in the middle of the function invocation) in:
> >
> > ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
> >
> > We are setting *ptep++ = pte essentially (roughly speaking) right?
>
> Yeah, pretty much.
>
> The API was originally created for Xen IIRC. The problem is that the HW can
> update the A/D bits asynchronously if the PTE is valid (from the HW perspective)
> so the previous approach was to get_and_clear (atomic), modify, write. But that
> required 2 Xen hypervisor calls per PTE. This start/commit approach allows Xen
> to both avoid the get_and_clear() and batch the writes for all PTEs in a lazy
> mmu batch. So hypervisor calls are reduced from 2 per PTE to 1 per lazy mmu
> batch. TBH I'm no Xen expert; some of those details may be off, but big picture
> is correct.
Yeah, here we go again with some horror show stuff on Xen's behalf. I've played
Half Life, so I already know to fear Xen ;)
I believe David has _thoughts_ on this :)
Again this is aside stuff.
>
> Anyway, arm64 doesn't care about any of that, but it does override
> ptep_modify_prot_start() / ptep_modify_prot_commit() to implement an erratum
> workaround. And it can benefit substantially from batching.
Ack. And of course, the batching part is why we're all here!
>
> >
> > And the arch needs to know about any bits that have changed I guess hence
> > providing old_pte as well right?
> >
> > OK so yeah, I get it now, we're not actually advancing through ptes here, we're
> > just advancing the PFN and applying the same 'template'.
> >
> > How about something like:
> >
> > static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
> > pte_t *ptep, pte_t old_pte, pte_t pte, unsigned int nr)
> > {
> > int i;
> >
> > for (i = 0; i < nr; i++, ptep++, addr += PAGE_SIZE) {
> > ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
> >
> > /* Advance PFN only, set same flags. */
> > old_pte = pte_next_pfn(old_pte);
> > pte = pte_next_pfn(pte);
> > }
> > }
> >
> > Neatens it up a bit and makes it clear that we're effectively propagating the
> > flags here.
>
> Yes, except we don't usually refer to the non-pfn parts of a pte as "flags". We
> normally call them pgprot or prot. God knows why...
Ah of course we love to do this kind of thing to oursevles :>)
Dev - suggestion above then but s/flags/prot/.
Powered by blists - more mailing lists