lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0c15f911-cd4c-4ad3-97b7-e7153679f15e@lucifer.local>
Date: Mon, 27 Oct 2025 19:51:14 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Zhang Qilong <zhangqilong3@...wei.com>, akpm@...ux-foundation.org,
        Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org,
        surenb@...gle.com, mhocko@...e.com, jannh@...gle.com, pfalcato@...e.de,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        wangkefeng.wang@...wei.com, sunnanyong@...wei.com,
        Dev Jain <dev.jain@....com>, Ryan Roberts <ryan.roberts@....com>
Subject: Re: [RFC PATCH 1/3] mm: Introduce can_pte_batch_count() for PTEs
 batch optimization.

+Dev, Ryan

Please ensure to keep Dev + Ryan in the loop on all future iterations of this.

On Mon, Oct 27, 2025 at 08:24:40PM +0100, David Hildenbrand wrote:
> On 27.10.25 15:03, Zhang Qilong wrote:
> > Currently, the PTEs batch requires folio access, with the maximum
> > quantity limited to the PFNs contained within the folio. However,
> > in certain case (such as mremap_folio_pte_batch and mincore_pte_range),
> > accessing the folio is unnecessary and expensive.
> >
> > For scenarios that do not require folio access, this patch introduces
> > can_pte_batch_count(). With contiguous physical addresses and identical
> > PTE attribut bits, we can now process more page table entries at once,
> > in batch, not just limited to entries mapped within a single folio. On
> > the other hand, it avoid the folio access.
> >
> > Signed-off-by: Zhang Qilong <zhangqilong3@...wei.com>
> > ---
> >   mm/internal.h | 76 +++++++++++++++++++++++++++++++++++++++------------
> >   1 file changed, 58 insertions(+), 18 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 1561fc2ff5b8..92034ca9092d 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -233,61 +233,62 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
> >   		pte = pte_wrprotect(pte);
> >   	return pte_mkold(pte);
> >   }
> >   /**
> > - * folio_pte_batch_flags - detect a PTE batch for a large folio
> > - * @folio: The large folio to detect a PTE batch for.
> > + * can_pte_batch_count - detect a PTE batch in range [ptep, to ptep + max_nr)
>
> I really don't like the name.
>
> Maybe it's just pte_batch().

Yeah the name's terrible.

But I'm iffy about this series as a whole.

'can' implies boolean, it should be something like get pte batch or count pte
batch or something like this. It's silly to partially replace other functions
also.

But I've doubtful as to whether any of this will work...

>
> >    * @vma: The VMA. Only relevant with FPB_MERGE_WRITE, otherwise can be NULL.
> >    * @ptep: Page table pointer for the first entry.
> >    * @ptentp: Pointer to a COPY of the first page table entry whose flags this
> >    *	    function updates based on @flags if appropriate.
> >    * @max_nr: The maximum number of table entries to consider.
> >    * @flags: Flags to modify the PTE batch semantics.
> >    *
> > - * Detect a PTE batch: consecutive (present) PTEs that map consecutive
> > - * pages of the same large folio in a single VMA and a single page table.
> > + * This interface is designed for this case that do not require folio access.
> > + * If folio consideration is needed, please call folio_pte_batch_flags instead.

I'm pretty certain we need to make sure we do not cross folio boundaries, which
kills this series if so does it not?

Ryan - can you confirm?

> > + *
> > + * Detect a PTE batch: consecutive (present) PTEs that map consecutive pages
> > + * in a single VMA and a single page table.
> >    *
> >    * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
> >    * the accessed bit, writable bit, dirty bit (unless FPB_RESPECT_DIRTY is set)
> >    * and soft-dirty bit (unless FPB_RESPECT_SOFT_DIRTY is set).
> >    *
> > - * @ptep must map any page of the folio. max_nr must be at least one and
> > + * @ptep point to the first entry in range, max_nr must be at least one and
> >    * must be limited by the caller so scanning cannot exceed a single VMA and
> >    * a single page table.
> >    *
> >    * Depending on the FPB_MERGE_* flags, the pte stored at @ptentp will
> >    * be updated: it's crucial that a pointer to a COPY of the first
> >    * page table entry, obtained through ptep_get(), is provided as @ptentp.
> >    *
> > - * This function will be inlined to optimize based on the input parameters;
> > - * consider using folio_pte_batch() instead if applicable.
> > + * The following folio_pte_batch_flags() deal with PTEs that mapped in a
> > + * single folio. However can_pte_batch_count has the capability to handle
> > + * PTEs that mapped in consecutive folios. If flags is not set, it will ignore
> > + * the accessed, writable and dirty bits. Once the flags is set, the respect
> > + * bit(s) will be compared in pte_same(), if the advanced pte_batch_hint()
> > + * respect pte bit is different, pte_same() will return false and break. This
> > + * ensures the correctness of handling multiple folio PTEs.
> > + *
> > + * This function will be inlined to optimize based on the input parameters.
> >    *
> >    * Return: the number of table entries in the batch.
> >    */
>
> I recall trouble if we try batching across folios:

Yup pretty sure Ryan said we don't/can't in a previous thread. Now cc'd...

>
> commit 7b08b74f3d99f6b801250683c751d391128799ec (tag: mm-hotfixes-stable-2025-05-10-14-23)
> Author: Petr Vaněk <arkamar@...as.cz>
> Date:   Fri May 2 23:50:19 2025 +0200
>
>     mm: fix folio_pte_batch() on XEN PV
>     On XEN PV, folio_pte_batch() can incorrectly batch beyond the end of a
>     folio due to a corner case in pte_advance_pfn().  Specifically, when the
>     PFN following the folio maps to an invalidated MFN,
>             expected_pte = pte_advance_pfn(expected_pte, nr);
>     produces a pte_none().  If the actual next PTE in memory is also
>     pte_none(), the pte_same() succeeds,
>             if (!pte_same(pte, expected_pte))
>                     break;
>     the loop is not broken, and batching continues into unrelated memory.
> ...
>
>
> --
> Cheers
>
> David / dhildenb
>

Thanks, Lorenzk

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ