Message-ID: <276b70aa-9853-40cc-8e7d-e790166706b5@redhat.com>
Date: Mon, 27 Oct 2025 20:24:40 +0100
From: David Hildenbrand <david@...hat.com>
To: Zhang Qilong <zhangqilong3@...wei.com>, akpm@...ux-foundation.org,
lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, jannh@...gle.com,
pfalcato@...e.de
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
wangkefeng.wang@...wei.com, sunnanyong@...wei.com
Subject: Re: [RFC PATCH 1/3] mm: Introduce can_pte_batch_count() for PTEs
batch optimization.
On 27.10.25 15:03, Zhang Qilong wrote:
> Currently, PTE batching requires folio access, with the batch size
> limited to the PFNs contained within the folio. However, in certain
> cases (such as mremap_folio_pte_batch and mincore_pte_range),
> accessing the folio is unnecessary and expensive.
>
> For scenarios that do not require folio access, this patch introduces
> can_pte_batch_count(). Given contiguous physical addresses and identical
> PTE attribute bits, we can now process more page table entries at once,
> in a batch that is no longer limited to the entries mapped within a
> single folio, while also avoiding the folio access.
>
> Signed-off-by: Zhang Qilong <zhangqilong3@...wei.com>
> ---
> mm/internal.h | 76 +++++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 58 insertions(+), 18 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 1561fc2ff5b8..92034ca9092d 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -233,61 +233,62 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
> pte = pte_wrprotect(pte);
> return pte_mkold(pte);
> }
>
> /**
> - * folio_pte_batch_flags - detect a PTE batch for a large folio
> - * @folio: The large folio to detect a PTE batch for.
> + * can_pte_batch_count - detect a PTE batch in the range [ptep, ptep + max_nr)
I really don't like the name.
Maybe it's just pte_batch().
> * @vma: The VMA. Only relevant with FPB_MERGE_WRITE, otherwise can be NULL.
> * @ptep: Page table pointer for the first entry.
> * @ptentp: Pointer to a COPY of the first page table entry whose flags this
> * function updates based on @flags if appropriate.
> * @max_nr: The maximum number of table entries to consider.
> * @flags: Flags to modify the PTE batch semantics.
> *
> - * Detect a PTE batch: consecutive (present) PTEs that map consecutive
> - * pages of the same large folio in a single VMA and a single page table.
> + * This interface is designed for cases that do not require folio access.
> + * If the folio must be considered, call folio_pte_batch_flags() instead.
> + *
> + * Detect a PTE batch: consecutive (present) PTEs that map consecutive pages
> + * in a single VMA and a single page table.
> *
> * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
> * the accessed bit, writable bit, dirty bit (unless FPB_RESPECT_DIRTY is set)
> * and soft-dirty bit (unless FPB_RESPECT_SOFT_DIRTY is set).
> *
> - * @ptep must map any page of the folio. max_nr must be at least one and
> + * @ptep points to the first entry in the range. max_nr must be at least one and
> * must be limited by the caller so scanning cannot exceed a single VMA and
> * a single page table.
> *
> * Depending on the FPB_MERGE_* flags, the pte stored at @ptentp will
> * be updated: it's crucial that a pointer to a COPY of the first
> * page table entry, obtained through ptep_get(), is provided as @ptentp.
> *
> - * This function will be inlined to optimize based on the input parameters;
> - * consider using folio_pte_batch() instead if applicable.
> + * The folio_pte_batch_flags() below deals with PTEs that are mapped in a
> + * single folio, whereas can_pte_batch_count() can handle PTEs mapped in
> + * consecutive folios. If no flags are set, the accessed, writable and
> + * dirty bits are ignored. If a respect flag is set, the respected bit(s)
> + * are compared by pte_same(): when that bit of the PTE advanced via
> + * pte_batch_hint() differs, pte_same() returns false and the batch ends.
> + * This ensures correctness when handling PTEs of multiple folios.
> + *
> + * This function will be inlined to optimize based on the input parameters.
> *
> * Return: the number of table entries in the batch.
> */
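For reference, the core of such a helper would presumably look roughly
like the sketch below (illustrative only, reconstructed from the quoted
kerneldoc; the actual patch body is not quoted here, and only existing
helpers from mm/internal.h and include/linux/pgtable.h are assumed):

static inline unsigned int can_pte_batch_count(struct vm_area_struct *vma,
		pte_t *ptep, pte_t *ptentp, unsigned int max_nr, fpb_t flags)
{
	/* Strip the ignored bits (accessed/writable/dirty, per @flags). */
	pte_t expected_pte = __pte_batch_clear_ignored(*ptentp, flags);
	pte_t pte;
	unsigned int nr = 1;

	/*
	 * @vma is only relevant with FPB_MERGE_WRITE; the FPB_MERGE_*
	 * updating of *ptentp is omitted in this sketch for brevity.
	 */
	for (; nr < max_nr; nr++) {
		/* Advance the expected PFN; note: no folio bound is consulted. */
		expected_pte = pte_advance_pfn(expected_pte, 1);
		pte = __pte_batch_clear_ignored(ptep_get(ptep + nr), flags);

		/* A mismatch in the PFN or any respected bit ends the batch. */
		if (!pte_same(pte, expected_pte))
			break;
	}
	return nr;
}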
I recall trouble if we try batching across folios:
commit 7b08b74f3d99f6b801250683c751d391128799ec (tag: mm-hotfixes-stable-2025-05-10-14-23)
Author: Petr Vaněk <arkamar@...as.cz>
Date: Fri May 2 23:50:19 2025 +0200
mm: fix folio_pte_batch() on XEN PV
On XEN PV, folio_pte_batch() can incorrectly batch beyond the end of a
folio due to a corner case in pte_advance_pfn(). Specifically, when the
PFN following the folio maps to an invalidated MFN,
expected_pte = pte_advance_pfn(expected_pte, nr);
produces a pte_none(). If the actual next PTE in memory is also
pte_none(), the pte_same() succeeds,
if (!pte_same(pte, expected_pte))
break;
the loop is not broken, and batching continues into unrelated memory.
...
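Condensed, the failure mode there was (values illustrative):

	/* XEN PV: the PFN just past the folio maps to an invalidated MFN */
	expected_pte = pte_advance_pfn(expected_pte, nr);  /* yields pte_none() */
	pte = ptep_get(ptep + nr);                         /* also pte_none() */

	if (!pte_same(pte, expected_pte))  /* none == none, so no break */
		break;

A folio-independent batch has no folio boundary to fall back on, so this
case would have to be ruled out explicitly (e.g., by never matching a
none expected PTE).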
--
Cheers
David / dhildenb