Message-ID: <3d5d5813-7764-4667-937f-cec52f8b03c9@lucifer.local>
Date: Tue, 27 May 2025 11:45:43 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Dev Jain <dev.jain@....com>
Cc: akpm@...ux-foundation.org, Liam.Howlett@...cle.com, vbabka@...e.cz,
        jannh@...gle.com, pfalcato@...e.de, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, david@...hat.com, peterx@...hat.com,
        ryan.roberts@....com, mingo@...nel.org, libang.li@...group.com,
        maobibo@...ngson.cn, zhengqi.arch@...edance.com, baohua@...nel.org,
        anshuman.khandual@....com, willy@...radead.org, ioworker0@...il.com,
        yang@...amperecomputing.com, baolin.wang@...ux.alibaba.com,
        ziy@...dia.com, hughd@...gle.com
Subject: Re: [PATCH v3 2/2] mm: Optimize mremap() by PTE batching

On Tue, May 27, 2025 at 01:20:49PM +0530, Dev Jain wrote:
> Use folio_pte_batch() to optimize move_ptes(). On arm64, if the ptes
> are painted with the contig bit, then ptep_get() will iterate through all 16
> entries to collect a/d bits. Hence this optimization will result in a 16x
> reduction in the number of ptep_get() calls. Next, ptep_get_and_clear()
> will eventually call contpte_try_unfold() on every contig block, thus
> flushing the TLB for the complete large folio range. Instead, use
> get_and_clear_full_ptes() so as to elide TLBIs on each contig block, and only
> do them on the starting and ending contig block.

But you're also making this applicable to non-contpte cases?

See below, but the commit message should clearly point out that this applies
generally to page-table-split large folios (unless I've missed something of
course! :)
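
For context, the arm64 contpte ptep_get() path that the commit message
refers to does roughly the following. This is a sketch paraphrased from
memory of arch/arm64/mm/contpte.c, so treat the details as approximate:

	pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
	{
		pte_t pte;
		int i;

		ptep = contpte_align_down(ptep);

		/* Gather a/d bits from every entry in the contig block. */
		for (i = 0; i < CONT_PTES; i++, ptep++) {
			pte = __ptep_get(ptep);

			if (pte_dirty(pte))
				orig_pte = pte_mkdirty(orig_pte);

			if (pte_young(pte))
				orig_pte = pte_mkyoung(orig_pte);
		}

		return orig_pte;
	}

So batching to a single ptep_get() per folio turns 16 of those walks into
one per contig block, which is where the 16x figure comes from.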

>
> Signed-off-by: Dev Jain <dev.jain@....com>
> ---
>  mm/mremap.c | 40 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 0163e02e5aa8..580b41f8d169 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -170,6 +170,24 @@ static pte_t move_soft_dirty_pte(pte_t pte)
>  	return pte;
>  }
>
> +/* mremap a batch of PTEs mapping the same large folio */

I think this comment is fairly useless; it basically just spells out the
function name.

I'd prefer something like 'determine if a PTE contains physically contiguous
entries which map the same large folio'.

> +static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr,
> +		pte_t *ptep, pte_t pte, int max_nr)
> +{
> +	const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> +	struct folio *folio;
> +
> +	if (max_nr == 1)
> +		return 1;
> +
> +	folio = vm_normal_folio(vma, addr, pte);
> +	if (!folio || !folio_test_large(folio))
> +		return 1;
> +
> +	return folio_pte_batch(folio, addr, ptep, pte, max_nr, flags, NULL,
> +			       NULL, NULL);
> +}

The code is much better, however! :)

> +
>  static int move_ptes(struct pagetable_move_control *pmc,
>  		unsigned long extent, pmd_t *old_pmd, pmd_t *new_pmd)
>  {
> @@ -177,7 +195,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  	bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
>  	struct mm_struct *mm = vma->vm_mm;
>  	pte_t *old_ptep, *new_ptep;
> -	pte_t pte;
> +	pte_t old_pte, pte;
>  	pmd_t dummy_pmdval;
>  	spinlock_t *old_ptl, *new_ptl;
>  	bool force_flush = false;
> @@ -185,6 +203,8 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  	unsigned long new_addr = pmc->new_addr;
>  	unsigned long old_end = old_addr + extent;
>  	unsigned long len = old_end - old_addr;
> +	int max_nr_ptes;
> +	int nr_ptes;
>  	int err = 0;
>
>  	/*
> @@ -236,12 +256,14 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  	flush_tlb_batched_pending(vma->vm_mm);
>  	arch_enter_lazy_mmu_mode();
>
> -	for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
> -				   new_ptep++, new_addr += PAGE_SIZE) {
> -		if (pte_none(ptep_get(old_ptep)))
> +	for (; old_addr < old_end; old_ptep += nr_ptes, old_addr += nr_ptes * PAGE_SIZE,
> +		new_ptep += nr_ptes, new_addr += nr_ptes * PAGE_SIZE) {
> +		nr_ptes = 1;
> +		max_nr_ptes = (old_end - old_addr) >> PAGE_SHIFT;
> +		old_pte = ptep_get(old_ptep);
> +		if (pte_none(old_pte))
>  			continue;
>
> -		pte = ptep_get_and_clear(mm, old_addr, old_ptep);
>  		/*
>  		 * If we are remapping a valid PTE, make sure
>  		 * to flush TLB before we drop the PTL for the
> @@ -253,8 +275,12 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  		 * the TLB entry for the old mapping has been
>  		 * flushed.
>  		 */
> -		if (pte_present(pte))
> +		if (pte_present(old_pte)) {
> +			nr_ptes = mremap_folio_pte_batch(vma, old_addr, old_ptep,
> +							 old_pte, max_nr_ptes);
>  			force_flush = true;
> +		}
> +		pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr_ptes, 0);

Just to clarify, in the previous revision you said:

"Split THPs won't be batched; you can use pte_batch() (from David's refactoring)
and figure the split THP batch out, but then get_and_clear_full_ptes() will be
gathering a/d bits and smearing them across the batch, which will be incorrect."

But... this will be triggered for a page-table-split large folio, no?

So is there something wrong here or not?
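
To spell out the concern for anyone following along: the generic fallback
for get_and_clear_full_ptes() in include/linux/pgtable.h folds the a/d
bits of every cleared entry into the single returned PTE. Roughly (sketch
from memory, so details may be off):

	static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
			unsigned long addr, pte_t *ptep, unsigned int nr, int full)
	{
		pte_t pte, tmp_pte;

		pte = ptep_get_and_clear_full(mm, addr, ptep, full);
		while (--nr) {
			ptep++;
			addr += PAGE_SIZE;
			tmp_pte = ptep_get_and_clear_full(mm, addr, ptep, full);
			/* a/d bits accumulate across the whole batch */
			if (pte_dirty(tmp_pte))
				pte = pte_mkdirty(pte);
			if (pte_young(tmp_pte))
				pte = pte_mkyoung(pte);
		}
		return pte;
	}

So if the batch covers a split THP where only some pages were dirty or
young, the destination entries written from that one returned value would
all come out dirty/young, i.e. the 'smearing' you described.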

>  		pte = move_pte(pte, old_addr, new_addr);
>  		pte = move_soft_dirty_pte(pte);
>
> @@ -267,7 +293,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  				else if (is_swap_pte(pte))
>  					pte = pte_swp_clear_uffd_wp(pte);
>  			}
> -			set_pte_at(mm, new_addr, new_ptep, pte);
> +			set_ptes(mm, new_addr, new_ptep, pte, nr_ptes);

The code looks much better here after refactoring, however!
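
For reference, the generic set_ptes() just replays the same PTE value nr
times while advancing the pfn for each successive entry. A rough sketch
from memory, ignoring the lazy-MMU bookkeeping:

	static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
			pte_t *ptep, pte_t pte, unsigned int nr)
	{
		for (;;) {
			set_pte_at(mm, addr, ptep, pte);
			if (--nr == 0)
				break;
			ptep++;
			addr += PAGE_SIZE;
			/* Advance the pfn so each entry maps the next page. */
			pte = pte_next_pfn(pte);
		}
	}

So the single 'pte' computed above, including the soft-dirty/uffd-wp
adjustments, is applied uniformly across the whole batch.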

>  		}
>  	}
>
> --
> 2.30.2
>
