lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <7c6e61fa-2437-4c99-b1d3-97af5e2b37a3@lucifer.local>
Date: Tue, 6 May 2025 14:49:04 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Dev Jain <dev.jain@....com>
Cc: akpm@...ux-foundation.org, Liam.Howlett@...cle.com, vbabka@...e.cz,
        jannh@...gle.com, pfalcato@...e.de, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, david@...hat.com, peterx@...hat.com,
        ryan.roberts@....com, mingo@...nel.org, libang.li@...group.com,
        maobibo@...ngson.cn, zhengqi.arch@...edance.com, baohua@...nel.org,
        anshuman.khandual@....com, willy@...radead.org, ioworker0@...il.com,
        yang@...amperecomputing.com
Subject: Re: [PATCH 3/3] mm: Optimize mremap() by PTE batching

On Tue, May 06, 2025 at 10:30:56AM +0530, Dev Jain wrote:
> Use folio_pte_batch() to optimize move_ptes(). Use get_and_clear_full_ptes()
> so as to elide TLBIs on each contig block, which was previously done by
> ptep_get_and_clear().

No mention of large folios in the commit message; worth spelling out, since
that's what makes this batching worthwhile.

>
> Signed-off-by: Dev Jain <dev.jain@....com>
> ---
>  mm/mremap.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 1a08a7c3b92f..3621c07d8eea 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -176,7 +176,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  	struct vm_area_struct *vma = pmc->old;
>  	bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
>  	struct mm_struct *mm = vma->vm_mm;
> -	pte_t *old_ptep, *new_ptep, pte;
> +	pte_t *old_ptep, *new_ptep, old_pte, pte;

Obviously, given the previous comment, you know what I'm going to say here :)
let's put old_pte, pte in a new decl.

>  	pmd_t dummy_pmdval;
>  	spinlock_t *old_ptl, *new_ptl;
>  	bool force_flush = false;
> @@ -185,6 +185,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  	unsigned long old_end = old_addr + extent;
>  	unsigned long len = old_end - old_addr;
>  	int err = 0;
> +	int nr;
>
>  	/*
>  	 * When need_rmap_locks is true, we take the i_mmap_rwsem and anon_vma
> @@ -237,10 +238,14 @@ static int move_ptes(struct pagetable_move_control *pmc,
>
>  	for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
>  				   new_ptep++, new_addr += PAGE_SIZE) {

Hm, this just seems wrong: even when we're dealing with a large folio, we're
still advancing by PAGE_SIZE each time, so we iterate over sub-pages the batch
has already handled?

Shouldn't we be doing something like += nr and += PAGE_SIZE * nr?

Then it'd make sense to initialise nr to 1.

Honestly I'd prefer us though to refactor move_ptes() to something like:

	for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
				   new_ptep++, new_addr += PAGE_SIZE) {
		pte_t old_pte = ptep_get(old_ptep);

		if (pte_none(old_pte))
			continue;

		move_pte(pmc, vma, old_ptep, old_pte);
	}

Declaring this new move_pte() where you can put the rest of the stuff.

I'd much rather we do this than add to the mess as-is.



> -		if (pte_none(ptep_get(old_ptep)))
> +		const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> +		int max_nr = (old_end - old_addr) >> PAGE_SHIFT;
> +
> +		nr = 1;
> +		old_pte = ptep_get(old_ptep);

You can declare this in the for loop; no need to contaminate the whole
function scope with it.

Same with 'nr' in this implementation (though I'd rather you changed it up, see
above).

> +		if (pte_none(old_pte))
>  			continue;
>
> -		pte = ptep_get_and_clear(mm, old_addr, old_ptep);
>  		/*
>  		 * If we are remapping a valid PTE, make sure
>  		 * to flush TLB before we drop the PTL for the
> @@ -252,8 +257,17 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  		 * the TLB entry for the old mapping has been
>  		 * flushed.
>  		 */
> -		if (pte_present(pte))
> +		if (pte_present(old_pte)) {
> +			if ((max_nr != 1) && maybe_contiguous_pte_pfns(old_ptep, old_pte)) {
> +				struct folio *folio = vm_normal_folio(vma, old_addr, old_pte);
> +
> +				if (folio && folio_test_large(folio))
> +					nr = folio_pte_batch(folio, old_addr, old_ptep,
> +					old_pte, max_nr, fpb_flags, NULL, NULL, NULL);

Indentation seems completely broken here. I also hate that we're nesting to
this degree; can we please find a way not to?

This function is already a bit of a clogged mess; I'd rather we clean it up
than add to it.

(See above again :)


> +			}
>  			force_flush = true;
> +		}
> +		pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr, 0);
>  		pte = move_pte(pte, old_addr, new_addr);
>  		pte = move_soft_dirty_pte(pte);
>
> @@ -266,7 +280,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  				else if (is_swap_pte(pte))
>  					pte = pte_swp_clear_uffd_wp(pte);
>  			}
> -			set_pte_at(mm, new_addr, new_ptep, pte);
> +			set_ptes(mm, new_addr, new_ptep, pte, nr);
>  		}
>  	}
>
> --
> 2.30.2
>

Cheers, Lorenzo
