linux-kernel - Re: [PATCH v1] mm: convert folio_estimated_sharers() to folio_likely_mapped_shared()

Message-ID: <661e519bc7753d784449931876a61f34bc8ad6ca.camel@oracle.com>
Date: Tue, 27 Feb 2024 16:18:59 -0700
From: Khalid Aziz <gonehacking@...il.com>
To: David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, Barry Song <v-songbaohua@...o.com>, Vishal Moola
	 <vishal.moola@...il.com>, Ryan Roberts <ryan.roberts@....com>
Subject: Re: [PATCH v1] mm: convert folio_estimated_sharers() to
 folio_likely_mapped_shared()

On Tue, 2024-02-27 at 21:15 +0100, David Hildenbrand wrote:
> Callers of folio_estimated_sharers() only care about "mapped shared vs.
> mapped exclusively", not the exact estimate of sharers. Let's consolidate
> and unify the condition users are checking. While at it clarify the
> semantics and extend the discussion on the fuzziness.
> 
> Use the "likely mapped shared" terminology to better express what the
> (adjusted) function actually checks.
> 
> Whether a partially-mappable folio is more likely to not be partially
> mapped than partially mapped is debatable. In the future, we might be able
> to improve our estimate for partially-mappable folios, though.
> 
> Note that we will now consistently detect "mapped shared" only if the
> first subpage is actually mapped multiple times. When the first subpage
> is not mapped, we will consistently detect it as "mapped exclusively".
> This change should currently only affect the usage in
> madvise_free_pte_range() and queue_folios_pte_range() for large folios: if
> the first page was already unmapped, we would have skipped the folio.
> 
> Cc: Barry Song <v-songbaohua@...o.com>
> Cc: Vishal Moola (Oracle) <vishal.moola@...il.com>
> Cc: Ryan Roberts <ryan.roberts@....com>
> Signed-off-by: David Hildenbrand <david@...hat.com>


This patch adds clarity while retaining current behavior, so looks good
to me.

Reviewed-by: Khalid Aziz <khalid.aziz@...cle.com>


> ---
>  include/linux/mm.h | 46 ++++++++++++++++++++++++++++++++++++----------
>  mm/huge_memory.c   |  2 +-
>  mm/madvise.c       |  6 +++---
>  mm/memory.c        |  2 +-
>  mm/mempolicy.c     | 14 ++++++--------
>  mm/migrate.c       |  8 ++++----
>  6 files changed, 51 insertions(+), 27 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6f4825d829656..795c89632265f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2147,21 +2147,47 @@ static inline size_t folio_size(struct folio *folio)
>  }
>  
>  /**
> - * folio_estimated_sharers - Estimate the number of sharers of a folio.
> + * folio_likely_mapped_shared - Estimate if the folio is mapped into the page
> + *				tables of more than one MM
>   * @folio: The folio.
>   *
> - * folio_estimated_sharers() aims to serve as a function to efficiently
> - * estimate the number of processes sharing a folio. This is done by
> - * looking at the precise mapcount of the first subpage in the folio, and
> - * assuming the other subpages are the same. This may not be true for large
> - * folios. If you want exact mapcounts for exact calculations, look at
> - * page_mapcount() or folio_total_mapcount().
> + * This function checks if the folio is currently mapped into more than one
> + * MM ("mapped shared"), or if the folio is only mapped into a single MM
> + * ("mapped exclusively").
>   *
> - * Return: The estimated number of processes sharing a folio.
> + * As precise information is not easily available for all folios, this function
> + * estimates the number of MMs ("sharers") that are currently mapping a folio
> + * using the number of times the first page of the folio is currently mapped
> + * into page tables.
> + *
> + * For small anonymous folios (except KSM folios) and anonymous hugetlb folios,
> + * the return value will be exactly correct, because they can only be mapped
> + * at most once into an MM, and they cannot be partially mapped.
> + *
> + * For other folios, the result can be fuzzy:
> + * (a) For partially-mappable large folios (THP), the return value can wrongly
> + *     indicate "mapped exclusively" (false negative) when the folio is
> + *     only partially mapped into at least one MM.
> + * (b) For pagecache folios (including hugetlb), the return value can wrongly
> + *     indicate "mapped shared" (false positive) when two VMAs in the same MM
> + *     cover the same file range.
> + * (c) For (small) KSM folios, the return value can wrongly indicate "mapped
> + *     shared" (false positive), when the folio is mapped multiple times into
> + *     the same MM.
> + *
> + * Further, this function only considers current page table mappings that
> + * are tracked using the folio mapcount(s). It does not consider:
> + * (1) If the folio might get mapped in the (near) future (e.g., swapcache,
> + *     pagecache, temporary unmapping for migration).
> + * (2) If the folio is mapped differently (VM_PFNMAP).
> + * (3) If hugetlb page table sharing applies. Callers might want to check
> + *     hugetlb_pmd_shared().
> + *
> + * Return: Whether the folio is estimated to be mapped into more than one MM.
>   */
> -static inline int folio_estimated_sharers(struct folio *folio)
> +static inline bool folio_likely_mapped_shared(struct folio *folio)
>  {
> -	return page_mapcount(folio_page(folio, 0));
> +	return page_mapcount(folio_page(folio, 0)) > 1;
>  }
>  
>  #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 50d146eb248ff..4d10904fef70c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1829,7 +1829,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  	 * If other processes are mapping this folio, we couldn't discard
>  	 * the folio unless they all do MADV_FREE so let's skip the folio.
>  	 */
> -	if (folio_estimated_sharers(folio) != 1)
> +	if (folio_likely_mapped_shared(folio))
>  		goto out;
>  
>  	if (!folio_trylock(folio))
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 44a498c94158c..32a534d200219 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -366,7 +366,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		folio = pfn_folio(pmd_pfn(orig_pmd));
>  
>  		/* Do not interfere with other mappings of this folio */
> -		if (folio_estimated_sharers(folio) != 1)
> +		if (folio_likely_mapped_shared(folio))
>  			goto huge_unlock;
>  
>  		if (pageout_anon_only_filter && !folio_test_anon(folio))
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		if (folio_test_large(folio)) {
>  			int err;
>  
> -			if (folio_estimated_sharers(folio) > 1)
> +			if (folio_likely_mapped_shared(folio))
>  				break;
>  			if (pageout_anon_only_filter && !folio_test_anon(folio))
>  				break;
> @@ -677,7 +677,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  		if (folio_test_large(folio)) {
>  			int err;
>  
> -			if (folio_estimated_sharers(folio) != 1)
> +			if (folio_likely_mapped_shared(folio))
>  				break;
>  			if (!folio_trylock(folio))
>  				break;
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c45b6a42a1b9..8394a9843ca06 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5173,7 +5173,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>  	 * Flag if the folio is shared between multiple address spaces. This
>  	 * is later used when determining whether to group tasks together
>  	 */
> -	if (folio_estimated_sharers(folio) > 1 && (vma->vm_flags & VM_SHARED))
> +	if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
>  		flags |= TNF_SHARED;
>  
>  	nid = folio_nid(folio);
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f60b4c99f1302..0b92fde395182 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -642,12 +642,11 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
>  	 * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
>  	 * Choosing not to migrate a shared folio is not counted as a failure.
>  	 *
> -	 * To check if the folio is shared, ideally we want to make sure
> -	 * every page is mapped to the same process. Doing that is very
> -	 * expensive, so check the estimated sharers of the folio instead.
> +	 * See folio_likely_mapped_shared() on possible imprecision when we
> +	 * cannot easily detect if a folio is shared.
>  	 */
>  	if ((flags & MPOL_MF_MOVE_ALL) ||
> -	    (folio_estimated_sharers(folio) == 1 && !hugetlb_pmd_shared(pte)))
> +	    (!folio_likely_mapped_shared(folio) && !hugetlb_pmd_shared(pte)))
>  		if (!isolate_hugetlb(folio, qp->pagelist))
>  			qp->nr_failed++;
>  unlock:
> @@ -1032,11 +1031,10 @@ static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
>  	 * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
>  	 * Choosing not to migrate a shared folio is not counted as a failure.
>  	 *
> -	 * To check if the folio is shared, ideally we want to make sure
> -	 * every page is mapped to the same process. Doing that is very
> -	 * expensive, so check the estimated sharers of the folio instead.
> +	 * See folio_likely_mapped_shared() on possible imprecision when we
> +	 * cannot easily detect if a folio is shared.
>  	 */
> -	if ((flags & MPOL_MF_MOVE_ALL) || folio_estimated_sharers(folio) == 1) {
> +	if ((flags & MPOL_MF_MOVE_ALL) || !folio_likely_mapped_shared(folio)) {
>  		if (folio_isolate_lru(folio)) {
>  			list_add_tail(&folio->lru, foliolist);
>  			node_stat_mod_folio(folio,
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 73a052a382f13..35d376969f8b9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2568,11 +2568,11 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>  	/*
>  	 * Don't migrate file folios that are mapped in multiple processes
>  	 * with execute permissions as they are probably shared libraries.
> -	 * To check if the folio is shared, ideally we want to make sure
> -	 * every page is mapped to the same process. Doing that is very
> -	 * expensive, so check the estimated mapcount of the folio instead.
> +	 *
> +	 * See folio_likely_mapped_shared() on possible imprecision when we
> +	 * cannot easily detect if a folio is shared.
>  	 */
> -	if (folio_estimated_sharers(folio) != 1 && folio_is_file_lru(folio) &&
> +	if (folio_likely_mapped_shared(folio) && folio_is_file_lru(folio) &&
>  	    (vma->vm_flags & VM_EXEC))
>  		goto out;
>
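
The commit message notes that callers which used to test "!= 1" now behave
differently when the first page of the folio is not mapped at all. A minimal
userspace sketch of that difference follows. It is not kernel code: the plain
int "mapcount" stands in for page_mapcount(folio_page(folio, 0)), and all
toy_* names are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. "mapcount" stands in for
 * page_mapcount(folio_page(folio, 0)); the toy_* names are hypothetical.
 */

/* Old style check (e.g. "!= 1" in madvise_free_pte_range()): skip the folio. */
static bool toy_old_skips(int mapcount)
{
	return mapcount != 1;
}

/* New style check: skip only if the first page is mapped more than once. */
static bool toy_new_skips(int mapcount)
{
	return mapcount > 1;
}

/* True if the old and new predicates make the same decision. */
static bool toy_same_decision(int mapcount)
{
	return toy_old_skips(mapcount) == toy_new_skips(mapcount);
}

/* Among the small mapcounts checked here, only 0 changes the decision. */
static void toy_self_check(void)
{
	assert(!toy_same_decision(0));
	for (int mc = 1; mc <= 8; mc++)
		assert(toy_same_decision(mc));
}
```

In this model a mapcount of 0 was skipped by the old "!= 1" test and is
treated as "mapped exclusively" by the new test. Call sites that already
tested "> 1" are unaffected.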