linux-kernel - Re: [PATCH 2/2] mm/rmap: Improve mlock tracking for large folios

Message-ID: <429481ef-6527-40f5-b7a0-c9370fd1e374@lucifer.local>
Date: Thu, 18 Sep 2025 14:10:05 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: kirill@...temov.name
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>, Hugh Dickins <hughd@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Shakeel Butt <shakeel.butt@...ux.dev>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Kiryl Shutsemau <kas@...nel.org>
Subject: Re: [PATCH 2/2] mm/rmap: Improve mlock tracking for large folios

On Thu, Sep 18, 2025 at 12:21:57PM +0100, kirill@...temov.name wrote:
> From: Kiryl Shutsemau <kas@...nel.org>
>
> The kernel currently does not mlock large folios when adding them to
> rmap, stating that it is difficult to confirm that the folio is fully
> mapped and safe to mlock it. However, nowadays the caller passes a
> number of pages of the folio that are getting mapped, making it easy to
> check if the entire folio is mapped to the VMA.
>
> mlock the folio on rmap if it is fully mapped to the VMA.
>
> Signed-off-by: Kiryl Shutsemau <kas@...nel.org>

The logic looks good to me, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>

But note the comments below.

> ---
>  mm/rmap.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 568198e9efc2..ca8d4ef42c2d 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1478,13 +1478,8 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
>  				 PageAnonExclusive(cur_page), folio);
>  	}
>
> -	/*
> -	 * For large folio, only mlock it if it's fully mapped to VMA. It's
> -	 * not easy to check whether the large folio is fully mapped to VMA
> -	 * here. Only mlock normal 4K folio and leave page reclaim to handle
> -	 * large folio.
> -	 */
> -	if (!folio_test_large(folio))
> +	/* Only mlock it if the folio is fully mapped to the VMA */
> +	if (folio_nr_pages(folio) == nr_pages)

OK, this is nice: a partially mapped folio will have folio_nr_pages() != nr_pages,
so logically this must be correct.

>  		mlock_vma_folio(folio, vma);
>  }
>
> @@ -1620,8 +1615,8 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
>  	nr = __folio_add_rmap(folio, page, nr_pages, vma, level, &nr_pmdmapped);
>  	__folio_mod_stat(folio, nr, nr_pmdmapped);
>
> -	/* See comments in folio_add_anon_rmap_*() */
> -	if (!folio_test_large(folio))
> +	/* Only mlock it if the folio is fully mapped to the VMA */
> +	if (folio_nr_pages(folio) == nr_pages)
>  		mlock_vma_folio(folio, vma);
>  }
>
> --
> 2.50.1
>

I see in try_to_unmap_one():

		if (!(flags & TTU_IGNORE_MLOCK) &&
		    (vma->vm_flags & VM_LOCKED)) {
			/* Restore the mlock which got missed */
			if (!folio_test_large(folio))
				mlock_vma_folio(folio, vma);

Do we care that this path still skips large folios?

folio_referenced_one() seems to have similar logic, but it also handles a
fully mapped large folio:

		if (vma->vm_flags & VM_LOCKED) {
			if (!folio_test_large(folio) || !pvmw.pte) {
				/* Restore the mlock which got missed */
				mlock_vma_folio(folio, vma);
				page_vma_mapped_walk_done(&pvmw);
				pra->vm_flags |= VM_LOCKED;
				return false; /* To break the loop */
			}

...

	if ((vma->vm_flags & VM_LOCKED) &&
			folio_test_large(folio) &&
			folio_within_vma(folio, vma)) {
		unsigned long s_align, e_align;

		s_align = ALIGN_DOWN(start, PMD_SIZE);
		e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);

		/* folio doesn't cross page table boundary and fully mapped */
		if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) {
			/* Restore the mlock which got missed */
			mlock_vma_folio(folio, vma);
			pra->vm_flags |= VM_LOCKED;
			return false; /* To break the loop */
		}
	}

So maybe we could do something similar in try_to_unmap_one()?
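
To make the condition concrete, here is a rough user-space model of the
"doesn't cross page table boundary and fully mapped" check quoted above. It
is only a sketch: the helper name is made up (it is not a kernel function),
and the 4 KiB page / 2 MiB PMD sizes are assumptions for illustration.

```c
#include <stdbool.h>

/* Assumed sizes for illustration only: 4 KiB pages, 2 MiB per PMD. */
#define PAGE_SIZE 4096UL
#define PMD_SIZE  (512UL * PAGE_SIZE)
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

/*
 * Hypothetical helper, not a kernel API: returns true if a folio of
 * nr_pages pages mapped at address 'start' lies within a single
 * PMD-sized region and all of its pages were seen mapped by PTEs.
 */
static bool folio_fully_mapped_in_one_pmd(unsigned long start,
					  unsigned long nr_pages,
					  unsigned long ptes)
{
	unsigned long size = nr_pages * PAGE_SIZE;
	unsigned long s_align = ALIGN_DOWN(start, PMD_SIZE);
	unsigned long e_align = ALIGN_DOWN(start + size - 1, PMD_SIZE);

	/* Same PMD region at both ends, and every page has a PTE. */
	return s_align == e_align && ptes == nr_pages;
}
```

A try_to_unmap_one() variant would presumably need the same two inputs,
the folio's start address in the VMA and a count of the PTEs walked, before
it could restore the mlock for a large folio.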