Message-ID: <ndryzvkmrfidmjgj4tl27hk2kmspmb42mxl2smuwgmp5hyedzh@thggle3dhp5j>
Date: Thu, 18 Sep 2025 14:48:20 +0100
From: Kiryl Shutsemau <kirill@...temov.name>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Yin Fengwei <fengwei.yin@...el.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, Hugh Dickins <hughd@...gle.com>,
Matthew Wilcox <willy@...radead.org>, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>, Rik van Riel <riel@...riel.com>,
Harry Yoo <harry.yoo@...cle.com>, Johannes Weiner <hannes@...xchg.org>,
Shakeel Butt <shakeel.butt@...ux.dev>, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] mm/rmap: Improve mlock tracking for large folios
On Thu, Sep 18, 2025 at 02:10:05PM +0100, Lorenzo Stoakes wrote:
> On Thu, Sep 18, 2025 at 12:21:57PM +0100, kirill@...temov.name wrote:
> > From: Kiryl Shutsemau <kas@...nel.org>
> >
> > The kernel currently does not mlock large folios when adding them to
> > rmap, on the grounds that it is difficult to confirm that the folio is
> > fully mapped and therefore safe to mlock. However, nowadays the caller
> > passes the number of pages of the folio that are being mapped, making
> > it easy to check whether the entire folio is mapped to the VMA.
> >
> > mlock the folio on rmap if it is fully mapped to the VMA.
> >
> > Signed-off-by: Kiryl Shutsemau <kas@...nel.org>
>
> The logic looks good to me, so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>
> But note the comments below.
>
> > ---
> > mm/rmap.c | 13 ++++---------
> > 1 file changed, 4 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 568198e9efc2..ca8d4ef42c2d 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1478,13 +1478,8 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> > PageAnonExclusive(cur_page), folio);
> > }
> >
> > - /*
> > - * For large folio, only mlock it if it's fully mapped to VMA. It's
> > - * not easy to check whether the large folio is fully mapped to VMA
> > - * here. Only mlock normal 4K folio and leave page reclaim to handle
> > - * large folio.
> > - */
> > - if (!folio_test_large(folio))
> > + /* Only mlock it if the folio is fully mapped to the VMA */
> > + if (folio_nr_pages(folio) == nr_pages)
>
> OK this is nice, as a partially mapped folio will have folio_nr_pages() != nr_pages.
> So logically this must be correct.
>
> > mlock_vma_folio(folio, vma);
> > }
> >
> > @@ -1620,8 +1615,8 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
> > nr = __folio_add_rmap(folio, page, nr_pages, vma, level, &nr_pmdmapped);
> > __folio_mod_stat(folio, nr, nr_pmdmapped);
> >
> > - /* See comments in folio_add_anon_rmap_*() */
> > - if (!folio_test_large(folio))
> > + /* Only mlock it if the folio is fully mapped to the VMA */
> > + if (folio_nr_pages(folio) == nr_pages)
> > mlock_vma_folio(folio, vma);
> > }
> >
> > --
> > 2.50.1
> >
>
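
As a quick sanity check of the new condition, here is a minimal standalone
sketch (plain userspace C with made-up sizes; mlock_old()/mlock_new() are
only stand-ins for the old and new tests, not kernel helpers):

#include <stdbool.h>
#include <stdio.h>

/* Old rule: only order-0 (single page) folios were mlocked at rmap time. */
static bool mlock_old(unsigned long folio_nr_pages)
{
	return folio_nr_pages == 1;
}

/* New rule: mlock whenever every page of the folio is mapped to the VMA. */
static bool mlock_new(unsigned long folio_nr_pages, unsigned long nr_pages)
{
	return folio_nr_pages == nr_pages;
}

int main(void)
{
	/* 16-page folio, all 16 pages mapped: old rule skips it, new rule mlocks it. */
	printf("fully mapped:     old=%d new=%d\n", mlock_old(16), mlock_new(16, 16));
	/* Same folio with only 8 pages mapped: both rules leave it to reclaim. */
	printf("partially mapped: old=%d new=%d\n", mlock_old(16), mlock_new(16, 8));
	return 0;
}

A partially mapped large folio still fails the new check, so it is left for
page reclaim to deal with, as the replaced comment said.
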
> I see in try_to_unmap_one():
>
> 	if (!(flags & TTU_IGNORE_MLOCK) &&
> 	    (vma->vm_flags & VM_LOCKED)) {
> 		/* Restore the mlock which got missed */
> 		if (!folio_test_large(folio))
> 			mlock_vma_folio(folio, vma);
>
> Do we care about this?
>
> It seems like folio_referenced_one() has some similar logic:
>
> 	if (vma->vm_flags & VM_LOCKED) {
> 		if (!folio_test_large(folio) || !pvmw.pte) {
> 			/* Restore the mlock which got missed */
> 			mlock_vma_folio(folio, vma);
> 			page_vma_mapped_walk_done(&pvmw);
> 			pra->vm_flags |= VM_LOCKED;
> 			return false; /* To break the loop */
> 		}
>
> ...
>
> 	if ((vma->vm_flags & VM_LOCKED) &&
> 	    folio_test_large(folio) &&
> 	    folio_within_vma(folio, vma)) {
> 		unsigned long s_align, e_align;
>
> 		s_align = ALIGN_DOWN(start, PMD_SIZE);
> 		e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
>
> 		/* folio doesn't cross page table boundary and fully mapped */
> 		if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) {
> 			/* Restore the mlock which got missed */
> 			mlock_vma_folio(folio, vma);
> 			pra->vm_flags |= VM_LOCKED;
> 			return false; /* To break the loop */
> 		}
> 	}
>
> So maybe we could do something similar in try_to_unmap_one()?

Hm. This seems buggy to me.

mlock_vma_folio() has to be called with the ptl held, no? It has already
been dropped by the time we reach this point.

+Fengwei.

I think this has to be handled inside the loop, once ptes reaches
folio_nr_pages(folio).

Maybe something like this (untested):

diff --git a/mm/rmap.c b/mm/rmap.c
index ca8d4ef42c2d..719f1c99470c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -858,17 +858,13 @@ static bool folio_referenced_one(struct folio *folio,
address = pvmw.address;
if (vma->vm_flags & VM_LOCKED) {
- if (!folio_test_large(folio) || !pvmw.pte) {
- /* Restore the mlock which got missed */
- mlock_vma_folio(folio, vma);
- page_vma_mapped_walk_done(&pvmw);
- pra->vm_flags |= VM_LOCKED;
- return false; /* To break the loop */
- }
+ unsigned long s_align, e_align;
+
+ /* Small folio or PMD-mapped large folio */
+ if (!folio_test_large(folio) || !pvmw.pte)
+ goto restore_mlock;
+
/*
- * For large folio fully mapped to VMA, will
- * be handled after the pvmw loop.
- *
* For large folio cross VMA boundaries, it's
* expected to be picked by page reclaim. But
* should skip reference of pages which are in
@@ -878,7 +874,23 @@ static bool folio_referenced_one(struct folio *folio,
*/
ptes++;
pra->mapcount--;
- continue;
+
+ /* Folio must be fully mapped to be mlocked */
+ if (ptes != folio_nr_pages(folio))
+ continue;
+
+ s_align = ALIGN_DOWN(start, PMD_SIZE);
+ e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
+
+ /* folio doesn't cross page table */
+ if (s_align != e_align)
+ continue;
+restore_mlock:
+ /* Restore the mlock which got missed */
+ mlock_vma_folio(folio, vma);
+ page_vma_mapped_walk_done(&pvmw);
+ pra->vm_flags |= VM_LOCKED;
+ return false; /* To break the loop */
}
/*
@@ -914,23 +926,6 @@ static bool folio_referenced_one(struct folio *folio,
pra->mapcount--;
}
- if ((vma->vm_flags & VM_LOCKED) &&
- folio_test_large(folio) &&
- folio_within_vma(folio, vma)) {
- unsigned long s_align, e_align;
-
- s_align = ALIGN_DOWN(start, PMD_SIZE);
- e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
-
- /* folio doesn't cross page table boundary and fully mapped */
- if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) {
- /* Restore the mlock which got missed */
- mlock_vma_folio(folio, vma);
- pra->vm_flags |= VM_LOCKED;
- return false; /* To break the loop */
- }
- }
-
if (referenced)
folio_clear_idle(folio);
if (folio_test_clear_young(folio))
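
In case it helps to see the boundary test in isolation, here is a standalone
sketch (userspace C with hypothetical addresses; PMD_SIZE is assumed to be
2MiB as on x86-64, and ALIGN_DOWN is redefined locally) of the
s_align/e_align comparison:

#include <stdbool.h>
#include <stdio.h>

#define PMD_SIZE		(2UL << 20)		/* assumed 2MiB */
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))	/* a must be a power of two */

/* True if [start, start + size) fits under a single page table. */
static bool within_one_page_table(unsigned long start, unsigned long size)
{
	unsigned long s_align = ALIGN_DOWN(start, PMD_SIZE);
	unsigned long e_align = ALIGN_DOWN(start + size - 1, PMD_SIZE);

	return s_align == e_align;
}

int main(void)
{
	/* 64KiB folio mapped at 2MiB: stays within one PMD region -> prints 1 */
	printf("%d\n", within_one_page_table(0x200000, 0x10000));
	/* Same size mapped just below a 2MiB boundary: crosses it -> prints 0 */
	printf("%d\n", within_one_page_table(0x3f8000, 0x10000));
	return 0;
}

If I read it right, confining the folio to one page table means all of its
ptes sit under the same ptl, so when ptes reaches folio_nr_pages() the lock
is still held at the point mlock_vma_folio() is called.
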
--
Kiryl Shutsemau / Kirill A. Shutemov