Message-ID: <CA+EESO5Phk1W64mNm=YG8E1oNEXENP94cd5FUuq0PhcUsOe7+Q@mail.gmail.com>
Date: Mon, 25 Aug 2025 11:46:02 -0700
From: Lokesh Gidra <lokeshgidra@...gle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Andrew Morton <akpm@...ux-foundation.org>,
Harry Yoo <harry.yoo@...cle.com>, Zi Yan <ziy@...dia.com>, Barry Song <21cnbao@...il.com>,
"open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>, Peter Xu <peterx@...hat.com>,
Suren Baghdasaryan <surenb@...gle.com>, Kalesh Singh <kaleshsingh@...gle.com>,
android-mm <android-mm@...gle.com>, linux-kernel <linux-kernel@...r.kernel.org>,
Jann Horn <jannh@...gle.com>, Rik van Riel <riel@...riel.com>, Vlastimil Babka <vbabka@...e.cz>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>
Subject: Re: [DISCUSSION] Unconditionally lock folios when calling rmap_walk()
On Mon, Aug 25, 2025 at 8:19 AM David Hildenbrand <david@...hat.com> wrote:
>
> On 22.08.25 19:29, Lokesh Gidra wrote:
> > Hi all,
> >
> > Currently, some callers of rmap_walk() conditionally avoid try-locking
> > non-ksm anon folios. This necessitates serialization through anon_vma
> > write-lock elsewhere when folio->mapping and/or folio->index (fields
> > involved in rmap_walk()) are to be updated. This hurts scalability due
> > to coarse granularity of the lock. For instance, when multiple threads
> > invoke userfaultfd’s MOVE ioctl simultaneously to move distinct pages
> > from the same src VMA, they all contend for the corresponding
> > anon_vma’s lock. Field traces from arm64 Android devices reveal over
> > 30ms of uninterruptible sleep in the main UI thread, leading to janky
> > user interactions.
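> >
> > As a rough illustration of that pattern (a minimal sketch only: the
> > uffd descriptor, src_base/dst_base, page_size and the move_one_page()
> > helper are assumed to be set up elsewhere via the usual
> > UFFDIO_API/UFFDIO_REGISTER dance), each worker thread essentially does:
> >
> > #include <stddef.h>
> > #include <sys/ioctl.h>
> > #include <linux/userfaultfd.h>
> >
> > /* Move one page from the shared src VMA to dst. Every caller moves a
> >  * different page, yet all of them serialize on the src VMA's anon_vma
> >  * write lock inside the kernel. */
> > static int move_one_page(int uffd, unsigned long src_base,
> >                          unsigned long dst_base, size_t page_size,
> >                          unsigned long idx)
> > {
> >         struct uffdio_move move = {
> >                 .src  = src_base + idx * page_size,
> >                 .dst  = dst_base + idx * page_size,
> >                 .len  = page_size,
> >                 .mode = 0,
> >         };
> >
> >         return ioctl(uffd, UFFDIO_MOVE, &move);
> > }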
> >
> > Among all rmap_walk() callers that don’t lock anon folios,
> > folio_referenced() is the most critical (others are
> > page_idle_clear_pte_refs(), damon_folio_young(), and
> > damon_folio_mkold()). The relevant code in folio_referenced() is:
> >
> >         if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> >                 we_locked = folio_trylock(folio);
> >                 if (!we_locked)
> >                         return 1;
> >         }
> >
> > It’s unclear why taking the anon_vma lock exclusively (when updating
> > folio->mapping, as uffd MOVE does) is preferable to walking the rmap
> > with the folio locked. folio_referenced() runs on the reclaim path, so
> > it should not be so performance-critical as to warrant this special
> > treatment, unless I’m missing something.
> >
> > Therefore, I propose simplifying the locking mechanism by ensuring the
> > folio is locked before calling rmap_walk().
>
> Essentially, what you mean is roughly:
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 34333ae3bd80f..0800e73c0796e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1005,7 +1005,7 @@ int folio_referenced(struct folio *folio, int is_locked,
>         if (!folio_raw_mapping(folio))
>                 return 0;
>
> -       if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> +       if (!is_locked) {
>                 we_locked = folio_trylock(folio);
>                 if (!we_locked)
>                         return 1;
>
>
> The downside of that change is that ordinary (!ksm) anon folios will observe being locked
> when we are actually only trying to assess whether they were referenced.
>
> Does it matter?
>
> I can only speculate that it might have been very relevant before
> 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive").
>
> Essentially any write fault on a R/O-mapped page would have resulted in us copying the
> page, simply because a concurrent folio_referenced() was holding the folio lock.
>
> Before 09854ba94c6a ("mm: do_wp_page() simplification") that wasn't an issue, but
> it would have meant that the write fault would be stuck until folio_referenced()
> would be done, which is also suboptimal.
>
> So likely, avoiding grabbing the folio lock was beneficial.
>
>
> Today, this would only affect R/O pages after fork (PageAnonExclusive not set).
>
>
> Staring at shrink_active_list()->folio_referenced(), we isolate the folio first
> (grabbing reference+clearing LRU), so do_wp_page()->wp_can_reuse_anon_folio()
> would already refuse to reuse immediately, because it would spot a raised reference.
> The folio lock does not make a difference anymore.
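>
> Roughly speaking (a simplified sketch of the early checks, not the exact
> mm/memory.c code), the reuse path bails out like this before the folio
> lock even comes into play:
>
>         /*
>          * An isolated folio carries an extra reference (taken by
>          * shrink_active_list() during isolation), so this check already
>          * fails and we fall back to copying, folio lock or not.
>          */
>         if (folio_ref_count(folio) > 1 + folio_test_swapcache(folio))
>                 return false;
>         if (!folio_trylock(folio))
>                 return false;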
>
>
> Is there any other anon-specific (!ksm) folio locking? Nothing exciting comes to mind,
> except maybe folio splitting or khugepaged, which might then have to wait.
>
> But khugepaged would already also fail to isolate these folios, so probably it's not that
> relevant anymore ...
Thanks so much for your thorough analysis. Very useful!

For folio splitting, it seems the anon_vma lock is acquired exclusively,
so it already serializes against folio_referenced() anyway.
>
> --
> Cheers
>
> David / dhildenb
>