linux-kernel - Re: [RFC] Unconditionally lock folios when calling rmap_walk()

Message-ID: <CA+EESO7_-64GU5v1FTMXbemQixPX+xo6SGm8r0txohZJLs97cA@mail.gmail.com>
Date: Thu, 21 Aug 2025 10:56:02 -0700
From: Lokesh Gidra <lokeshgidra@...gle.com>
To: Zi Yan <ziy@...dia.com>
Cc: Barry Song <21cnbao@...il.com>, "open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>, Peter Xu <peterx@...hat.com>, 
	David Hildenbrand <david@...hat.com>, Suren Baghdasaryan <surenb@...gle.com>, 
	Kalesh Singh <kaleshsingh@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	android-mm <android-mm@...gle.com>, linux-kernel <linux-kernel@...r.kernel.org>, 
	Jann Horn <jannh@...gle.com>
Subject: Re: [RFC] Unconditionally lock folios when calling rmap_walk()


On Thu, Aug 21, 2025 at 9:14 AM Zi Yan <ziy@...dia.com> wrote:
>
> On 21 Aug 2025, at 8:01, Barry Song wrote:
>
> > On Thu, Aug 21, 2025 at 12:29 PM Lokesh Gidra <lokeshgidra@...gle.com> wrote:
> >>
> >> Adding linux-mm mailing list. Mistakenly used the wrong email address.
> >>
> >> On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@...gle.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> Currently, some callers of rmap_walk() conditionally avoid try-locking
> >>> non-KSM anon folios. This necessitates serialization through the
> >>> anon_vma write-lock whenever folio->mapping and/or folio->index (fields
> >>> involved in rmap_walk()) are to be updated. This hurts scalability due
> >>> to the coarse granularity of the lock. For instance, when multiple
> >>> threads invoke userfaultfd’s MOVE ioctl simultaneously to move distinct
> >>> pages from the same src VMA, they all contend for the corresponding
> >>> anon_vma’s lock. Field traces from arm64 Android devices reveal over
> >>> 30ms of uninterruptible sleep in the main UI thread, leading to janky
> >>> user interactions.
> >>>
> >>> Among all rmap_walk() callers that don’t lock anon folios,
> >>> folio_referenced() is the most critical (others are
> >>> page_idle_clear_pte_refs(), damon_folio_young(), and
> >>> damon_folio_mkold()). The relevant code in folio_referenced() is:
> >>>
> >>> if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> >>>         we_locked = folio_trylock(folio);
> >>>         if (!we_locked)
> >>>                 return 1;
> >>> }
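The quoted condition can be modeled as a small predicate. This is a hedged sketch: `folio_flags_model` and the `should_trylock_*()` / `check_rules()` helpers are hypothetical names, and plain booleans stand in for folio_test_anon() and folio_test_ksm(). The early `return 1` on a failed try-lock (reporting the folio as referenced) is not modeled. The check confirms that the proposal keeps every case locked today and adds only the non-KSM anon folio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the folio state folio_referenced() consults;
 * plain flags, not the kernel's struct folio. */
struct folio_flags_model {
	bool anon;  /* models folio_test_anon() */
	bool ksm;   /* models folio_test_ksm(); KSM folios are also anon */
};

/* Current rule, mirroring the quoted condition: try-lock only file
 * folios and KSM folios; non-KSM anon folios are walked unlocked. */
static bool should_trylock_current(struct folio_flags_model f, bool is_locked)
{
	return !is_locked && (!f.anon || f.ksm);
}

/* Proposed rule from the RFC: try-lock any folio the caller has not
 * already locked, whatever its type. */
static bool should_trylock_proposed(struct folio_flags_model f, bool is_locked)
{
	(void)f;  /* folio type no longer matters */
	return !is_locked;
}

/* Every folio try-locked today is still try-locked under the proposal;
 * the only newly locked case is an unlocked, non-KSM anon folio. */
static void check_rules(void)
{
	for (int anon = 0; anon <= 1; anon++) {
		for (int ksm = 0; ksm <= anon; ksm++) {
			for (int locked = 0; locked <= 1; locked++) {
				struct folio_flags_model f = { anon, ksm };
				bool cur = should_trylock_current(f, locked);
				bool prop = should_trylock_proposed(f, locked);

				assert(!cur || prop);
				if (cur != prop)
					assert(anon && !ksm && !locked);
			}
		}
	}
}
```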
>
> This seems to be legacy code from commit 5ad6468801d2 ("ksm: let shared pages be
> swappable"). According to the commit log, the lock is used to protect the KSM
> stable tree from concurrent modification.
>
It seems like the conditional locking of file pages/folios was added in
a 2004 commit, edcc56dc6a7c758c ("maplock: kill page_map_lock"). Later,
in the commit you mentioned, locking was also added for KSM, and now
only non-KSM anon folios are left :-)

> >>>
> >>> It’s unclear why locking anon_vma (when updating folio->mapping) is
> >>> preferable to locking the folio here. This is the reclaim path, so it
> >>> should not be a critical path that needs special treatment, unless
> >>> I’m missing something.
>
> Based on git history, the decision predates the first git commit 1da177e4c3f4.
> Maybe it is time to revisit and improve it.
>
>
> >>>
> >>> Therefore, I propose simplifying the locking mechanism by
> >>> unconditionally try-locking the folio in such cases. This avoids the
> >>> need to lock the anon_vma when updating folio->mapping, which, for
> >>> instance, will help eliminate the uninterruptible sleep observed in
> >>> the field traces mentioned earlier. Furthermore, it enables us to
> >>> simplify the code in folio_lock_anon_vma_read() by removing the
> >>> re-check that ensures the field hasn’t changed under us.
> >
> > Thanks, I’m personally quite interested in this topic and will take a
> > closer look as well. Beyond the userfaultfd MOVE case, we’ve observed
> > severe anon_vma lock contention between fork, unmap (process exit), and
> > memory reclaim. This has caused noticeable UI stutters, especially
> > when many VMAs share the same anon_vma root.
> >
> > Thanks
> > Barry
>
>
> --
> Best Regards,
> Yan, Zi