linux-kernel - Re: [RFC PATCH 0/2] mm: continue using per-VMA lock when retrying page faults after I/O

Message-ID: <CAGsJ_4w8550U+1dah2VoZNuvLT7D15ktC6704AEmz6eui60YwA@mail.gmail.com>
Date: Thu, 27 Nov 2025 12:42:24 +0800
From: Barry Song <21cnbao@...il.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, 
	Barry Song <v-songbaohua@...o.com>, Russell King <linux@...linux.org.uk>, 
	Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>, 
	Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>, 
	Madhavan Srinivasan <maddy@...ux.ibm.com>, Michael Ellerman <mpe@...erman.id.au>, 
	Nicholas Piggin <npiggin@...il.com>, Christophe Leroy <christophe.leroy@...roup.eu>, 
	Paul Walmsley <pjw@...nel.org>, Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>, 
	Alexandre Ghiti <alex@...ti.fr>, Alexander Gordeev <agordeev@...ux.ibm.com>, 
	Gerald Schaefer <gerald.schaefer@...ux.ibm.com>, Heiko Carstens <hca@...ux.ibm.com>, 
	Vasily Gorbik <gor@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>, 
	Sven Schnelle <svens@...ux.ibm.com>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	Andy Lutomirski <luto@...nel.org>, Peter Zijlstra <peterz@...radead.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, x86@...nel.org, 
	"H . Peter Anvin" <hpa@...or.com>, David Hildenbrand <david@...nel.org>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, "Liam R . Howlett" <Liam.Howlett@...cle.com>, 
	Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>, 
	Michal Hocko <mhocko@...e.com>, Pedro Falcato <pfalcato@...e.de>, Jarkko Sakkinen <jarkko@...nel.org>, 
	Oscar Salvador <osalvador@...e.de>, Kuninori Morimoto <kuninori.morimoto.gx@...esas.com>, 
	Oven Liyang <liyangouwen1@...o.com>, Mark Rutland <mark.rutland@....com>, 
	Ada Couprie Diaz <ada.coupriediaz@....com>, Robin Murphy <robin.murphy@....com>, 
	Kristina Martšenko <kristina.martsenko@....com>, 
	Kevin Brodsky <kevin.brodsky@....com>, Yeoreum Yun <yeoreum.yun@....com>, 
	Wentao Guan <guanwentao@...ontech.com>, Thorsten Blum <thorsten.blum@...ux.dev>, 
	Steven Rostedt <rostedt@...dmis.org>, Yunhui Cui <cuiyunhui@...edance.com>, 
	Nam Cao <namcao@...utronix.de>, Chris Li <chrisl@...nel.org>, 
	Kairui Song <kasong@...cent.com>, Kemeng Shi <shikemeng@...weicloud.com>, 
	Nhat Pham <nphamcs@...il.com>, Baoquan He <bhe@...hat.com>, linux-arm-kernel@...ts.infradead.org, 
	linux-kernel@...r.kernel.org, loongarch@...ts.linux.dev, 
	linuxppc-dev@...ts.ozlabs.org, linux-riscv@...ts.infradead.org, 
	linux-s390@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] mm: continue using per-VMA lock when retrying
 page faults after I/O

On Thu, Nov 27, 2025 at 12:22 PM Barry Song <21cnbao@...il.com> wrote:
>
> On Thu, Nov 27, 2025 at 12:09 PM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > On Thu, Nov 27, 2025 at 09:14:36AM +0800, Barry Song wrote:
> > > There is no need to always fall back to mmap_lock if the per-VMA
> > > lock was released only to wait for pagecache or swapcache to
> > > become ready.
> >
> > Something I've been wondering about is removing all the "drop the MM
> > locks while we wait for I/O" gunk.  It's a nice amount of code removed:
>
> I think the point is that page fault handlers should avoid holding the VMA
> lock or mmap_lock for too long while waiting for I/O. Otherwise, those
> writers and readers will be stuck for a while.
>
> >
> >  include/linux/pagemap.h |  8 +---
> >  mm/filemap.c            | 98 ++++++++++++-------------------------------------
> >  mm/internal.h           | 21 -----------
> >  mm/memory.c             | 13 +------
> >  mm/shmem.c              |  6 ---
> >  5 files changed, 27 insertions(+), 119 deletions(-)
> >
> > and I'm not sure we still need to do it with per-VMA locks.  What I
> > have here doesn't boot and I ran out of time to debug it.
>
> I agree there’s room for improvement, but merely removing the "drop the MM
> locks while waiting for I/O" code is unlikely to improve performance.
>

One idea I have is that we could conditionally skip the "drop lock and
retry page fault" step when we are reasonably sure the I/O has already
completed:

diff --git a/mm/filemap.c b/mm/filemap.c
index 57dfd2211109..151f6d38c284 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3517,7 +3517,9 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
                }
        }

-       if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
+       if (folio_test_uptodate(folio))
+               folio_lock(folio);
+       else if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
                goto out_retry;

        /* Did it get truncated? */
diff --git a/mm/memory.c b/mm/memory.c
index 7f70f0324dcf..355ed02560fd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4758,7 +4758,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
        }

        swapcache = folio;
-       ret |= folio_lock_or_retry(folio, vmf);
+       if (folio_test_uptodate(folio))
+               folio_lock(folio);
+       else
+               ret |= folio_lock_or_retry(folio, vmf);
        if (ret & VM_FAULT_RETRY) {
                if (fault_flag_allow_retry_first(vmf->flags) &&
                    !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT) &&

If the folio is already uptodate, the I/O has finished, so whoever holds
the folio lock is most likely another task that is still completing the
mapping, and the wait should be short. I may develop the above idea as an
incremental patch after this patchset.
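
For illustration only, here is a rough user-space sketch of the locking
policy in the diff above. It is not kernel code: toy_folio, toy_fault and
toy_lock_folio() are hypothetical names, a pthread mutex stands in for the
folio lock, and an atomic flag stands in for the uptodate bit. Unlike
lock_folio_maybe_drop_mmap(), the slow path here only reports a retry and
never waits for the folio.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio; not the kernel structure. */
struct toy_folio {
	pthread_mutex_t lock;	/* models the folio lock */
	atomic_bool uptodate;	/* models folio_test_uptodate() */
};

/* Hypothetical stand-in for the fault context. */
struct toy_fault {
	bool vma_lock_held;	/* models holding the per-VMA lock */
};

enum toy_result { TOY_LOCKED, TOY_RETRY };

static void toy_folio_init(struct toy_folio *f, bool uptodate)
{
	assert(f != NULL);
	pthread_mutex_init(&f->lock, NULL);
	atomic_init(&f->uptodate, uptodate);
}

/*
 * If the data is already uptodate, block on the folio lock while keeping
 * the outer lock, on the assumption that the holder will release it soon.
 * Otherwise try the lock once; on failure, drop the outer lock and tell
 * the caller to retry the fault.
 */
static enum toy_result toy_lock_folio(struct toy_fault *vmf,
				      struct toy_folio *f)
{
	assert(vmf != NULL && f != NULL);

	if (atomic_load(&f->uptodate)) {
		pthread_mutex_lock(&f->lock);
		return TOY_LOCKED;
	}

	if (pthread_mutex_trylock(&f->lock) == 0)
		return TOY_LOCKED;

	/* I/O is presumably still in flight: do not hold the outer lock. */
	vmf->vma_lock_held = false;
	return TOY_RETRY;
}
```

A caller that gets TOY_RETRY has to reacquire the outer lock before
retrying, just as a fault handler that sees VM_FAULT_RETRY does.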

Thanks
Barry