Message-ID: <CAGsJ_4w8550U+1dah2VoZNuvLT7D15ktC6704AEmz6eui60YwA@mail.gmail.com>
Date: Thu, 27 Nov 2025 12:42:24 +0800
From: Barry Song <21cnbao@...il.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
Barry Song <v-songbaohua@...o.com>, Russell King <linux@...linux.org.uk>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
Madhavan Srinivasan <maddy@...ux.ibm.com>, Michael Ellerman <mpe@...erman.id.au>,
Nicholas Piggin <npiggin@...il.com>, Christophe Leroy <christophe.leroy@...roup.eu>,
Paul Walmsley <pjw@...nel.org>, Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Alexandre Ghiti <alex@...ti.fr>, Alexander Gordeev <agordeev@...ux.ibm.com>,
Gerald Schaefer <gerald.schaefer@...ux.ibm.com>, Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, x86@...nel.org,
"H . Peter Anvin" <hpa@...or.com>, David Hildenbrand <david@...nel.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, "Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...e.com>, Pedro Falcato <pfalcato@...e.de>, Jarkko Sakkinen <jarkko@...nel.org>,
Oscar Salvador <osalvador@...e.de>, Kuninori Morimoto <kuninori.morimoto.gx@...esas.com>,
Oven Liyang <liyangouwen1@...o.com>, Mark Rutland <mark.rutland@....com>,
Ada Couprie Diaz <ada.coupriediaz@....com>, Robin Murphy <robin.murphy@....com>,
Kristina Martšenko <kristina.martsenko@....com>,
Kevin Brodsky <kevin.brodsky@....com>, Yeoreum Yun <yeoreum.yun@....com>,
Wentao Guan <guanwentao@...ontech.com>, Thorsten Blum <thorsten.blum@...ux.dev>,
Steven Rostedt <rostedt@...dmis.org>, Yunhui Cui <cuiyunhui@...edance.com>,
Nam Cao <namcao@...utronix.de>, Chris Li <chrisl@...nel.org>,
Kairui Song <kasong@...cent.com>, Kemeng Shi <shikemeng@...weicloud.com>,
Nhat Pham <nphamcs@...il.com>, Baoquan He <bhe@...hat.com>, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, loongarch@...ts.linux.dev,
linuxppc-dev@...ts.ozlabs.org, linux-riscv@...ts.infradead.org,
linux-s390@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] mm: continue using per-VMA lock when retrying
page faults after I/O
On Thu, Nov 27, 2025 at 12:22 PM Barry Song <21cnbao@...il.com> wrote:
>
> On Thu, Nov 27, 2025 at 12:09 PM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > On Thu, Nov 27, 2025 at 09:14:36AM +0800, Barry Song wrote:
> > > There is no need to always fall back to mmap_lock if the per-VMA
> > > lock was released only to wait for pagecache or swapcache to
> > > become ready.
> >
> > Something I've been wondering about is removing all the "drop the MM
> > locks while we wait for I/O" gunk. It's a nice amount of code removed:
>
> I think the point is that page fault handlers should avoid holding the VMA
> lock or mmap_lock for too long while waiting for I/O. Otherwise, other
> writers and readers of those locks will be stuck for a while.
>
> >
> > include/linux/pagemap.h | 8 +---
> > mm/filemap.c | 98 ++++++++++++-------------------------------------
> > mm/internal.h | 21 -----------
> > mm/memory.c | 13 +------
> > mm/shmem.c | 6 ---
> > 5 files changed, 27 insertions(+), 119 deletions(-)
> >
> > and I'm not sure we still need to do it with per-VMA locks. What I
> > have here doesn't boot and I ran out of time to debug it.
>
> I agree there’s room for improvement, but merely removing the "drop the MM
> locks while waiting for I/O" code is unlikely to improve performance.
>
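The "drop the MM locks while we wait for I/O" behaviour discussed above can be sketched as a small single-threaded userspace model. This is a hypothetical illustration, not kernel code: `struct model_folio`, `struct model_fault`, `model_lock_or_retry()` and the tick counter are made-up names loosely modelled on `folio_lock_or_retry()` and `FAULT_FLAG_ALLOW_RETRY`. "Ticks" simply stand in for time spent sleeping on a locked folio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model, NOT kernel code. */
enum model_fault_ret { MODEL_FAULT_DONE, MODEL_FAULT_RETRY };

struct model_folio {
	bool locked;    /* folio lock held by another task */
	bool uptodate;  /* read I/O has completed */
	int hold_ticks; /* ticks until the other task unlocks the folio */
};

struct model_fault {
	bool mm_locked;         /* per-VMA lock or mmap_lock held */
	bool allow_retry;       /* stands in for FAULT_FLAG_ALLOW_RETRY */
	int ticks_with_mm_lock; /* sleep time spent while mm_locked */
};

/* Sleep until the other task releases the folio lock. */
static void model_wait_unlocked(struct model_folio *f, struct model_fault *vmf)
{
	while (f->locked) {
		if (vmf->mm_locked)
			vmf->ticks_with_mm_lock++; /* lock users are blocked */
		if (--f->hold_ticks <= 0) {
			f->locked = false;  /* holder finishes and unlocks */
			f->uptodate = true; /* assume it was reading the folio in */
		}
	}
}

/* Loosely modelled on folio_lock_or_retry(). */
static enum model_fault_ret model_lock_or_retry(struct model_folio *f,
						struct model_fault *vmf)
{
	assert(vmf->mm_locked); /* faults enter with the MM/VMA lock held */
	if (!f->locked) {       /* uncontended: just take the folio lock */
		f->locked = true;
		return MODEL_FAULT_DONE;
	}
	if (!vmf->allow_retry) { /* must sleep with the MM/VMA lock held */
		model_wait_unlocked(f, vmf);
		f->locked = true;
		return MODEL_FAULT_DONE;
	}
	vmf->mm_locked = false;      /* drop per-VMA lock / mmap_lock */
	model_wait_unlocked(f, vmf); /* sleep without blocking lock users */
	return MODEL_FAULT_RETRY;    /* caller retakes the lock and retries */
}
```

When retry is allowed, the simulated wait happens with no MM/VMA lock held, at the cost of a second pass through the fault handler. When it is not, the whole wait is charged to the lock, which is what blocks other readers and writers.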
One idea I have is to conditionally skip the "drop lock and retry page
fault" step when we are reasonably sure the I/O has already completed,
i.e. when the folio is already uptodate:

diff --git a/mm/filemap.c b/mm/filemap.c
index 57dfd2211109..151f6d38c284 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3517,7 +3517,9 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		}
 	}
 
-	if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
+	if (folio_test_uptodate(folio))
+		folio_lock(folio);
+	else if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
 		goto out_retry;
 
 	/* Did it get truncated? */
diff --git a/mm/memory.c b/mm/memory.c
index 7f70f0324dcf..355ed02560fd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4758,7 +4758,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	swapcache = folio;
-	ret |= folio_lock_or_retry(folio, vmf);
+	if (folio_test_uptodate(folio))
+		folio_lock(folio);
+	else
+		ret |= folio_lock_or_retry(folio, vmf);
 	if (ret & VM_FAULT_RETRY) {
 		if (fault_flag_allow_retry_first(vmf->flags) &&
 		    !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT) &&

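In that case, we are most likely just waiting for another process to finish
mapping the folio, so the folio lock should be released soon and there is
little benefit in dropping the lock and retrying. I may develop this idea
into an incremental patch on top of this patchset.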
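The proposed fast path can be modelled the same way. Again, this is a hypothetical sketch with made-up names, not the patch itself: it assumes that an uptodate folio which is still locked is held only briefly (for example, by another task finishing the mapping), and it assumes retry is always allowed when falling back to the drop-and-retry path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model, NOT kernel code. */
enum model_fault_ret { MODEL_FAULT_DONE, MODEL_FAULT_RETRY };

struct model_folio {
	bool locked;    /* folio lock held by another task */
	bool uptodate;  /* contents valid: no read I/O outstanding */
	int hold_ticks; /* ticks until the other task unlocks the folio */
};

struct model_fault {
	bool mm_locked;         /* per-VMA lock or mmap_lock held */
	int ticks_with_mm_lock; /* sleep time spent while mm_locked */
};

/* Sleep until the other task releases the folio lock. */
static void model_wait_unlocked(struct model_folio *f, struct model_fault *vmf)
{
	while (f->locked) {
		if (vmf->mm_locked)
			vmf->ticks_with_mm_lock++;
		if (--f->hold_ticks <= 0) {
			f->locked = false;  /* holder finishes and unlocks */
			f->uptodate = true; /* assume any read has completed */
		}
	}
}

/*
 * Mirrors the shape of the proposed change: an uptodate folio is locked
 * with a plain blocking wait while keeping the MM/VMA lock; otherwise
 * the fault drops the lock, waits for I/O and asks for a retry.
 */
static enum model_fault_ret model_lock_folio(struct model_folio *f,
					     struct model_fault *vmf)
{
	assert(vmf->mm_locked);
	if (f->uptodate || !f->locked) {
		model_wait_unlocked(f, vmf); /* returns at once if unlocked */
		f->locked = true;
		return MODEL_FAULT_DONE;
	}
	vmf->mm_locked = false;      /* drop the lock before waiting for I/O */
	model_wait_unlocked(f, vmf);
	return MODEL_FAULT_RETRY;
}
```

The model makes the trade-off visible: skipping the retry is only cheap if the current folio-lock holder releases it quickly, because in the uptodate case the whole wait is charged to the MM/VMA lock.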
Thanks
Barry