linux-kernel - Re: [RFC PATCH 1/2] mm/filemap: Retry fault by VMA lock if the lock was released for I/O

Message-ID: <kzwpwntmta4bpt27stif2hs5cwlu54kayfi34o562jpr2mjbew@h3ddkbxibxhf>
Date: Thu, 27 Nov 2025 16:26:30 +0000
From: Pedro Falcato <pfalcato@...e.de>
To: Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, 
	Oven Liyang <liyangouwen1@...o.com>, Russell King <linux@...linux.org.uk>, 
	Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>, 
	Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>, 
	Madhavan Srinivasan <maddy@...ux.ibm.com>, Michael Ellerman <mpe@...erman.id.au>, 
	Nicholas Piggin <npiggin@...il.com>, Christophe Leroy <christophe.leroy@...roup.eu>, 
	Paul Walmsley <pjw@...nel.org>, Palmer Dabbelt <palmer@...belt.com>, 
	Albert Ou <aou@...s.berkeley.edu>, Alexandre Ghiti <alex@...ti.fr>, 
	Alexander Gordeev <agordeev@...ux.ibm.com>, Gerald Schaefer <gerald.schaefer@...ux.ibm.com>, 
	Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>, 
	Christian Borntraeger <borntraeger@...ux.ibm.com>, Sven Schnelle <svens@...ux.ibm.com>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, Andy Lutomirski <luto@...nel.org>, 
	Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, x86@...nel.org, 
	"H . Peter Anvin" <hpa@...or.com>, David Hildenbrand <david@...nel.org>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, "Liam R . Howlett" <Liam.Howlett@...cle.com>, 
	Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>, 
	Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>, 
	Matthew Wilcox <willy@...radead.org>, Jarkko Sakkinen <jarkko@...nel.org>, 
	Oscar Salvador <osalvador@...e.de>, Kuninori Morimoto <kuninori.morimoto.gx@...esas.com>, 
	Mark Rutland <mark.rutland@....com>, Ada Couprie Diaz <ada.coupriediaz@....com>, 
	Robin Murphy <robin.murphy@....com>, Kristina Martšenko <kristina.martsenko@....com>, 
	Kevin Brodsky <kevin.brodsky@....com>, Yeoreum Yun <yeoreum.yun@....com>, 
	Wentao Guan <guanwentao@...ontech.com>, Thorsten Blum <thorsten.blum@...ux.dev>, 
	Steven Rostedt <rostedt@...dmis.org>, Yunhui Cui <cuiyunhui@...edance.com>, 
	Nam Cao <namcao@...utronix.de>, linux-arm-kernel@...ts.infradead.org, 
	linux-kernel@...r.kernel.org, loongarch@...ts.linux.dev, linuxppc-dev@...ts.ozlabs.org, 
	linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	Chris Li <chrisl@...nel.org>, Kairui Song <kasong@...cent.com>, 
	Kemeng Shi <shikemeng@...weicloud.com>, Nhat Pham <nphamcs@...il.com>, Baoquan He <bhe@...hat.com>, 
	Barry Song <v-songbaohua@...o.com>
Subject: Re: [RFC PATCH 1/2] mm/filemap: Retry fault by VMA lock if the lock
 was released for I/O

On Thu, Nov 27, 2025 at 07:39:11PM +0800, Barry Song wrote:
> On Thu, Nov 27, 2025 at 6:52 PM Pedro Falcato <pfalcato@...e.de> wrote:
> >
> > On Thu, Nov 27, 2025 at 09:14:37AM +0800, Barry Song wrote:
> > > From: Oven Liyang <liyangouwen1@...o.com>
> > >
> > > If the current page fault is using the per-VMA lock, and we only released
> > > the lock to wait for I/O completion (e.g., using folio_lock()), then when
> > > the fault is retried after the I/O completes, it should still qualify for
> > > the per-VMA-lock path.
> > >
> > <snip>
> > > Signed-off-by: Oven Liyang <liyangouwen1@...o.com>
> > > Signed-off-by: Barry Song <v-songbaohua@...o.com>
> > > ---
> > >  arch/arm/mm/fault.c       | 5 +++++
> > >  arch/arm64/mm/fault.c     | 5 +++++
> > >  arch/loongarch/mm/fault.c | 4 ++++
> > >  arch/powerpc/mm/fault.c   | 5 ++++-
> > >  arch/riscv/mm/fault.c     | 4 ++++
> > >  arch/s390/mm/fault.c      | 4 ++++
> > >  arch/x86/mm/fault.c       | 4 ++++
> >
> > If only we could unify all these paths :(
> 
> Right, it’s a pain, but we do have bots for that?
> And it’s basically just copy-and-paste across different architectures.
> 
> >
> > >  include/linux/mm_types.h  | 9 +++++----
> > >  mm/filemap.c              | 5 ++++-
> > >  9 files changed, 39 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > index b71625378ce3..12b2d65ef1b9 100644
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -1670,10 +1670,11 @@ enum vm_fault_reason {
> > >       VM_FAULT_NOPAGE         = (__force vm_fault_t)0x000100,
> > >       VM_FAULT_LOCKED         = (__force vm_fault_t)0x000200,
> > >       VM_FAULT_RETRY          = (__force vm_fault_t)0x000400,
> > > -     VM_FAULT_FALLBACK       = (__force vm_fault_t)0x000800,
> > > -     VM_FAULT_DONE_COW       = (__force vm_fault_t)0x001000,
> > > -     VM_FAULT_NEEDDSYNC      = (__force vm_fault_t)0x002000,
> > > -     VM_FAULT_COMPLETED      = (__force vm_fault_t)0x004000,
> > > +     VM_FAULT_RETRY_VMA      = (__force vm_fault_t)0x000800,
> >
> > So, what I am wondering here is why we need one more fault flag versus
> > just blindly doing this on a plain-old RETRY. Is there any particular
> > reason why? I can't think of one.
> 
> Because in some cases we retry simply due to needing to take mmap_lock.
> For example:
> 
> /**
>  * __vmf_anon_prepare - Prepare to handle an anonymous fault.
>  * @vmf: The vm_fault descriptor passed from the fault handler.
>  *
>  * When preparing to insert an anonymous page into a VMA from a
>  * fault handler, call this function rather than anon_vma_prepare().
>  * If this vma does not already have an associated anon_vma and we are
>  * only protected by the per-VMA lock, the caller must retry with the
>  * mmap_lock held.  __anon_vma_prepare() will look at adjacent VMAs to
>  * determine if this VMA can share its anon_vma, and that's not safe to
>  * do with only the per-VMA lock held for this VMA.
>  *
>  * Return: 0 if fault handling can proceed.  Any other value should be
>  * returned to the caller.
>  */
> vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf)
> {
> ...
> }
> 
> Thus, we have to check each branch one by one, but I/O wait is the most
> frequent path, so we handle it first.
>

Hmm, right, good point. I think this is the safest option then.
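
To make the distinction concrete, here is a minimal userspace sketch of
the two retry reasons discussed above. It is not kernel code and not
taken from the patch: every sim_* name is hypothetical, the bit values
are only illustrative, and it assumes the handler reports the I/O-wait
case with a distinct retry bit while every other retry means "take
mmap_lock".

```c
#include <assert.h>

/* Hypothetical stand-ins for vm_fault_t bits; values are illustrative. */
enum sim_fault {
	SIM_FAULT_RETRY     = 0x400, /* retry, mmap_lock required */
	SIM_FAULT_RETRY_VMA = 0x800, /* retry, per-VMA lock still fine */
};

/* Why a handler running under the per-VMA lock gave up. */
enum sim_reason {
	SIM_DONE,           /* fault fully handled */
	SIM_WAIT_IO,        /* lock dropped only to wait for folio I/O */
	SIM_NEED_MMAP_LOCK, /* e.g. anon_vma setup must inspect adjacent VMAs */
};

enum sim_lock { SIM_LOCK_VMA, SIM_LOCK_MMAP };

/* Model of the fault handler: map the reason to a return code. */
static unsigned int sim_handle_fault(enum sim_reason reason)
{
	switch (reason) {
	case SIM_WAIT_IO:
		return SIM_FAULT_RETRY_VMA;
	case SIM_NEED_MMAP_LOCK:
		return SIM_FAULT_RETRY;
	default:
		return 0;
	}
}

/* Model of the arch fault path: pick the lock for the retry. */
static enum sim_lock sim_retry_lock(unsigned int ret)
{
	/* Only meaningful when the handler asked for a retry. */
	assert(ret & (SIM_FAULT_RETRY | SIM_FAULT_RETRY_VMA));

	if (ret & SIM_FAULT_RETRY_VMA)
		return SIM_LOCK_VMA;
	return SIM_LOCK_MMAP;
}
```

With a single retry code, sim_retry_lock() could not tell the two cases
apart and would have to pick mmap_lock for both, which is the cost the
extra flag avoids.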

FWIW:
Acked-by: Pedro Falcato <pfalcato@...e.de>

-- 
Pedro