Message-ID: <9b463af0-3f29-4816-bd5d-caa282b1a9cd@lucifer.local>
Date: Mon, 8 Sep 2025 15:48:36 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jan Kara <jack@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Jonathan Corbet <corbet@....net>, Matthew Wilcox <willy@...radead.org>,
Guo Ren <guoren@...nel.org>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>,
"David S . Miller" <davem@...emloft.net>,
Andreas Larsson <andreas@...sler.com>, Arnd Bergmann <arnd@...db.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Dan Williams <dan.j.williams@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
Dave Jiang <dave.jiang@...el.com>, Nicolas Pitre <nico@...xnic.net>,
Muchun Song <muchun.song@...ux.dev>,
Oscar Salvador <osalvador@...e.de>,
David Hildenbrand <david@...hat.com>,
Konstantin Komarov <almaz.alexandrovich@...agon-software.com>,
Baoquan He <bhe@...hat.com>, Vivek Goyal <vgoyal@...hat.com>,
Dave Young <dyoung@...hat.com>, Tony Luck <tony.luck@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Dave Martin <Dave.Martin@....com>, James Morse <james.morse@....com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Hugh Dickins <hughd@...gle.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
Uladzislau Rezki <urezki@...il.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Andrey Konovalov <andreyknvl@...il.com>, Jann Horn <jannh@...gle.com>,
Pedro Falcato <pfalcato@...e.de>, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-csky@...r.kernel.org, linux-mips@...r.kernel.org,
linux-s390@...r.kernel.org, sparclinux@...r.kernel.org,
nvdimm@...ts.linux.dev, linux-cxl@...r.kernel.org, linux-mm@...ck.org,
ntfs3@...ts.linux.dev, kexec@...ts.infradead.org,
kasan-dev@...glegroups.com, Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [PATCH 00/16] expand mmap_prepare functionality, port more users
On Mon, Sep 08, 2025 at 03:27:52PM +0200, Jan Kara wrote:
> Hi Lorenzo!

Hey! :)

> > After updating some areas that can simply use mmap_prepare as-is, and
> > performing some housekeeping, we then introduce two new hooks:
> >
> > f_op->mmap_complete - this is invoked at the point of the VMA having been
> > correctly inserted, though with the VMA write lock still held. mmap_prepare
> > must also be specified.
> >
> > This expands the use of mmap_prepare to those callers which need to
> > prepopulate mappings, as well as any which do genuinely require access to
> > the VMA.
> >
> > It's simple - we will let the caller access the VMA, but only once it's
> > established. At this point unwinding issues is simple - we just unmap the
> > VMA.
> >
> > The VMA is also then correctly initialised at this stage so there can be no
> > issues arising from a not-fully initialised VMA at this point.
> >
> > The other newly added hook is:
> >
> > f_op->mmap_abort - this is only valid in conjunction with mmap_prepare and
> > mmap_complete. This is called should an error arise between mmap_prepare
> > and mmap_complete (not as a result of mmap_prepare but rather some other
> > part of the mapping logic).
> >
> > This is required in case mmap_prepare wishes to establish state or locks
> > which need to be cleaned up on completion. If we did not provide this, then
> > this could not be permitted as this cleanup would otherwise not occur
> > should the mapping fail between the two calls.
>
> So seeing these new hooks makes me wonder: shouldn't we rather implement
> mmap(2) in a way more similar to how other f_op hooks like ->read or
> ->write behave? I.e., a hook called at a rather high level - something
> like from vm_mmap_pgoff() or a similar level - which would just call
> library functions from MM for the stuff it needs to do. Filesystems would
> just do their checks and call the generic mmap function with the vm_ops
> they want to use; more complex users could then fill in the VMA before
> releasing mmap_lock or do cleanup in case of failure... This would seem
> like a more understandable API than several hooks with rules about what
> gets called when.
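The ordering described in the quoted cover letter can be modelled in userspace C roughly as follows. This is a sketch under stated assumptions, not kernel code: `struct desc`, `struct vma`, `struct fops`, `model_mmap()`, the `insert_fails` switch and the `demo_*` hooks are all hypothetical, and the hook signatures are simplified rather than copied from the series. It tries to encode the rules as stated: `mmap_prepare` only ever sees a descriptor, `mmap_complete` only ever sees an inserted VMA, `mmap_abort` runs when the core fails between the two hooks, and a failing `mmap_complete` is unwound by unmapping the VMA.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical descriptor: everything prepare may decide before insertion. */
struct desc {
	unsigned long start, end, flags;
	void *private_data;
};

/* Hypothetical established VMA, only handed to drivers once inserted. */
struct vma {
	unsigned long start, end, flags;
	void *private_data;
	bool inserted;
};

/* Simplified hook table; the real f_op signatures in the series differ. */
struct fops {
	int (*mmap_prepare)(struct desc *d);
	int (*mmap_complete)(struct vma *v);
	void (*mmap_abort)(void *private_data);
};

/*
 * Model of the core mmap path. insert_fails simulates an error after
 * prepare has succeeded but before the VMA is established.
 */
static int model_mmap(const struct fops *f, unsigned long start,
		      unsigned long end, bool insert_fails, struct vma **out)
{
	struct desc d = { .start = start, .end = end };
	struct vma *v;
	int err;

	*out = NULL;
	/* Phase 1: the driver sees only the descriptor; no abort on failure. */
	err = f->mmap_prepare(&d);
	if (err)
		return err;

	/* Core work between the hooks; a failure here must call mmap_abort. */
	v = insert_fails ? NULL : malloc(sizeof(*v));
	if (!v) {
		if (f->mmap_abort)
			f->mmap_abort(d.private_data);
		return -ENOMEM;
	}
	*v = (struct vma){ .start = d.start, .end = d.end, .flags = d.flags,
			   .private_data = d.private_data, .inserted = true };

	/* Phase 2: the VMA is fully initialised and inserted. */
	if (f->mmap_complete) {
		assert(v->inserted);
		err = f->mmap_complete(v);
		if (err) {
			/* Unwinding is simple: unmap (here, free) the VMA. */
			free(v);
			return err;
		}
	}
	*out = v;
	return 0;
}

/* Example driver: prepare allocates state that complete or abort must free. */
static int live_state;	/* outstanding allocations made by prepare */

static int demo_prepare(struct desc *d)
{
	d->flags = 0x1;	/* hypothetical flag value */
	d->private_data = malloc(1);
	if (!d->private_data)
		return -ENOMEM;
	live_state++;
	return 0;
}

static int demo_complete(struct vma *v)
{
	free(v->private_data);	/* state consumed once the VMA exists */
	v->private_data = NULL;
	live_state--;
	return 0;
}

static void demo_abort(void *private_data)
{
	free(private_data);	/* without this hook the state would leak */
	live_state--;
}

static const struct fops demo_fops = {
	.mmap_prepare = demo_prepare,
	.mmap_complete = demo_complete,
	.mmap_abort = demo_abort,
};
```

The `demo_*` hooks mirror the case the cover letter uses to justify `mmap_abort`: state allocated in prepare is released either by complete or, if the core fails in between, by abort, so `live_state` returns to zero on both paths.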

We can't just do everything at this level, because we need:

a. Enough information to actually know how to map the VMA before putting
   it in the maple tree.

b. Once it's there, a way to do anything else we need (typically,
   prepopulation).

The crux of this change is to avoid the horrors of a VMA that is not yet
properly initialised being passed around while still being accessible to
drivers to do 'whatever' with.

Ideally we'd have only one case, and for _nearly all_ filesystems that is in
fact how it is.

But sadly some _do need_ to do extra work afterwards, most notably
prepopulation.

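The split described above can be pictured with a small userspace model. Again this is a sketch with hypothetical names (`struct desc`, `struct vma`, `model_mmap()`, `count_present()` and the `MODEL_*` constants) and simplified signatures, not the kernel's actual types. The typical filesystem supplies only a prepare hook that describes the mapping up front; a driver that prepopulates also supplies a complete hook, which runs only once the range is established, so it never walks a half-initialised VMA.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* hypothetical fixed page size */
#define MODEL_MAX_PAGES 16
#define MODEL_FLAG_SHARED 0x1UL	/* hypothetical mapping flag */

/* Hypothetical descriptor: decided before the VMA enters the tree. */
struct desc {
	unsigned long start, end, flags;
};

/* Hypothetical established mapping with a per-page "present" map. */
struct vma {
	unsigned long start, end, flags;
	bool present[MODEL_MAX_PAGES];
};

struct fops {
	int (*mmap_prepare)(struct desc *d);	/* always provided */
	int (*mmap_complete)(struct vma *v);	/* optional, post-insertion */
};

/* Typical filesystem: everything it needs is expressed in the descriptor. */
static int simple_prepare(struct desc *d)
{
	d->flags |= MODEL_FLAG_SHARED;
	return 0;
}

/* Prepopulating driver: walks pages only once the range is established. */
static int populating_complete(struct vma *v)
{
	for (unsigned long addr = v->start; addr < v->end;
	     addr += MODEL_PAGE_SIZE) {
		size_t idx = (addr - v->start) / MODEL_PAGE_SIZE;

		assert(idx < MODEL_MAX_PAGES);
		v->present[idx] = true;
	}
	return 0;
}

static const struct fops simple_fops = {
	.mmap_prepare = simple_prepare,
};

/* Same prepare hook, plus the post-insertion step. */
static const struct fops populating_fops = {
	.mmap_prepare = simple_prepare,
	.mmap_complete = populating_complete,
};

/* Core: prepare on the descriptor, establish the VMA, then optional complete. */
static int model_mmap(const struct fops *f, struct vma *v,
		      unsigned long start, unsigned long end)
{
	struct desc d = { .start = start, .end = end };
	int err;

	if (end <= start || (start | end) % MODEL_PAGE_SIZE ||
	    (end - start) / MODEL_PAGE_SIZE > MODEL_MAX_PAGES)
		return -EINVAL;
	err = f->mmap_prepare(&d);
	if (err)
		return err;
	/* Stand-in for "fully initialised and inserted": no pages present yet. */
	*v = (struct vma){ .start = d.start, .end = d.end, .flags = d.flags };
	return f->mmap_complete ? f->mmap_complete(v) : 0;
}

/* Counts pages marked present in the model VMA. */
static size_t count_present(const struct vma *v)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_MAX_PAGES; i++)
		n += v->present[i];
	return n;
}
```

In the kernel, "inserted" would mean the VMA is in the maple tree with its write lock held; the model only copies the descriptor's fields into a struct, which is enough to show why prepopulation has to come after that point.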
Cheers, Lorenzo