lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aMx0oGwRpSTcfdnf@x1.local>
Date: Thu, 18 Sep 2025 17:07:44 -0400
From: Peter Xu <peterx@...hat.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
	David Hildenbrand <david@...hat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Nikita Kalyazin <kalyazin@...zon.com>,
	Mike Rapoport <rppt@...nel.org>,
	Suren Baghdasaryan <surenb@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
	Muchun Song <muchun.song@...ux.dev>,
	Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	James Houghton <jthoughton@...gle.com>,
	Michal Hocko <mhocko@...e.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Oscar Salvador <osalvador@...e.de>,
	Axel Rasmussen <axelrasmussen@...gle.com>,
	Ujwal Kundur <ujwal.kundur@...il.com>
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API

On Thu, Sep 18, 2025 at 03:43:34PM -0400, Liam R. Howlett wrote:
> * Peter Xu <peterx@...hat.com> [250918 14:21]:
> > On Thu, Sep 18, 2025 at 07:53:46PM +0200, David Hildenbrand wrote:
> > > Re Nikita: If we could just reuse fault() for userfaultfd purposes, that
> > > might actually be pretty nice.
> > 
> > I commented on that.
> > 
> > https://lore.kernel.org/all/aEiwHjl4tsUt98sh@x1.local/

[1]

> > 
> > That'll need to leak FAULT_FLAG_USERFAULT_CONTINUE which isn't necessary,
> > make it extremely hard to know when to set the flag, and comlicates the
> > fault path which isn't necessary.
> > 
> > I think Mike's comment was spot on, that the new API is literally
> > do_fault() for shmem, but only used in userfaultfd context so it's even an
> > oneliner.
> > 
> > I do not maintain mm, so above is only my two cents, so I don't make
> > decisions.  Personally I still prefer the current approach of keep the mm
> > main fault path clean.
> 
> What we are trying to say is you can have a fault path that takes a type
> enum that calls into a function that does whatever you want.  It can
> even live in mm/userfaultfd.c.  It can then jump off to mm/guest-memfd.c
> which can contain super unique copying of memory if that's needed.

Per mentioning of mm/guest-memfd.c, are you suggesting to take guest-memfd
library approach?

We have in total of at least three proposals:

(a) https://lore.kernel.org/all/20250404154352.23078-1-kalyazin@amazon.com/
(b) this one
(c) https://lore.kernel.org/all/20250915161815.40729-1-kalyazin@amazon.com/

I reviewd (a) and (c) and I provided my comments.  If you prefer the
library approach, feel free to reply directly to (c) thread against my
email.

I chose (b), from when it was posted.

> 
> That way driver/i_know_better_that_everyone.c or fs/stature.c don't
> decide they can register their uffd and do cool stuff that totally won't
> tank the system in random strange ways.

What is the difference if they are allowed to register ->fault() and tank
the system?

> 
> Seriously, how many fault handlers are you expecting to have here?

First of all, it's not about "how many".  We can assume one user as of now.
Talking about any future user doesn't really help.  The choice I made above
on (b) is the best solution I think, with any known possible users.  The
plan might change, when more use cases pops up.  However we can only try to
make a fair decision with the current status quo.

OTOH, the vm_uffd_ops also provides other fields (besides uffd_*() hooks).
I wouldn't be surprised if a driver wants to opt-in with some of the fields
with zero hooks attached at all, when an userfaultfd feature is
automatically friendly to all kinds of memory types.

Consider one VMA that sets UFFDIO_WRITEPROTECT but without most of the
rest.

As I discussed, IMHO (b) is the clean way to describe userfaultfd demand
for any memory type.

> 
> I'd be surprised if a lot of the code in these memory types isn't
> shared, but I guess if they are all over the kernel they'll just clone
> the code and bugs (like the arch code does, but with less of a reason).
> 
> > Besides, this series also cleans up other places all over the places, the
> > vm_uffd_ops is a most simplified version of description for a memory type.
> 
> 6 files changed, 207 insertions(+), 76 deletions(-)
> 
> Can you please point me to which patch has clean up?

Patch 4.  If you want me to explain every change I touched that is a
cleanup, I can go into details.  Maybe it's faster if you read them, it's
not a huge patch.

> 
> The mm has uffd code _everywhere_, including mm/userfaultfd.c that jumps
> to fs/userfaultfd.c and back.  Every entry has a LIST_HEAD(uf) [1] [2]
> [3] in it that is passed through the entire call stack in case it is
> needed [4] [5] [6] [7] [8].   It has if statements in core mm functions
> in the middle of loops [9].  It complicates error handling and has
> tricky locking [10] (full disclosure, I helped with the locking..
> totally worth the complication).
> 
> This is a seriously complicated feature.

I know you're the expert of VMA, you worked a lot of it.  I understand
those things can caused you frustrations when touching those codes.  Though
please let's do not bring those into reviewing this series.  Those have
nothing to do with this series.

Frankly, I also wished if one day we can get rid of some of them.  It
really depends on how many users are there for the uffd events besides the
generic ones (fault traps).

> 
> How is a generic callback that splits out into, probably 4?, functions
> the deal breaker here?
> 
> How is leaking a flag the line that we won't cross?
> 
> > So IMHO it's beneficial in other aspects as well.  If uffd_copy() is a
> > concern, fine, we drop it.  We don't plan to have more use of UFFDIO_COPY
> > outside of the known three memory types after all.
> 
> EXACTLY!  There are three memory types and we're going to the most
> flexible interface possible, with the most danger.  With guest_memfd
> we're up to four functions we'd need.  Why not keep the mm code in the
> mm and have four functions to choose from?  If you want 5 we can always
> add another.

I assume for most of the rest comments, you're suggesting the library
approach.  If so, again, I suggest we discuss explicitly in that thread.

The library code may (or may not) be useful for other purposes.  For the
support of userfaultfd, it definitely doesn't need to be a dependency.
OTOH, we may still want some abstraction like this one with/without the
library.  If so, I don't really see why we need to pushback on this.

AFAIU, the only concern (after drop uffd_copy) is we'll expose
uffd_get_folio().  Please help explain why ->fault() isn't worse.

If we accepted ->fault() for all these years, I don't see a reason we
should reject ->uffd_get_folio(), especially one of the goals is to keep
the core mm path clean, per my comment to proposal (a).

-- 
Peter Xu


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ