linux-kernel - Re: [PATCH v2 1/4] mm: Introduce vm_uffd

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aGagpUkNogTxS7dk@x1.local>
Date: Thu, 3 Jul 2025 11:24:21 -0400
From: Peter Xu <peterx@...hat.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
	Nikita Kalyazin <kalyazin@...zon.com>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Suren Baghdasaryan <surenb@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
	Muchun Song <muchun.song@...ux.dev>,
	Mike Rapoport <rppt@...nel.org>, Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	James Houghton <jthoughton@...gle.com>,
	Michal Hocko <mhocko@...e.com>,
	David Hildenbrand <david@...hat.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Oscar Salvador <osalvador@...e.de>,
	Axel Rasmussen <axelrasmussen@...gle.com>,
	Ujwal Kundur <ujwal.kundur@...il.com>
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API

On Wed, Jul 02, 2025 at 10:00:51PM -0400, Liam R. Howlett wrote:
> * Peter Xu <peterx@...hat.com> [250702 17:36]:
> > On Wed, Jul 02, 2025 at 05:24:02PM -0400, Liam R. Howlett wrote:
> > > That's because the entry point is from a function pointer, so [3] won't
> > > help at all.
> > > 
> > > It is recreating the situation that existed for the vma through the
> > > vm_ops in mmap, but for uffd.  And at a lower level (page tables).  I do not
> > > want to relive that experience.
> > > 
> > > We are not doing this.  It is for the benefit of everyone that we are
> > > not doing this.
> > 
> > Is the vma issue about "allowing vma->vm_flags to be modified anywhere"
> > issue?  Or is there a pointer to the issue being discussed if not?
> 
> The issue is passing pointers of structs that are protected by locks or
> ref counters into modules to do what they please.
> 
> vma->vm_flags was an example of where we learned how wrong this can go.
> 
> There is also the concern of the state of the folio on return from the
> callback.  The error handling gets messy quick.
> 
> Now, imagine we have something that gets a folio, but then we find a
> solution for contention of a lock or ref count (whatever is next), but
> it doesn't work because the mm code has been bleeding into random
> modules and we have no clue what that module is supposed to be doing, or
> we can't make the necessary change because this module will break
> userspace, or cause a performance decrease, or any other random thing
> that we cannot work around without rewriting (probably suboptimally)
> something we don't maintain.
> 
> Again, these are examples of how this can go bad but not an exhaustive
> list by any means.
> 
> So the issue is with allowing modules to play with the folio and page
> tables on their own.

I understand the concern, however IMHO that's really why mm can be hard and
important at the same time..

We definitely have driver code manipulating pgtables.  We also have folios
or pages that can be directly accessible from drivers.

After all mm is the core function provider for those and there needs to be
some API accessing them from outside.

I agree some protection would be nice, like what Suren did with the
vm_flags using __private, even though it's unfortunate it only works with
sparse not a warn/error when compiling, as vm_flags is not a pointer.
OTOH, forbid exposing anything might be an overkill, IMHO.  It stops mm
from growing in healthy ways.

> 
> If this is outside the mm, we probably won't even be Cc'ed on modules
> that use it.
> 
> And do we want to be Cc'ed on modules that want to use it?

For this specific case, I'm happy to be copied if guest-memfd will start to
support userfaultfd, because obviously I also work with the kvm community.
It'll be the same if not, as I'm list as an userfaultfd reviewer.

But when it's in the modules, it should really be the modules job.  It's ok
too when it's an API then mm people do not get copied.  It looks fine to me.

> 
> We will most likely be Cc'ed or emailed directly on the resulting memory
> leak/security issue that results in what should be mm code.  It'll be a
> Saturday because it always is.. :)

True, it's just unavoidable IMHO, and after triaged then the module owner
needs to figure out how to fix it, not a mm developer, if the bug only
happens with the module.

It's the same when a module allocated a folio/page and randomly update its
flags.  It may also crash core mm later.  We can have more protections all
over the places but I don't see an easy way to completely separate core mm
from modules.

> 
> Even the example use code had a potential ref leak that you found [1].

That's totally ok.  I appreciate Nikita's help completely and never thought
it as an issue.  IMHO the leak is not a big deal in verifying the API.

> 
> > > 
> > > We need to find another way.
> > 
> > Could you suggest something?  The minimum goal is to allow guest-memfd
> > support minor faults.
> 
> Mike brought up another idea, that seems worth looking into.

I replied to Mike already before we extended this thread.  Feel free to
chime in with any suggestions on top.  So far this series is still almost
the best I can think of.

Thanks,

-- 
Peter Xu