Message-ID: <5231b96b-db1c-48b0-987b-736bc90843f9@lucifer.local>
Date: Thu, 3 Jul 2025 17:15:14 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Peter Xu <peterx@...hat.com>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Nikita Kalyazin <kalyazin@...zon.com>,
        Suren Baghdasaryan <surenb@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
        Muchun Song <muchun.song@...ux.dev>, Mike Rapoport <rppt@...nel.org>,
        Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        James Houghton <jthoughton@...gle.com>, Michal Hocko <mhocko@...e.com>,
        David Hildenbrand <david@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Oscar Salvador <osalvador@...e.de>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Ujwal Kundur <ujwal.kundur@...il.com>
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API

On Thu, Jul 03, 2025 at 11:24:21AM -0400, Peter Xu wrote:
> On Wed, Jul 02, 2025 at 10:00:51PM -0400, Liam R. Howlett wrote:
> > * Peter Xu <peterx@...hat.com> [250702 17:36]:
> > > On Wed, Jul 02, 2025 at 05:24:02PM -0400, Liam R. Howlett wrote:
> > > > That's because the entry point is from a function pointer, so [3] won't
> > > > help at all.
> > > >
> > > > It is recreating the situation that existed for the vma through the
> > > > vm_ops in mmap, but for uffd.  And at a lower level (page tables).  I do not
> > > > want to relive that experience.
> > > >
> > > > We are not doing this.  It is for the benefit of everyone that we are
> > > > not doing this.
> > >
> > > Is the vma issue the "allowing vma->vm_flags to be modified anywhere"
> > > one?  Or is there a pointer to the issue being discussed, if not?
> >
> > The issue is passing pointers to structs that are protected by locks or
> > ref counters into modules to do what they please.
> >
> > vma->vm_flags was an example of where we learned how wrong this can go.
> >
> > There is also the concern of the state of the folio on return from the
> > callback.  The error handling gets messy quick.
> >
> > Now, imagine we have something that gets a folio, but then we find a
> > solution for contention of a lock or ref count (whatever is next), but
> > it doesn't work because the mm code has been bleeding into random
> > modules and we have no clue what that module is supposed to be doing, or
> > we can't make the necessary change because this module will break
> > userspace, or cause a performance decrease, or any other random thing
> > that we cannot work around without rewriting (probably suboptimally)
> > something we don't maintain.
> >
> > Again, these are examples of how this can go bad but not an exhaustive
> > list by any means.
> >
> > So the issue is with allowing modules to play with the folio and page
> > tables on their own.
>
> I understand the concern; however, IMHO that's really why mm can be hard and
> important at the same time..

I feel like you're dismissing the concerns altogether, honestly.
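
To spell out the analogy being drawn further up: the existing
vm_operations_struct pattern has core mm call out into drivers through
function pointers such as ->fault, and this series would add an analogous
uffd-specific table reachable from the VMA.  Rough shape only - the
vm_uffd_ops part below is approximate, not the exact contents of the patch:

	/* Existing pattern (abridged from include/linux/mm.h): core mm
	 * invokes driver-supplied hooks via vma->vm_ops. */
	struct vm_operations_struct {
		void (*open)(struct vm_area_struct *area);
		void (*close)(struct vm_area_struct *area);
		vm_fault_t (*fault)(struct vm_fault *vmf);
		/* ... */
	};

	/* Proposed analogue for userfaultfd paths - approximate sketch: */
	struct vm_uffd_ops {
		/* core mm asks the backing store for the folio backing a
		 * given offset, e.g. to resolve a minor fault */
		int (*uffd_get_folio)(struct inode *inode, pgoff_t pgoff,
				      struct folio **foliop);
		/* ... */
	};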

>
> We definitely have driver code manipulating pgtables.  We also have folios
> or pages that are directly accessible from drivers.

'There's already bad stuff going on' is not an argument for doing more.

>
> After all, mm is the core function provider for those, and there needs to
> be some API for accessing them from outside.

The point being made here is about what counts as internals and what does not.

We don't expose internals; we expose a carefully controlled interface that
minimises the risk of things being broken.

>
> I agree some protection would be nice, like what Suren did with the
> vm_flags using __private, even though it's unfortunate it only works with
> sparse rather than as a warning/error when compiling, as vm_flags is not a
> pointer.  OTOH, forbidding exposing anything might be overkill, IMHO.  It
> stops mm from growing in healthy ways.

I'm not sure what is healthy about no longer being able to make assumptions
about what code does because hooks are being invoked with drivers doing
_whatever they like_.

Part of the purpose of review is avoiding making decisions that cause
problems down the line.
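
For reference, the vm_flags protection mentioned above is roughly the
following pattern - a sparse-only __private annotation plus accessor helpers,
which is why it warns under sparse but not in an ordinary build.  Sketch only;
the exact definitions live in mm_types.h/mm.h:

	/* In struct vm_area_struct: readable directly, writable only via
	 * helpers (sparse flags direct writes through the __private name). */
	union {
		const vm_flags_t vm_flags;
		vm_flags_t __private __vm_flags;
	};

	/* Writers go through helpers so the locking rules can be enforced
	 * in one place: */
	static inline void vm_flags_set(struct vm_area_struct *vma,
					vm_flags_t flags)
	{
		vma_start_write(vma);
		ACCESS_PRIVATE(vma, __vm_flags) |= flags;
	}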

>
> >
> > If this is outside the mm, we probably won't even be Cc'ed on modules
> > that use it.
> >
> > And do we want to be Cc'ed on modules that want to use it?
>
> For this specific case, I'm happy to be copied if guest-memfd starts to
> support userfaultfd, because obviously I also work with the kvm community.
> It'll be the same if not, as I'm listed as a userfaultfd reviewer.
>
> But when it's in the modules, it should really be the module's job.  It's ok
> too, when it's an API, that mm people do not get copied.  It looks fine to me.

This is ridiculous: we expose mm internals to modules, and then no longer
have to care when they break things, or when a subtle thing changes in mm?

On the one hand you argue that in-tree drivers can be 'monitored' +
therefore it's fine, but on the other hand you say it doesn't matter what
they do and it's nothing to do with us so we shouldn't care about being
informed of changes?

These two positions are contradictory and neither is good.

>
> >
> > We will most likely be Cc'ed or emailed directly about the resulting memory
> > leak/security issue in what should be mm code.  It'll be a
> > Saturday because it always is.. :)
>
> True, it's just unavoidable IMHO, and once it's triaged the module owner
> needs to figure out how to fix it, not an mm developer, if the bug only
> happens with the module.

Except it impacts mm as a whole.

And it is avoidable, we can just _not do this_.

>
> It's the same when a module allocates a folio/page and randomly updates its
> flags.  It may also crash core mm later.  We can have more protections all
> over the place, but I don't see an easy way to completely separate core mm
> from modules.

Yes if modules absolutely abuse things then it's problematic, but the issue
is that mm has a whole host of extremely subtle considerations.

I documented a lot of these at
https://kernel.org/doc/html/latest/mm/process_addrs.html

These details change quite often, including some extremely sensitive and
delicate details around locking.

Do you really think module authors are going to correctly implement all of
this?

It's very obvious these are internal implementation details.
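
To give a flavour of the sort of detail involved, take the per-VMA lock vs.
mmap_lock choreography that fault paths have to get right.  Simplified sketch
of the documented pattern, not a drop-in:

	/* Try the per-VMA lock first; if it can't be taken safely, fall
	 * back to the traditional mmap_lock read path.  Getting the
	 * fallback and failure cases wrong is precisely the kind of
	 * subtle bug a module-side implementation invites. */
	struct vm_area_struct *vma;

	vma = lock_vma_under_rcu(mm, address);
	if (vma) {
		/* ... handle the fault under the VMA read lock ... */
		vma_end_read(vma);
	} else {
		mmap_read_lock(mm);
		vma = find_vma(mm, address);
		/* ... validate the VMA, take the slow path ... */
		mmap_read_unlock(mm);
	}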

>
> >
> > Even the example use code had a potential ref leak that you found [1].
>
> That's totally ok.  I appreciate Nikita's help completely and never thought
> of it as an issue.  IMHO the leak is not a big deal in verifying the API.

I think it's quite telling as to how little you're worrying :) about this
stuff that you think that's not a problem.

You guys have already made subtle errors in implementing the simplest
possible use of the API.

This just makes the point for us I think?

>
> >
> > > >
> > > > We need to find another way.
> > >
> > > Could you suggest something?  The minimum goal is to allow guest-memfd
> > > to support minor faults.
> >
> > Mike brought up another idea that seems worth looking into.
>
> I already replied to Mike before we extended this thread.  Feel free to
> chime in with any suggestions on top.  So far this series is still pretty
> much the best I can think of.

I mean if you don't want to consider alternatives, maybe this series simply
isn't viable?

I made suggestions in the other thread re: a very _soft_ interface where we
implement things in core mm but _configure_ them in modules, as an
alternative.

That would avoid exposure of mm internals.
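
Very roughly, something along these lines - every name below is invented
purely for illustration, it's the shape that matters: the module describes
what it supports, and core mm keeps ownership of folios and page tables:

	/* Hypothetical sketch only.  Instead of the module supplying
	 * callbacks that are handed folios/page tables, it registers a
	 * static description and core mm does the actual work against the
	 * file's page cache. */
	struct uffd_backing_desc {
		unsigned int modes;		/* e.g. MISSING / MINOR */
		bool folios_in_pagecache;	/* core mm may look them up */
	};

	int uffd_register_backing(struct address_space *mapping,
				  const struct uffd_backing_desc *desc);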

I really don't think a series that exposes anything even vaguely sensitive
is viable.

>
> Thanks,
>
> --
> Peter Xu
>
