lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e7vr62s73dftijeveyg6lfgivctijz4qcar3teswjbuv6gog3k@4sbpuj35nbbh>
Date: Fri, 4 Jul 2025 15:39:32 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Peter Xu <peterx@...hat.com>
Cc: David Hildenbrand <david@...hat.com>, Mike Rapoport <rppt@...nel.org>,
        Nikita Kalyazin <kalyazin@...zon.com>,
        Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        Suren Baghdasaryan <surenb@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
        Muchun Song <muchun.song@...ux.dev>, Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        James Houghton <jthoughton@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Oscar Salvador <osalvador@...e.de>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Ujwal Kundur <ujwal.kundur@...il.com>
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API

* Peter Xu <peterx@...hat.com> [250704 11:00]:
> On Fri, Jul 04, 2025 at 11:34:15AM +0200, David Hildenbrand wrote:
> > On 03.07.25 19:48, Mike Rapoport wrote:
> > > On Wed, Jul 02, 2025 at 03:46:57PM -0400, Peter Xu wrote:
> > > > On Wed, Jul 02, 2025 at 08:39:32PM +0300, Mike Rapoport wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > > The main target of this change is the implementation of UFFD for
> > > > > > KVM/guest_memfd (examples: [1], [2]) to avoid bringing KVM-specific code
> > > > > > into the mm codebase.  We usually mean KVM by the "drivers" in this context,
> > > > > > and it is already somewhat "knowledgeable" of the mm.  I don't think there
> > > > > > are existing use cases for other drivers to implement this at the moment.
> > > > > > 
> > > > > > Although I can't see new exports in this series, there is now a way to limit
> > > > > > exports to particular modules [3].  Would it help if we only do it for KVM
> > > > > > initially (if/when actually needed)?
> > > > > 
> > > > > There were talks about pulling out guest_memfd core into mm, but I don't
> > > > > remember patches about it. If parts of guest_memfd were already in mm/ that
> > > > > would make easier to export uffd ops to it.
> > > > 
> > > > Do we have a link to such discussion?  I'm also curious whether that idea
> > > > was acknowledged by KVM maintainers.
> > > 
> > > AFAIR it was discussed at one of David's guest_memfd calls
> > 
> > While it was discussed in the call a couple of times in different context
> > (guest_memfd as a library / guest_memfd shim), I think we already discussed
> > it back at LPC last year.
> > 
> > One of the main reasons for doing that is supporting guest_memfd in other
> > hypervisors -- the gunyah hypervisor in the kernel wants to make use of it
> > as well.
> 
> I see, thanks for the info. I found the series, it's here:
> 
> https://lore.kernel.org/all/20241113-guestmem-library-v3-0-71fdee85676b@quicinc.com/
> 
> Here, the question is whether do we still want to keep explicit calls to
> shmem, hugetlbfs and in the future, guest-memfd.  The library-ize of
> guest-memfd doesn't change a huge lot on answering this question, IIUC.

Can you explore moving hugetlb_mfill_atomic_pte and
shmem_mfill_atomic_pte into mm/userfaultfd.c and generalizing them to
use the same code?

That is, split the similar blocks into functions and reduce duplication.

These are under the UFFD config option and are pretty similar.  This
will also limit the bleeding of mfill_atomic_mode out of uffd.



If you look at what the code does in userfaultfd.c, you can see that
some refactoring is necessary for other reasons:

mfill_atomic() calls mfill_atomic_hugetlb(), or it enters a while
(src_addr < src_start + len) to call mfill_atomic_pte().. which might
call shmem_mfill_atomic_pte() in mm/shmem.c

mfill_atomic_hugetlb() calls, in a while (src_addr < src_start + len)
loop and calls hugetlb_mfill_atomic_pte() in mm/hugetlb.c

The shmem call already depends on the vma flags.. which it still does in
your patch 4 here.  So you've replaced this:

if (!(dst_vma->vm_flags & VM_SHARED)) {
...
} else {
        shmem_mfill_atomic_pte()
}

With...

if (!(dst_vma->vm_flags & VM_SHARED)) {
...
} else {
...
        uffd_ops->uffd_copy()
}

So, really, what needs to happen first is userfaultfd needs to be
sorted.

There's no point of creating a vm_ops_uffd if it will just serve as
replacing the call locations of the functions like this, as it has done
nothing to simplify the logic.

> However if we want to generalize userfaultfd capability for a type of
> memory, we will still need something like the vm_uffd_ops hook to report
> such information.  It means drivers can still overwrite these, with/without
> an exported mfill_atomic_install_pte() functions.  I'm not sure whether
> that eases the concern.

If we work through the duplication and reduction where possible, the
path forward may be easier to see.

> 
> So to me, generalizing the mem type looks helpful with/without moving
> guest-memfd under mm/.

Yes, it should decrease the duplication across hugetlb.c and shmem.c,
but I think that userfaultfd is the place to start.

> 
> We do have the option to keep hard-code guest-memfd like shmem or
> hugetlbfs. This is still "doable", but this likely means guest-memfd
> support for userfaultfd needs to be done after that work.  I did quickly
> check the status of gunyah hypervisor [1,2,3], I found that all of the
> efforts are not yet continued in 2025.  The hypervisor last update was Jan
> 2024 with a batch push [1].
> 
> I still prefer generalizing uffd capabilities using the ops.  That makes
> guest-memfd support on MINOR not be blocked and it should be able to be
> done concurrently v.s. guest-memfd library.  If guest-memfd library idea
> didn't move on, it's non-issue either.
> 
> I've considered dropping uffd_copy() and MISSING support for vm_uffd_ops if
> I'm going to repost - that looks like the only thing that people are
> against with, even though that is not my preference, as that'll make the
> API half-broken on its own.

The generalisation you did does not generalize much, as I pointed out
above, and so it seems less impactful than it could be.

These patches also do not explore what this means for guest_memfd.  So
it is not clear that the expected behaviour will serve the need.

You sent a link to an example user.  Can you please keep this work
together in the patch set so that we know it'll work for your use case
and allows us an easier way to pull down this work so we can examine it.

Alternatively, you can reduce and combine the memory types without
exposing the changes externally, if they stand on their own.  But I
don't think anyone is going to accept using a vm_ops change where a
direct function call could be used.

> Said that, I still prefer this against
> hard-code and/or use CONFIG_GUESTMEM in userfaultfd code.

It also does nothing with regards to the CONFIG_USERFAULTFD in other
areas.  My hope is that you could pull out the common code and make the
CONFIG_ sections smaller.

And, by the way, it isn't the fact that we're going to copy something
(or mfill_atomic it, I guess?) but the fact that we're handing out the
pointer for something else to do that.  Please handle this manipulation
in the core mm code without handing out pointers to folios, or page
tables.

You could do this with the address being passed in and figure out the
type, or even a function pointer that you specifically pass in an enum
of the type (I think this is what Lorenzo was suggesting somewhere),
maybe with multiple flags for actions and fallback (MFILL|ZERO for
example).

Thanks,
Liam


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ