lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <oybstepupgh2s5lpwwk7scryem4mgk2t7wmkr5zhj7cz4da3bp@aylj2x6sgtap>
Date: Fri, 19 Sep 2025 13:22:01 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Peter Xu <peterx@...hat.com>
Cc: David Hildenbrand <david@...hat.com>, Mike Rapoport <rppt@...nel.org>,
        Nikita Kalyazin <kalyazin@...zon.com>,
        Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        Suren Baghdasaryan <surenb@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
        Muchun Song <muchun.song@...ux.dev>, Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        James Houghton <jthoughton@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Oscar Salvador <osalvador@...e.de>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Ujwal Kundur <ujwal.kundur@...il.com>
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API

* Peter Xu <peterx@...hat.com> [250916 15:56]:

...

First, thanks for replying.  Sorry I missed it.

> > 
> > Can you explore moving hugetlb_mfill_atomic_pte and
> > shmem_mfill_atomic_pte into mm/userfaultfd.c and generalizing them to
> > use the same code?
> > 
> > That is, split the similar blocks into functions and reduce duplication.
> > 
> > These are under the UFFD config option and are pretty similar.  This
> > will also limit the bleeding of mfill_atomic_mode out of uffd.
> 
> These are different problems to solve.
> 
> I did propose a refactoring of hugetlbfs in general years ago.  I didn't
> finish that.  It's not only because it's a huge amount of work that almost
> nobody likes to review (even if everybody likes to see landing), also that
> since we decided to not add more complexity into hugetlbfs code like HGM
> there's also less point on refactoring.
> 
> I hope we can be on the same page to understand the problem we want to
> solve here.  The problem is we want to support userfaultfd on guest-memfd.
> Your suggestion could be helpful, but IMHO it's orthogonal to the current
> problem.  It's also not a dependency.  If it is, you should see code
> duplications introduced in this series, which might not be needed after a
> cleanup.  It's not like that.
> 
> This series simply resolves a totally different problem.  The order on
> solving this problem or cleaning things elsewhere (or we decide to not do
> it at all) are not deterministic, IMHO.

I find it very difficult to see what abstraction of functions are needed
with the code in the current state. I feel like we are exposing an
interface that we cannot see what we need.

> 
> > 
> > 
> > 
> > If you look at what the code does in userfaultfd.c, you can see that
> > some refactoring is necessary for other reasons:
> > 
> > mfill_atomic() calls mfill_atomic_hugetlb(), or it enters a while
> > (src_addr < src_start + len) to call mfill_atomic_pte().. which might
> > call shmem_mfill_atomic_pte() in mm/shmem.c
> > 
> > mfill_atomic_hugetlb() calls, in a while (src_addr < src_start + len)
> > loop and calls hugetlb_mfill_atomic_pte() in mm/hugetlb.c
> 
> Again, IMHO this still falls into hugetlbfs refactoring category.  I would
> sincerely request that we put that as a separate topic to discuss, because
> it's huge and community resources are limited on both developments and
> reviews.
> 
> Shmem is already almost sharing code with anonymous, after all these two
> memory types are the only ones that support userfaultfd besides hugetlbfs.
> 
> > 
> > The shmem call already depends on the vma flags.. which it still does in
> > your patch 4 here.  So you've replaced this:
> > 
> > if (!(dst_vma->vm_flags & VM_SHARED)) {
> > ...
> > } else {
> >         shmem_mfill_atomic_pte()
> > }
> > 
> > With...
> > 
> > if (!(dst_vma->vm_flags & VM_SHARED)) {
> > ...
> > } else {
> > ...
> >         uffd_ops->uffd_copy()
> > }
> 
> I'm not 100% sure I get this comment.  It's intentional to depend on vma
> flags here for shmem.  See Andrea's commit:
> 
> commit 5b51072e97d587186c2f5390c8c9c1fb7e179505
> Author: Andrea Arcangeli <aarcange@...hat.com>
> Date:   Fri Nov 30 14:09:28 2018 -0800
> 
>     userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
> 
> Are you suggesting we should merge it somehow?  I'd appreciate some
> concrete example here if so.

What I am saying is that the containing function, mfill_atomic_pte(),
should be absorbed into each memory type's ->uffd_copy(), at least
partially.

Shouldn't each memory type do all the necessary checks in ->uffd_copy()?

To put it another way, no shared memory vma will enter the if()
statement, so why are we checking if they need to?

So if the default uffd_copy() does the if (!shared) stuff, you can just
call uffd_ops->uffd_copy() with out any check there, right?

> 
> > 
> > So, really, what needs to happen first is userfaultfd needs to be
> > sorted.
> > 
> > There's no point of creating a vm_ops_uffd if it will just serve as
> > replacing the call locations of the functions like this, as it has done
> > nothing to simplify the logic.
> 
> I had a vague feeling that you may not have understood the goal of this
> series.  I thought we were on the same page there.  It could be partly my
> fault if that's not obvious, I can try to be more explicit in the cover
> letter next if I'm going to repost.

I certainly feel like I've lost the overall goal during the
conversations sometimes, thanks for pointing that out.

I keep thinking this is about memory type modularization, I think the
cover letter subject keeps getting in my way.

> 
> Please consider supporting guest-memfd as the goal, vm_ops_uffd is to allow
> all changes to happen in guest-memfd.c.  Without it, it can't happen.
> That's the whole point so far.
> 

...

> 
> While I was absent, Nikita sent a version that is based on the library
> code.  That uses direct function calls:
> 
> https://lore.kernel.org/all/20250915161815.40729-3-kalyazin@amazon.com/
> 
> I think it's great we have the other sample of implementing this.
> 
> I believe Nikita has no strong opinion how this will land at last, whatever
> way the community prefers.  I prefer this solution I sent.
> 
> Do you have a preference, when explicitly there's the other one to compare?

I like this better.

> 
> > 
> > > Said that, I still prefer this against
> > > hard-code and/or use CONFIG_GUESTMEM in userfaultfd code.
> > 
> > It also does nothing with regards to the CONFIG_USERFAULTFD in other
> > areas.  My hope is that you could pull out the common code and make the
> > CONFIG_ sections smaller.
> > 
> > And, by the way, it isn't the fact that we're going to copy something
> > (or mfill_atomic it, I guess?) but the fact that we're handing out the
> > pointer for something else to do that.  Please handle this manipulation
> > in the core mm code without handing out pointers to folios, or page
> > tables.
> 
> Is "return pointers to folios" also an issue now in the API?  Are you
> NACKing uffd_get_folio() API that this series proposed?

No, I don't think I see the problem with what I'm proposing.

I'd rather find a middle ground.  I was thinking it best to have the
part needing folio pointer to be in the mm while the rest remains in the
guestmem implementation.

The other thread seems to imply that it would all have to be pulled in,
and that seems undesirable as well.

It is just that we have had this sort of interface before and it has
gone poorly for us.  I know there's other ways to do what you are doing
here in regards to ->fault(), but that's the most flexible and hardest
to validate way of doing it.  If we can avoid exposing it in this way,
I'd rather do that.

> 
> > 
> > You could do this with the address being passed in and figure out the
> > type, or even a function pointer that you specifically pass in an enum
> > of the type (I think this is what Lorenzo was suggesting somewhere),
> > maybe with multiple flags for actions and fallback (MFILL|ZERO for
> > example).
> 
> I didn't quickly get it.  I appreciate if there's some more elaborations.

What I was thinking was the same sort of thing that happens on the
ioctl today, but with a memory type, and only for the callback that
needs the folio.

But again, I'm sure I'm missing something and I'm sorry to drag this
out..

And sorry if I upset you, I feel like we're talking past each other and
causing a lot of stress.

Thanks,
Liam



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ