lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <92265a41-7e32-430c-8ab2-4e7680609624@lucifer.local>
Date: Mon, 30 Jun 2025 11:29:30 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Peter Xu <peterx@...hat.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Vlastimil Babka <vbabka@...e.cz>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Muchun Song <muchun.song@...ux.dev>, Mike Rapoport <rppt@...nel.org>,
        Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        James Houghton <jthoughton@...gle.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Nikita Kalyazin <kalyazin@...zon.com>, Michal Hocko <mhocko@...e.com>,
        David Hildenbrand <david@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Oscar Salvador <osalvador@...e.de>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Ujwal Kundur <ujwal.kundur@...il.com>
Subject: Re: [PATCH v2 0/4] mm/userfaultfd: modulize memory types

On Fri, Jun 27, 2025 at 11:46:51AM -0400, Peter Xu wrote:
> [based on latest akpm/mm-new of June 27th, commit 9be7387ae43f]
>
> v2 changelog:
> - Patch 1
>   - update English in commit log [David]
>   - move vm_uffd_ops definition to userfaultfd_k.h [Mike]
> - Patch 4
>   - fix sparse warning on bitwise type conversions [syzbot]
>   - Commit message updates on explanation of vma_can_userfault check [James]
>
> v1: https://lore.kernel.org/r/20250620190342.1780170-1-peterx@redhat.com
>
> This series is an alternative proposal of what Nikita proposed here on the
> initial three patches:
>
>   https://lore.kernel.org/r/20250404154352.23078-1-kalyazin@amazon.com
>
> This is not yet relevant to any guest-memfd support, but paving way for it.
> Here, the major goal is to make kernel modules be able to opt-in with any
> form of userfaultfd supports, like guest-memfd.  This alternative option
> should hopefully be cleaner, and avoid leaking userfault details into
> vm_ops.fault().
>
> It also means this series does not depend on anything.  It's a pure
> refactoring of userfaultfd internals to provide a generic API, so that
> other types of files, especially RAM based, can support userfaultfd without
> touching mm/ at all.

I'm very concerned that this change will simply move core mm functionality out
of mm and into drivers where it can bitrot and cause subtle bugs?

You're proposing providing stuff like page table state and asking for a folio
back from a driver etc.

I absolutely am not in favour of us providing core mm internals like this to
drivers, and I don't want to see us having to EXPORT() mm internals just to make
module-ised uffd code work (I mean I just will flat out refuse to do that).

I think we need to think _very_ carefully about how we do this.

I also feel like this series is at a really basic level and you've not fully
determined what API calls you need.

I agree that it's sensible to be incremental, but I feel like you sort of need
to somewhat prove the case that you can jump from 'incremental version where we
only support code in mm/' to supporting arbitrary file system code that might be
modules.

Because otherwise you're basically _guessing_ that you can do this, possibly, in
the future and maybe it's just not the right approach but that's not clear yet?

>
> To achieve it, this series introduced a file operation called vm_uffd_ops.
> The ops needs to be provided when a file type supports any of userfaultfd.
>
> With that, I moved both hugetlbfs and shmem over.

Well as you say below hugetlbfs is sort of a stub implementation, I wonder
whether we'd need quite a bit more to make that work.

One thing I'd _really_ like to avoid is us having to add a bunch of hook points
into core mm code just for uffd that then call out to some driver.

We've encountered such a total nightmare with .mmap() for instance in the past
(including stuff that resulted in security issues) because we - simply cannot
assume anything - about what the hook implementor might do with the passed
parameters.

This is really really problematic.

I also absolutely hate the:

if (uffd)
	do_something_weird();

Pattern, so hopefully this won't proliferate that.

>
> Hugetlbfs is still very special that it will only use partial of the
> vm_uffd_ops API, due to similar reason why hugetlb_vm_op_fault() has a
> BUG() and so far hard-coded into core mm.  But this should still be better,
> because at least hugetlbfs is still always involved in feature probing
> (e.g. where it used to not support ZEROPAGE and we have a hard-coded line
> to fail that, and some more).  Meanwhile after this series, shmem will be
> completely converted to the new vm_uffd_ops API; the final vm_uffd_ops for
> shmem looks like this:
>
> static const vm_uffd_ops shmem_uffd_ops = {
> 	.uffd_features	= 	__VM_UFFD_FLAGS,
> 	.uffd_ioctls	= 	BIT(_UFFDIO_COPY) |
> 				BIT(_UFFDIO_ZEROPAGE) |
> 				BIT(_UFFDIO_WRITEPROTECT) |
> 				BIT(_UFFDIO_CONTINUE) |
> 				BIT(_UFFDIO_POISON),
> 	.uffd_get_folio	=	shmem_uffd_get_folio,
> 	.uffd_copy	=	shmem_mfill_atomic_pte,
> };
>
> As I mentioned in one of my reply to Nikita, I don't like the current
> interface of uffd_copy(), but this will be the minimum change version of
> such API to support complete extrenal-module-ready userfaultfd.  Here, very
> minimal change will be needed from shmem side to support that.

Right, maybe a better version of this interface might address some of my
concerns... :)

>
> Meanwhile, the vm_uffd_ops is also not the only place one will need to
> provide to support userfaultfd.  Normally vm_ops.fault() will also need to
> be updated, but that's a generic function and it'll play together with the
> new vm_uffd_ops to make everything fly.
>
> No functional change expected at all after the whole series applied.  There
> might be some slightly stricter check on uffd ops here and there in the
> last patch, but that really shouldn't stand out anywhere to anyone.
>
> For testing: besides the cross-compilation tests, I did also try with
> uffd-stress in a VM to measure any perf difference before/after the change;
> The static call becomes a pointer now.  I really cannot measure anything
> different, which is more or less expected.
>
> Comments welcomed, thanks.
>
> Peter Xu (4):
>   mm: Introduce vm_uffd_ops API
>   mm/shmem: Support vm_uffd_ops API
>   mm/hugetlb: Support vm_uffd_ops API
>   mm: Apply vm_uffd_ops API to core mm
>
>  include/linux/mm.h            |   9 +++
>  include/linux/shmem_fs.h      |  14 -----
>  include/linux/userfaultfd_k.h |  98 +++++++++++++++++++----------
>  mm/hugetlb.c                  |  19 ++++++
>  mm/shmem.c                    |  28 ++++++++-
>  mm/userfaultfd.c              | 115 +++++++++++++++++++++++++---------
>  6 files changed, 207 insertions(+), 76 deletions(-)
>
> --
> 2.49.0
>

Sorry to be critical, I just want to make sure we're not setting ourselves up
for trouble here.

I _very much_ support efforts to make uffd more generalised, and ideally to find
a way to separate out shmem and hugetlbfs implementation bits, so I support the
intent _fully_.

I just want to make sure we do it in a safe way :)

Cheers, Lorenzo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ