lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez39-GaWbLYX77ZTJFW5oe0V7GS5MEQUaKSjYeCEDM0-vg@mail.gmail.com>
Date: Thu, 28 Nov 2024 19:27:44 +0100
From: Jann Horn <jannh@...gle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Julian Orth <ju.orth@...il.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	"Liam R . Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, Shuah Khan <shuah@...nel.org>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm: reinstate ability to map write-sealed memfd
 mappings read-only

On Thu, Nov 28, 2024 at 7:21 PM Lorenzo Stoakes
<lorenzo.stoakes@...cle.com> wrote:
> On Thu, Nov 28, 2024 at 06:58:21PM +0100, Julian Orth wrote:
> > (Re-sending the message below since I forgot to reply-all)
> >
> > On Thu, Nov 28, 2024 at 6:46 PM Jann Horn <jannh@...gle.com> wrote:
> > >
> > > On Thu, Nov 28, 2024 at 4:06 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@...cle.com> wrote:
> > > > In commit 158978945f31 ("mm: perform the mapping_map_writable() check after
> > > > call_mmap()") (and preceding changes in the same series) it became possible
> > > > to mmap() F_SEAL_WRITE sealed memfd mappings read-only.
> > > >
> > > > This was previously unnecessarily disallowed, despite the man page
> > > > documentation indicating that it would be, thereby limiting the usefulness
> > > > of F_SEAL_WRITE logic.
> > > >
> > > > We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRITE
> > > > seal (one which disallows future writes to the memfd) to also be used for
> > > > F_SEAL_WRITE.
> > > >
> > > > For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a
> > > > read-only mapping to disallow mprotect() from overriding the seal - an
> > > > operation performed by seal_check_write(), invoked from shmem_mmap(), the
> > > > f_op->mmap() hook used by shmem mappings.
> > > >
> > > > By extending this to F_SEAL_WRITE and critically - checking
> > > > mapping_map_writable() to determine if we may map the memfd AFTER we invoke
> > > > shmem_mmap() - the desired logic becomes possible. This is because
> > > > mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will
> > > > have cleared.
> > > >
> > > > Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
> > > > behaviour") unintentionally undid this logic by moving the
> > > > mapping_map_writable() check before the shmem_mmap() hook is invoked,
> > > > thereby regressing this change.
> > > >
> > > > We reinstate this functionality by moving the check out of shmem_mmap() and
> > > > instead performing it in do_mmap() at the point at which VMA flags are
> > > > being determined, which seems in any case to be a more appropriate place in
> > > > which to make this determination.
> > > >
> > > > In order to achieve this we rework memfd seal logic to allow us access to
> > > > this information using existing logic and eliminate the clearing of
> > > > VM_MAYWRITE from seal_check_write() which we are performing in do_mmap()
> > > > instead.
> > >
> > > If we already check is_readonly_sealed() and strip VM_MAYWRITE in
> > > do_mmap(), without holding any kind of lock or counter on the file
> > > yet, then this check is clearly racy somehow, right? I think we have a
> > > race where we intermittently reject shared-readonly mmap() calls?
> >
> > Apropos race, some time ago I reported a way to get a mutable mapping
> > for a write-sealed memfd via a race:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=219106
>
> Kind of hard to read rust code, but it looks like you're intentionally
> trying to race sealing on the assumption it's atomic when it's not? That
> doesn't seem like a bug?
>
> The intent of sealing memfds is you establish the memfd buffer, then seal
> it and _only then_ expose it elsewhere.
>
> I may be missing something here, however.

AFAIU memfds are supposed to also guarantee *to the recipient* of the
shared memfd that the memory inside it won't change anymore, so that
the recipient can parse data out of this shared memory buffer without
having to worry about the data concurrently changing. udmabuf_create()
looks like it indeed breaks that assumption by first calling
check_memfd_seals() and then doing udmabuf_pin_folios() without any
lock that prevents a seal being added in between those.

That's also why we have memfd_wait_for_pins(), which ensures that
folios in the memfd don't have elevated refcounts when a F_SEAL_WRITE
seal is added.

(I believe that's one of the major differences in usecases of
F_SEAL_WRITE and F_SEAL_FUTURE_WRITE: F_SEAL_FUTURE_WRITE is enough
for cases where only the creator of the memfd wants to prevent other
tasks from writing into it, while F_SEAL_WRITE is suitable for cases
where the creator and recipient of the memfd want mutual protection.)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ