linux-kernel - Re: [PATCH 1/2] mm: reinstate ability to map write-sealed memfd mappings read-only

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez16vhr_D1o_3tMetCWq3NiGm__PAECyprdePtq40ag1gw@mail.gmail.com>
Date: Thu, 28 Nov 2024 19:18:20 +0100
From: Jann Horn <jannh@...gle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Julian Orth <ju.orth@...il.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	"Liam R . Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, Shuah Khan <shuah@...nel.org>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm: reinstate ability to map write-sealed memfd
 mappings read-only

On Thu, Nov 28, 2024 at 7:05 PM Lorenzo Stoakes
<lorenzo.stoakes@...cle.com> wrote:
> On Thu, Nov 28, 2024 at 06:45:46PM +0100, Jann Horn wrote:
> > On Thu, Nov 28, 2024 at 4:06 PM Lorenzo Stoakes
> > <lorenzo.stoakes@...cle.com> wrote:
> > > In commit 158978945f31 ("mm: perform the mapping_map_writable() check after
> > > call_mmap()") (and preceding changes in the same series) it became possible
> > > to mmap() F_SEAL_WRITE sealed memfd mappings read-only.
> > >
> > > This was previously unnecessarily disallowed, despite the man page
> > > documentation indicating that it would be, thereby limiting the usefulness
> > > of F_SEAL_WRITE logic.
> > >
> > > We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRITE
> > > seal (one which disallows future writes to the memfd) to also be used for
> > > F_SEAL_WRITE.
> > >
> > > For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a
> > > read-only mapping to disallow mprotect() from overriding the seal - an
> > > operation performed by seal_check_write(), invoked from shmem_mmap(), the
> > > f_op->mmap() hook used by shmem mappings.
> > >
> > > By extending this to F_SEAL_WRITE and critically - checking
> > > mapping_map_writable() to determine if we may map the memfd AFTER we invoke
> > > shmem_mmap() - the desired logic becomes possible. This is because
> > > mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will
> > > have cleared.
> > >
> > > Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
> > > behaviour") unintentionally undid this logic by moving the
> > > mapping_map_writable() check before the shmem_mmap() hook is invoked,
> > > thereby regressing this change.
> > >
> > > We reinstate this functionality by moving the check out of shmem_mmap() and
> > > instead performing it in do_mmap() at the point at which VMA flags are
> > > being determined, which seems in any case to be a more appropriate place in
> > > which to make this determination.
> > >
> > > In order to achieve this we rework memfd seal logic to allow us access to
> > > this information using existing logic and eliminate the clearing of
> > > VM_MAYWRITE from seal_check_write() which we are performing in do_mmap()
> > > instead.
> >
> > If we already check is_readonly_sealed() and strip VM_MAYWRITE in
> > do_mmap(), without holding any kind of lock or counter on the file
> > yet, then this check is clearly racy somehow, right? I think we have a
> > race where we intermittently reject shared-readonly mmap() calls?
> >
> > Like:
> >
> > process 1: calls mmap(PROT_READ, MAP_PRIVATE), checks is_readonly_sealed()
> > process 2: adds a F_SEAL_WRITE seal
> > process 1: enters mmap_region(), is_shared_maywrite() is true,
> > mapping_map_writable() fails
>
> I don't think this matters? Firstly these would have to be threads unless you
> are going out of your way to transmit the memfd incompletely set up via a socket
> or something, and then you'd have to be doing it on the assumption that it
> wouldn't race?

Ah, I guess that's true.

> The whole purpose of this change is to _allow read-only mapping *at all*_. Not
> to avoid silly races that are the product of somebody doing stupid things.
>
> >
> > But even if we fix that, the same scenario would result in
> > F_SEAL_WRITE randomly failing depending on the ordering, so it's not
> > like we can actually do anything particularly sensible if these two
> > operations race. Taking a step back, read-only shared mappings of
> > F_SEAL_WRITE-sealed files are just kind of a bad idea because if
> > someone first creates a read-only shared mapping and *then* tries to
> > apply F_SEAL_WRITE, that won't work because the existing mapping will
> > be VM_MAYWRITE.
>
> I don't think so?
>
> If they try to do that, attempting to apply the seal will fail as write will be
> disallowed. So there's no risk of overriding the seal.
>
> The idea is you establish a buffer, write into it, unmap, write-seal, and now
> you can mmap() it PROT_READ.
>
> Obviously it's not sensible (or really probably sensibly feasible) to try to
> find every VMA that has it opened VM_READ | VM_MAYWRITE and clear the
> VM_MAYWRITE, so instead we simply disallow that scenario.
>
> But it's totally reasonable to be able to mmap() it readonly afterwards.
>
> >
> > And the manpage is just misleading on interaction with shared mappings
> > in general, it says "Using the F_ADD_SEALS operation to set the
> > F_SEAL_WRITE seal fails with EBUSY if any writable, shared mapping
> > exists" when actually, it more or less fails if any shared mapping
> > exists at all.
>
> No, it's when any writable mapping exists after my changes :) but people
> might not be quite aware of the distinction between VM_MAYWRITE and
> VM_WRITE.

To clarify, do you read "writable" as "VM_MAYWRITE|VM_SHARED"?

> > @Julian Orth: Did you report this regression because this change
> > caused issues with existing userspace code?
> >
> > > Reported-by: Julian Orth <ju.orth@...il.com>
> > > Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=U7s4L_Jem5x3AaPZxrYpQ@mail.gmail.com/
> > > Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour")
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>
> In any case, we are not discussing my original patch in 6.6 that permitted
> this behaviour, whether you agree or disagree it was sensible, we have
> regressed user-visible behaviour, this change restores it.

Hm, yeah, you're right.