[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <36882f0d-4a0a-4877-8c42-ea807e4d3f6d@lucifer.local>
Date: Thu, 28 Nov 2024 18:27:31 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Julian Orth <ju.orth@...il.com>, Andrew Morton <akpm@...ux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Shuah Khan <shuah@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm: reinstate ability to map write-sealed memfd
mappings read-only
On Thu, Nov 28, 2024 at 07:18:20PM +0100, Jann Horn wrote:
> On Thu, Nov 28, 2024 at 7:05 PM Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
> > On Thu, Nov 28, 2024 at 06:45:46PM +0100, Jann Horn wrote:
> > > On Thu, Nov 28, 2024 at 4:06 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@...cle.com> wrote:
> > > > In commit 158978945f31 ("mm: perform the mapping_map_writable() check after
> > > > call_mmap()") (and preceding changes in the same series) it became possible
> > > > to mmap() F_SEAL_WRITE sealed memfd mappings read-only.
> > > >
> > > > This was previously unnecessarily disallowed, despite the man page
> > > > documentation indicating that it would be, thereby limiting the usefulness
> > > > of F_SEAL_WRITE logic.
> > > >
> > > > We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRITE
> > > > seal (one which disallows future writes to the memfd) to also be used for
> > > > F_SEAL_WRITE.
> > > >
> > > > For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a
> > > > read-only mapping to disallow mprotect() from overriding the seal - an
> > > > operation performed by seal_check_write(), invoked from shmem_mmap(), the
> > > > f_op->mmap() hook used by shmem mappings.
> > > >
> > > > By extending this to F_SEAL_WRITE and critically - checking
> > > > mapping_map_writable() to determine if we may map the memfd AFTER we invoke
> > > > shmem_mmap() - the desired logic becomes possible. This is because
> > > > mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will
> > > > have cleared.
> > > >
> > > > Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
> > > > behaviour") unintentionally undid this logic by moving the
> > > > mapping_map_writable() check before the shmem_mmap() hook is invoked,
> > > > thereby regressing this change.
> > > >
> > > > We reinstate this functionality by moving the check out of shmem_mmap() and
> > > > instead performing it in do_mmap() at the point at which VMA flags are
> > > > being determined, which seems in any case to be a more appropriate place in
> > > > which to make this determination.
> > > >
> > > > In order to achieve this we rework memfd seal logic to allow us access to
> > > > this information using existing logic and eliminate the clearing of
> > > > VM_MAYWRITE from seal_check_write() which we are performing in do_mmap()
> > > > instead.
> > >
> > > If we already check is_readonly_sealed() and strip VM_MAYWRITE in
> > > do_mmap(), without holding any kind of lock or counter on the file
> > > yet, then this check is clearly racy somehow, right? I think we have a
> > > race where we intermittently reject shared-readonly mmap() calls?
> > >
> > > Like:
> > >
> > > process 1: calls mmap(PROT_READ, MAP_PRIVATE), checks is_readonly_sealed()
> > > process 2: adds a F_SEAL_WRITE seal
> > > process 1: enters mmap_region(), is_shared_maywrite() is true,
> > > mapping_map_writable() fails
> >
> > I don't think this matters? Firstly these would have to be threads unless you
> > are going out of your way to transmit the memfd incompletely set up via a socket
> > or something, and then you'd have to be doing it on the assumption that it
> > wouldn't race?
>
> Ah, I guess that's true.
>
> > The whole purpose of this change is to _allow read-only mapping *at all*_. Not
> > to avoid silly races that are the product of somebody doing stupid things.
> >
> > >
> > > But even if we fix that, the same scenario would result in
> > > F_SEAL_WRITE randomly failing depending on the ordering, so it's not
> > > like we can actually do anything particularly sensible if these two
> > > operations race. Taking a step back, read-only shared mappings of
> > > F_SEAL_WRITE-sealed files are just kind of a bad idea because if
> > > someone first creates a read-only shared mapping and *then* tries to
> > > apply F_SEAL_WRITE, that won't work because the existing mapping will
> > > be VM_MAYWRITE.
> >
> > I don't think so?
> >
> > If they try to do that, attempting to apply the seal will fail as write will be
> > disallowed. So there's no risk of overriding the seal.
> >
> > The idea is you establish a buffer, write into it, unmap, write-seal, and now
> > you can mmap() it PROT_READ.
> >
> > Obviously it's not sensible (or really probably sensibly feasible) to try to
> > find every VMA that has it opened VM_READ | VM_MAYWRITE and clear the
> > VM_MAYWRITE, so instead we simply disallow that scenario.
> >
> > But it's totally reasonable to be able to mmap() it readonly afterwards.
> >
> > >
> > > And the manpage is just misleading on interaction with shared mappings
> > > in general, it says "Using the F_ADD_SEALS operation to set the
> > > F_SEAL_WRITE seal fails with EBUSY if any writable, shared mapping
> > > exists" when actually, it more or less fails if any shared mapping
> > > exists at all.
> >
> > No, it's when any writable mapping exists after my changes :) but people
> > might not be quite aware of the distinction between VM_MAYWRITE and
> > VM_WRITE.
>
> To clarify, do you read "writable" as "VM_MAYWRITE|VM_SHARED"?
Yeah. It's a nuanced thing, and I don't _think_ there's any way to mmap without
this so yeah in a way you're right, _except_ that I explicitly make this
possible.
It also protects you from mprotect() later making a F_SEAL_FUTURE_WRITE mapping
mmap()'d PROT_READ from being mprotect()'d PROT_READ | PROT_WRITE.
I really get the feeling that people felt F_SEAL_WRITE didn't work the way
anybody expected, so implemented F_SEAL_FUTURE_WRITE instead, which doesn't use
the address_space atomics mapping_map_writable() et al. use (since it doesn't
have the 'disallow if anything writably mapped now' semantics).
Not being able to mmap() read after applying the F_SEAL_WRITE really limits its
usefulness if that's how you want to access it too.
So this basically is a convenience thing just to let people be able to do that
as well as read()'ing data.
>
> > > @Julian Orth: Did you report this regression because this change
> > > caused issues with existing userspace code?
> > >
> > > > Reported-by: Julian Orth <ju.orth@...il.com>
> > > > Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=U7s4L_Jem5x3AaPZxrYpQ@mail.gmail.com/
> > > > Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour")
> > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> >
> > In any case, we are not discussing my original patch in 6.6 that permitted
> > this behaviour, whether you agree or disagree it was sensible, we have
> > regressed user-visible behaviour, this change restores it.
>
> Hm, yeah, you're right.
Yeah, I mean at the end of the day, unless we explicitly want to just get rid of
this feature here and possibly break userspace (probably very few people
affected to be honest but as Julian demonstrates at least some), we pretty much
have to do this.
I think this implementation is a less awful way of doing it than alternatives.
BTW we do at least fget() the file before we interact with the file object (I
mean mmap would be broken if we didn't), though at no point were we protected
from a racing seal being applied.
In the end, most users will want F_SEAL_FUTURE_WRITE anyway to be honest :)
Powered by blists - more mailing lists