[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z1n8jOmH3nxXn7du@google.com>
Date: Wed, 11 Dec 2024 12:56:44 -0800
From: Isaac Manjarres <isaacmanjarres@...gle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Jeff Layton <jlayton@...nel.org>,
Chuck Lever <chuck.lever@...cle.com>,
Alexander Aring <alex.aring@...il.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Shuah Khan <shuah@...nel.org>, kernel-team@...roid.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Suren Baghdasaryan <surenb@...gle.com>,
Kalesh Singh <kaleshsingh@...gle.com>,
John Stultz <jstultz@...gle.com>
Subject: Re: [RFC PATCH v1 1/2] mm/memfd: Add support for F_SEAL_FUTURE_EXEC
to memfd
On Fri, Dec 06, 2024 at 09:14:58PM +0000, Lorenzo Stoakes wrote:
> On Fri, Dec 06, 2024 at 12:48:09PM -0800, Isaac Manjarres wrote:
> > On Fri, Dec 06, 2024 at 06:19:49PM +0000, Lorenzo Stoakes wrote:
> > > On Thu, Dec 05, 2024 at 05:09:22PM -0800, Isaac J. Manjarres wrote:
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index b1b2a24ef82e..c7b96b057fda 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -375,6 +375,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > > > if (!file_mmap_ok(file, inode, pgoff, len))
> > > > return -EOVERFLOW;
> > > >
> > >
> > > Not maybe in favour of _where_ in the logic we check this and definitely
> > > not in expanding this do_mmap() stuff much further.
> > >
> > > See comment at bottom though... I have a cunning plan :)
> > >
> > > > + if (is_exec_sealed(seals)) {
> > >
> > > Are we intentionally disallowing a MAP_PRIVATE memfd's mapping's execution?
> > > I've not tested this scenario so don't know if we somehow disallow this in
> > > another way but note on write checks we only care about shared mappings.
> > >
> > > I mean one could argue that a MAP_PRIVATE situation is the same as copying
> > > the data into an anon buffer and doing what you want with it, here you
> > > could argue the same...
> > >
> > > So probably we should only care about VM_SHARED?
> >
> > Thanks for taking a look at this!
> >
> > I'd originally implemented it for just the VM_SHARED case, but after
> > discussing it with Kalesh, I changed it to disallow executable
> > mappings for both MAP_SHARED and MAP_PRIVATE.
> >
> > Our thought was that write sealing didn't apply in the MAP_PRIVATE case
> > to support COW with MAP_PRIVATE. There's nothing similar to COW with
> > execution, so I decided to prevent it for both cases; it also retains
> > the same behavior as the ashmem driver.
>
> Hm, yeah I'm not sure that's really justified, I mean what's to stop a
> caller from just mapping their own memory mapping executable, copying the
> data and executing?
>
That's a fair point. In that case, I think it makes sense to enforce the
seal only when the mapping is shared.
The case I'm trying to address is when a process (A) allocates a memfd
that is meant to be read and written by itself and another process (B).
A shares the buffer with B, but B injects code into the buffer, and
compromises A such that A maps the buffer with PROT_EXEC and runs the
code that B injected into it.
If A used F_SEAL_FUTURE_EXEC prior to sharing the buffer, then it could
reduce the attack surface on itself in this scenario.
> There's also further concerns around execution restriction for instance in
> memfd_add_seals():
>
> /*
> * SEAL_EXEC implys SEAL_WRITE, making W^X from the start.
> */
> if (seals & F_SEAL_EXEC && inode->i_mode & 0111)
> seals |= F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_FUTURE_WRITE;
>
> So you probably want to change this to include F_SEAL_FUTURE_EXEC, and note
Do you mean adding a case where if F_SEAL_FUTURE_EXEC is in the seals,
then we should clear the X bits of the file and use F_SEAL_EXEC as well?
I don't think the case in the if condition should imply F_SEAL_FUTURE_EXEC,
since the file is still executable in this case?
> your proposal interacts negatively with the whole
> MFD_NOEXEC_SCOPE_NOEXEC_ENFORCED mode set in vm.memfd_noeec - any system
> with this set to '2' will literally not allow you to do what you want if
> set to 2.
>
> See https://origin.kernel.org/doc/html/latest/userspace-api/mfd_noexec.html
Sorry, I didn't follow how this would impact
MFD_NOEXEC_SCOPE_NOEXEC_ENFORCED. Could you please clarify that?
> > Thanks again for reviewing these patches! Happy that I was able to get
> > the gears turning :)
> >
> > I'm really interested in helping with this, so is there any forum you'd
> > like to use for collaborating on this or any way I can help?
> >
> > I'm also more than happy to test out any patches that you'd like!
>
> Thanks, I'm just going to post to the mailing list, this is the discussion
> forum I'm making use of for this :)
>
> I will cc- you on my patch and definitely I'd appreciate testing thanks!
>
> But yeah, to be clear I'm not done with reviewing this, I need more time to
> digest what you're trying to do here, but you definitely need to think
> about the exec limitations.
Thanks for sending out the patch. I took a look and tested it out and it
definitely makes implementing this a lot nicer!
Thanks,
Isaac
Powered by blists - more mailing lists