Message-ID: <CABi2SkWoNTd5sRJ7-7arPfutYZx6xi9iac0mXZyfzuVXuh1atA@mail.gmail.com>
Date: Fri, 17 Jan 2025 10:20:20 -0800
From: Jeff Xu <jeffxu@...omium.org>
To: Pedro Falcato <pedro.falcato@...il.com>
Cc: Benjamin Berg <benjamin@...solutions.net>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Kees Cook <kees@...nel.org>, akpm@...ux-foundation.org, jannh@...gle.com,
torvalds@...ux-foundation.org, adhemerval.zanella@...aro.org, oleg@...hat.com,
linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org,
linux-mm@...ck.org, jorgelo@...omium.org, sroettger@...gle.com,
ojeda@...nel.org, adobriyan@...il.com, anna-maria@...utronix.de,
mark.rutland@....com, linus.walleij@...aro.org, Jason@...c4.com,
deller@....de, rdunlap@...radead.org, davem@...emloft.net, hch@....de,
peterx@...hat.com, hca@...ux.ibm.com, f.fainelli@...il.com, gerg@...nel.org,
dave.hansen@...ux.intel.com, mingo@...nel.org, ardb@...nel.org,
Liam.Howlett@...cle.com, mhocko@...e.com, 42.hyeyoo@...il.com,
peterz@...radead.org, ardb@...gle.com, enh@...gle.com, rientjes@...gle.com,
groeck@...omium.org, mpe@...erman.id.au, Vlastimil Babka <vbabka@...e.cz>,
Andrei Vagin <avagin@...il.com>, Dmitry Safonov <0x7f454c46@...il.com>,
Mike Rapoport <mike.rapoport@...il.com>,
Alexander Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>
Subject: Re: [PATCH v4 1/1] exec: seal system mappings
On Thu, Jan 16, 2025 at 9:18 AM Pedro Falcato <pedro.falcato@...il.com> wrote:
>
> On Thu, Jan 16, 2025 at 5:02 PM Benjamin Berg <benjamin@...solutions.net> wrote:
> >
> > Hi Lorenzo,
> >
> > On Thu, 2025-01-16 at 15:48 +0000, Lorenzo Stoakes wrote:
> > > On Wed, Jan 15, 2025 at 12:20:59PM -0800, Jeff Xu wrote:
> > > > On Wed, Jan 15, 2025 at 11:46 AM Lorenzo Stoakes
> > > > <lorenzo.stoakes@...cle.com> wrote:
> > >
> > > [SNIP]
> > > >
> > > > > I've made it abundantly clear that this (NACKed) series cannot allow the
> > > > > kernel to be in a broken state even if a user sets flags to do so.
> > > > >
> > > > > This is because users might lack context to make this decision and
> > > > > incorrectly do so, and now we ship a known-broken kernel.
> > > > >
> > > > > You are now suggesting disabling the !CRIU requirement. Which violates my
> > > > > _requirements_ (not optional features).
> > > > >
> > > > Sure, I can add CRIU back.
> > > >
> > > > Are you fine with UML and gViso not working under this CONFIG ?
> > > > UML/gViso doesn't use any KCONFIG like CRIU does.
> > >
> > > Yeah this is a concern, wouldn't we be able to catch UML with a flag?
> > >
> > > Apologies my fault for maybe not being totally up to date with this, but what
> > > exactly was the gViso (is it gVisor actually?)
> >
> > UML is a separate architecture. It is a Linux kernel running as a
> > userspace application on top of an unmodified host kernel.
> >
> > So really, UML is a mostly weird userspace program for the purpose of
> > this discussion. And a pretty buggy one too--it got broken by rseq
> > already.
> >
> > What UML now does is:
> > * Execute a tiny static binary
> > * map special "stub" code/data pages at the topmost userspace address
> > (replacing its stack)
> > * continue execution inside the "stub" pages
> > * unmap everything below the "stub" pages
> > * use the unmap'ed area for userspace application mappings
> >
> > I believe that the "unmap everything" step will fail with this feature.
> >
> >
> > Now, I am sure one can come up with solutions, e.g.:
> > 1. Simply print an explanation if the unmap() fails
> > 2. Find an address that is guaranteed to be below the VDSO and use a
> > smaller address space for the UML userspace.
> > 3. Somehow tell the host kernel to not install the VDSO mappings
> > 4. Add the host VDSO pages as a sealed VMA within UML to guard them
> >
> > UML is a bit of a niche and I am not sure it is worth worrying about it
> > too much.
>
> I've been absent from this patch series in general, but this gave me
> an idea: what if we let userspace seal these mappings itself? Since
> glibc is already sealing things, it might as well seal these?
> And then systems that _do_ care about this would set the glibc tunable
> and deal with the breakage.
>
> Is there something seriously wrong with this approach? Besides maybe
> not having a super easy way to discover these mappings atm, I feel
> like it would solve all of the policy issues people have been talking
> about in these threads.
>
There are technical difficulties in sealing vdso/vvar from the glibc
side. The dynamic linker lacks vdso/vvar mapping size information, and
the vdso/vvar layout varies across architectures, so sealing from the
kernel side is the simpler solution. Adhemerval has more details in case
clarification is needed from the glibc side.
Additionally, the uprobe mapping can't be sealed by the dynamic linker:
the dynamic linker can only apply sealing during execve() and dlopen(),
and the uprobe mapping isn't created during either of those calls.
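
To illustrate the discovery problem, here is a minimal sketch (not what
glibc or the kernel does) that recovers the start and size of these
mappings by parsing /proc/<pid>/maps text. The function name
special_mappings and the list of mapping names are assumptions made for
this example; the exact names present vary by architecture and kernel
version.

```python
import re

# Kernel-created mappings discussed in this thread. This list is an
# assumption for illustration; other names exist on some architectures.
SPECIAL = ("[vdso]", "[vvar]", "[uprobes]")

# One /proc/<pid>/maps line: start-end perms offset dev inode [name]
_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def special_mappings(maps_text):
    """Return [(name, start, size_in_bytes, perms)] for special mappings."""
    found = []
    for line in maps_text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue
        start, end, perms, name = m.groups()
        name = name.strip()
        if name in SPECIAL:
            start_i = int(start, 16)
            # The size is only known from the address range itself.
            found.append((name, start_i, int(end, 16) - start_i, perms))
    return found
```

On a Linux host, passing the contents of /proc/self/maps to
special_mappings() lists whatever special mappings exist at that moment.
A mapping created later, such as the uprobe one, would not appear in a
scan done at exec time, which is the timing problem described above.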
-Jeff
> --
> Pedro