Message-ID: <20230130100602.elyvs6oorfzukjwh@wittgenstein>
Date: Mon, 30 Jan 2023 11:06:02 +0100
From: Christian Brauner <brauner@...nel.org>
To: Colin Walters <walters@...bum.org>
Cc: Giuseppe Scrivano <gscrivan@...hat.com>,
Aleksa Sarai <cyphar@...har.com>, linux-kernel@...r.kernel.org,
Kees Cook <keescook@...omium.org>, bristot@...hat.com,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Al Viro <viro@...iv.linux.org.uk>,
Alexander Larsson <alexl@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, bmasney@...hat.com
Subject: Re: [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl
On Mon, Jan 30, 2023 at 10:53:31AM +0100, Christian Brauner wrote:
> On Sun, Jan 29, 2023 at 01:12:45PM -0500, Colin Walters wrote:
> >
> >
> > On Sun, Jan 29, 2023, at 11:58 AM, Christian Brauner wrote:
> > > On Sun, Jan 29, 2023 at 08:59:32AM -0500, Colin Walters wrote:
> > >>
> > >>
> > >> On Wed, Jan 25, 2023, at 11:30 AM, Giuseppe Scrivano wrote:
> > >> >
> > >> > After reading some comments on the LWN.net article, I wonder if
> > >> > PR_HIDE_SELF_EXE should apply to CAP_SYS_ADMIN in the initial user
> > >> > namespace or if in this case root should keep the privilege to inspect
> > >> > the binary of a process. If a container runs with that many privileges
> > >> > then it has already other ways to damage the host anyway.
> > >>
> > >> Right, that's what I was trying to express with the "make it work the same as map_files". Hiding the entry entirely even for initial-namespace-root (real root) seems like it's going to potentially confuse profiling/tracing/debugging tools for no good reason.
> > >
> > > If this can be circumvented via CAP_SYS_ADMIN
> >
> > To be clear, I'm proposing CAP_SYS_ADMIN in the current user namespace at the time of the prctl(). (Or if keeping around a reference just for this is too problematic, perhaps hardcoding to the init ns)
>
> Oh no, I fully understand. The point was that the userspace fix protects
> even against attackers with CAP_SYS_ADMIN in init_user_ns. And that was
> important back then and is still relevant today for some workloads.
>
> For unprivileged containers where host and container are separated by a
> meaningful user namespace boundary this whole mitigation is irrelevant
> as the binary can't be overwritten.
>
> >
> > A process with CAP_SYS_ADMIN in a child namespace would still not be able to read the binary.
> >
> > > then this mitigation
> > > becomes immediately way less interesting because the userspace
> > > mitigation we came up with protects against CAP_SYS_ADMIN as well
> > > without any regression risk.
> >
> > The userspace mitigation here being "clone self to memfd"? But that's a sufficiently ugly workaround that it's created new problems; see https://lwn.net/Articles/918106/
>
> But this is a problem with the memfd api not with the fix. Following the
> thread the ability to create executable memfds will stay around. As it
> should be given how long this has been supported. And they have backward
> compatibility in mind which is great.

Following up on yesterday's promise to check with the criu org I'm
part of: this is going to break criu, unfortunately, as it dumps (and
restores) /proc/self/exe. Even with an escape hatch we'd still risk
breaking it, whereas the memfd solution doesn't cause those issues.

Don't get me wrong, it's pretty obvious that I was supportive of this
fix, especially because it looked rather simple, but this is turning
out to be less simple than we thought. I don't think that this is worth
it given the functioning fixes we already have.

The good thing is that, even if it will take longer, Aleksa's
patchset will provide a more general solution by making it possible for
runc/crun/lxc to open the target binary with a restricted upgrade mask,
making it impossible to reopen the binary read-write. This won't
break criu, will fix this issue, and is generally useful.