Message-ID: <20230130095324.p2gnsvdnpgfehgqt@wittgenstein>
Date: Mon, 30 Jan 2023 10:53:24 +0100
From: Christian Brauner <brauner@...nel.org>
To: Colin Walters <walters@...bum.org>
Cc: Giuseppe Scrivano <gscrivan@...hat.com>,
Aleksa Sarai <cyphar@...har.com>, linux-kernel@...r.kernel.org,
Kees Cook <keescook@...omium.org>, bristot@...hat.com,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Al Viro <viro@...iv.linux.org.uk>,
Alexander Larsson <alexl@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, bmasney@...hat.com
Subject: Re: [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl

On Sun, Jan 29, 2023 at 01:12:45PM -0500, Colin Walters wrote:
>
>
> On Sun, Jan 29, 2023, at 11:58 AM, Christian Brauner wrote:
> > On Sun, Jan 29, 2023 at 08:59:32AM -0500, Colin Walters wrote:
> >>
> >>
> >> On Wed, Jan 25, 2023, at 11:30 AM, Giuseppe Scrivano wrote:
> >> >
> >> > After reading some comments on the LWN.net article, I wonder if
> >> > PR_HIDE_SELF_EXE should apply to CAP_SYS_ADMIN in the initial user
> >> > namespace or if in this case root should keep the privilege to inspect
> >> > the binary of a process. If a container runs with that many privileges
> >> > then it has already other ways to damage the host anyway.
> >>
> >> Right, that's what I was trying to express with the "make it work the same as map_files". Hiding the entry entirely even for initial-namespace-root (real root) seems like it's going to potentially confuse profiling/tracing/debugging tools for no good reason.
> >
> > If this can be circumvented via CAP_SYS_ADMIN
>
> To be clear, I'm proposing CAP_SYS_ADMIN in the current user namespace at the time of the prctl(). (Or if keeping around a reference just for this is too problematic, perhaps hardcoding to the init ns)

Oh no, I fully understand. The point was that the userspace fix protects
even against attackers with CAP_SYS_ADMIN in init_user_ns. That was
important back then and is still relevant today for some workloads.

For unprivileged containers, where host and container are separated by a
meaningful user namespace boundary, this whole mitigation is irrelevant
because the binary can't be overwritten.

>
> A process with CAP_SYS_ADMIN in a child namespace would still not be able to read the binary.
>
> > then this mitigation
> > becomes immediately way less interesting because the userspace
> > mitigation we came up with protects against CAP_SYS_ADMIN as well
> > without any regression risk.
>
> The userspace mitigation here being "clone self to memfd"? But that's a sufficiently ugly workaround that it's created new problems; see https://lwn.net/Articles/918106/

But this is a problem with the memfd API, not with the fix. Following the
thread, the ability to create executable memfds will stay around, as it
should given how long this has been supported. And they have backward
compatibility in mind, which is great.
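
For readers who haven't seen it, the "clone self to memfd" mitigation
discussed above can be sketched roughly as follows. This is only a minimal
illustration under stated assumptions, not the code any particular runtime
ships: the helper names clone_self_to_memfd() and reexec_from_memfd() and the
memfd name "self-exe-copy" are made up for this example, and it assumes a
kernel and libc that provide memfd_create(), file sealing and fexecve().

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the running binary into an anonymous memfd
 * and seal it so the copy can no longer be modified.
 * Returns the memfd on success, -1 on error.
 */
int clone_self_to_memfd(void)
{
	struct stat st;
	off_t off = 0;
	int src, dst;

	src = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
	if (src < 0)
		return -1;

	dst = memfd_create("self-exe-copy", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (dst < 0)
		goto err_src;

	if (fstat(src, &st) < 0)
		goto err_dst;

	/* Copy the whole binary; sendfile() advances 'off' for us. */
	while (off < st.st_size) {
		ssize_t n = sendfile(dst, src, &off, st.st_size - off);
		if (n <= 0)
			goto err_dst;
	}

	/* Forbid any further writes, resizing, or changes to the seals. */
	if (fcntl(dst, F_ADD_SEALS,
		  F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0)
		goto err_dst;

	close(src);
	return dst;

err_dst:
	close(dst);
err_src:
	close(src);
	return -1;
}

/*
 * Hypothetical helper: re-execute the current program from the sealed
 * copy, so /proc/<pid>/exe no longer refers to the on-disk binary.
 * Only returns on failure.
 */
int reexec_from_memfd(char *const argv[], char *const envp[])
{
	int fd = clone_self_to_memfd();

	if (fd < 0)
		return -1;

	fexecve(fd, argv, envp);
	close(fd);
	return -1;
}
```

A caller would invoke reexec_from_memfd(argv, environ) early in main() and
needs some way to detect that it is already running from the copy (for
example an environment variable, which is again just an assumption here) to
avoid re-executing in a loop. Because the copy is sealed, even a process with
CAP_SYS_ADMIN in init_user_ns that opens /proc/<pid>/exe cannot write
to it, which is the property discussed above.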