Message-ID: <CALCETrUOj=4VWp=B=QT0BQ8X_Ds_b+pt68oDwfjGb+K0StXmWA@mail.gmail.com>
Date: Sat, 11 May 2019 15:39:45 -0700
From: Andy Lutomirski <luto@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Jann Horn <jannh@...gle.com>, Andy Lutomirski <luto@...nel.org>,
Aleksa Sarai <cyphar@...har.com>,
Al Viro <viro@...iv.linux.org.uk>,
Jeff Layton <jlayton@...nel.org>,
"J. Bruce Fields" <bfields@...ldses.org>,
Arnd Bergmann <arnd@...db.de>,
David Howells <dhowells@...hat.com>,
Eric Biederman <ebiederm@...ssion.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Alexei Starovoitov <ast@...nel.org>,
Kees Cook <keescook@...omium.org>,
Christian Brauner <christian@...uner.io>,
Tycho Andersen <tycho@...ho.ws>,
David Drysdale <drysdale@...gle.com>,
Chanho Min <chanho.min@....com>,
Oleg Nesterov <oleg@...hat.com>, Aleksa Sarai <asarai@...e.de>,
Linux Containers <containers@...ts.linux-foundation.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>,
kernel list <linux-kernel@...r.kernel.org>,
linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters

> On May 11, 2019, at 10:21 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>
>> On Sat, May 11, 2019 at 1:00 PM Andy Lutomirski <luto@...capital.net> wrote:
>>
>> A better “spawn” API should fix this.
>
> Andy, stop with the "spawn would be better".

It doesn’t have to be spawn per se. But the current situation sucks.

>
> Notice? None of the real problems are about execve or would be solved
> by any spawn API. You just think that because you've apparently been
> talking to too many MS people that think fork (and thus indirectly
> execve()) is bad process management.
>
>

I’ve literally never spoken to an MS person about it.

What container managers and init systems *want* is a way to drop
privileges, change namespaces, etc., and then run something in a
controlled way so that the intermediate states aren’t dangerous. An
API for this could be spawn-like or exec-like; that particular
distinction is beside the point. Having personally written code that
mucks with namespaces, I've wanted two particular abilities that are
both quite awkward:

a) Change all my UIDs and GIDs to match a container, enter that
container's namespaces, and run some binary in the container's
filesystem, all atomically enough that I don't need to worry about
accidentally leaking privileges into the container. A
super-duper-non-dumpable mode would kind of allow this, but I'd worry
that there's some other hole besides ptrace() and /proc/self.

b) Change all my UIDs and GIDs to match a container, enter that
container's namespaces, and run some binary that is *not* in the
container's filesystem. This happens, for example, if the container's
mount namespace has no exec mounts at all. We don't have a fantastic
way to do this at all right now due to /proc/self/exe.
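
To make the awkwardness in (a) and (b) concrete, here is a minimal
sketch of the sequence a container manager has to perform today using
only existing syscalls. It is an illustration under assumptions, not
code from any particular runtime: the helper names enter_container()
and enter_and_exec() and the drop_groups flag are hypothetical, and
the caller is assumed to have already opened the namespace file
descriptors. The comments mark the intermediate states that are not
atomic.

```c
#define _GNU_SOURCE
#include <grp.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: join each namespace in ns_fds (in the order
 * given), then switch to the container's uid/gid.
 * Returns 0 on success, -1 at the first failing step (errno is set).
 */
int enter_container(const int *ns_fds, size_t n_ns,
                    uid_t uid, gid_t gid, int drop_groups)
{
    for (size_t i = 0; i < n_ns; i++) {
        /* nstype 0 accepts whatever kind of namespace the fd names. */
        if (setns(ns_fds[i], 0) != 0)
            return -1;
        /* Window: inside this namespace, still with host credentials. */
    }

    /* Supplementary groups must go before the GID/UID switch. */
    if (drop_groups && setgroups(0, NULL) != 0)
        return -1;
    if (setresgid(gid, gid, gid) != 0)
        return -1;
    if (setresuid(uid, uid, uid) != 0)
        return -1;

    /*
     * Window: container credentials and namespaces, but the process
     * is still the host binary, so /proc/<pid>/exe refers to it.
     */
    return 0;
}

/* Hypothetical helper: only returns (with -1) if a step failed. */
int enter_and_exec(const int *ns_fds, size_t n_ns, uid_t uid, gid_t gid,
                   const char *path, char *const argv[],
                   char *const envp[])
{
    if (enter_container(ns_fds, n_ns, uid, gid, 1) != 0)
        return -1;
    execve(path, argv, envp);
    return -1;
}
```

Every step between the first setns() and the final execve() is a
separately observable state, which is exactly what an atomic spawn-like
or exec-like API would hide. For (b), path would have to name a binary
outside the container's filesystem, which plain execve() cannot express
once the mount namespace has been switched.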

Regardless, the actual CVE at hand would have been nicely avoided if
writing to /proc/self/exe didn’t work, and I see no reason we can’t
make that happen.

I suppose we could also consider a change to disable /proc/self/exe if
it's not reachable from /proc/self/root. By "disable", I mean that
readlink() should maybe still work, but actually trying to open it
could probably fail safely.
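
The readlink()-versus-open() distinction can be probed from userspace
with a small sketch like the following. It only reports what the
running kernel does today and does not implement the proposed change;
the helper names self_exe_path() and try_open_self_exe() are
hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the target of the /proc/self/exe symlink
 * into buf, NUL-terminated. Returns the length, or -1 with errno set.
 */
ssize_t self_exe_path(char *buf, size_t size)
{
    if (size == 0) {
        errno = EINVAL;
        return -1;
    }
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}

/*
 * Hypothetical helper: try to open /proc/self/exe with the given
 * flags. Returns 0 if the open succeeded, otherwise the errno value.
 */
int try_open_self_exe(int flags)
{
    int fd = open("/proc/self/exe", flags);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

Calling try_open_self_exe(O_RDONLY) normally succeeds today, while
try_open_self_exe(O_WRONLY) is typically refused (ETXTBSY or EACCES)
for as long as the binary is executing, though the exact result depends
on the kernel version and file permissions. Under the change suggested
above, self_exe_path() would keep working while the open itself would
fail whenever the binary is not reachable from /proc/self/root.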