linux-kernel - Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CALCETrUOj=4VWp=B=QT0BQ8X_Ds_b+pt68oDwfjGb+K0StXmWA@mail.gmail.com>
Date:   Sat, 11 May 2019 15:39:45 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Jann Horn <jannh@...gle.com>, Andy Lutomirski <luto@...nel.org>,
        Aleksa Sarai <cyphar@...har.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Jeff Layton <jlayton@...nel.org>,
        "J. Bruce Fields" <bfields@...ldses.org>,
        Arnd Bergmann <arnd@...db.de>,
        David Howells <dhowells@...hat.com>,
        Eric Biederman <ebiederm@...ssion.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Christian Brauner <christian@...uner.io>,
        Tycho Andersen <tycho@...ho.ws>,
        David Drysdale <drysdale@...gle.com>,
        Chanho Min <chanho.min@....com>,
        Oleg Nesterov <oleg@...hat.com>, Aleksa Sarai <asarai@...e.de>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters

> On May 11, 2019, at 10:21 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>
>> On Sat, May 11, 2019 at 1:00 PM Andy Lutomirski <luto@...capital.net> wrote:
>>
>> A better “spawn” API should fix this.
>
> Andy, stop with the "spawn would be better".

It doesn’t have to be spawn per se.  But the current situation sucks.

>
> Notice? None of the real problems are about execve or would be solved
> by any spawn API. You just think that because you've apparently been
> talking to too many MS people that think fork (and thus indirectly
> execve()) is bad process management.
>
>

I’ve literally never spoken to an MS person about it.

What container managers and init systems *want* is a way to drop
privileges, change namespaces, etc and then run something in a
controlled way so that the intermediate states aren’t dangerous. An
API for this could be spawn-like or exec-like — that particular
distinction is beside the point.  Having personally written code that
mucks with namepsaces, I've wanted two particular abilities that are
both quite awkward:

a) Change all my UIDs and GIDs to match a container, enter that
container's namespaces, and run some binary in the container's
filesystem, all atomically enough that I don't need to worry about
accidentally leaking privileges into the container.  A
super-duper-non-dumpable mode would kind of allow this, but I'd worry
that there's some other hole besides ptrace() and /proc/self.

b) Change all my UIDs and GIDs to match a container, enter that
container's namespaces, and run some binary that is *not* in the
container's filesystem.  This happens, for example, if the container's
mount namespace has no exec mounts at all.  We don't have a fantastic
way to do this at all right now due to /proc/self/exe.

Regardless, the actual CVE at hand would have been nicely avoided if
writing to /proc/self/exe didn’t work, and I see no reason we can’t
make that happen.

I suppose we could also consider a change to disable /proc/self/exe if
it's not reachable from /proc/self/root.  By "disable", I mean that
readlink() should maybe still work, but actually trying to open it
could probably fail safely.