linux-kernel - Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters

Message-ID: <20190511172604.znr7wa3iarlgzor4@yavin>
Date:   Sun, 12 May 2019 03:26:04 +1000
From:   Aleksa Sarai <cyphar@...har.com>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Jann Horn <jannh@...gle.com>, Andy Lutomirski <luto@...nel.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Jeff Layton <jlayton@...nel.org>,
        "J. Bruce Fields" <bfields@...ldses.org>,
        Arnd Bergmann <arnd@...db.de>,
        David Howells <dhowells@...hat.com>,
        Eric Biederman <ebiederm@...ssion.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Christian Brauner <christian@...uner.io>,
        Tycho Andersen <tycho@...ho.ws>,
        David Drysdale <drysdale@...gle.com>,
        Chanho Min <chanho.min@....com>,
        Oleg Nesterov <oleg@...hat.com>, Aleksa Sarai <asarai@...e.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters

On 2019-05-11, Andy Lutomirski <luto@...capital.net> wrote:
> >> I've lost track of the context here, but it seems to me that
> >> mitigating attacks involving accidental following of /proc links
> >> shouldn't depend on dumpability.  What's the actual problem this is
> >> trying to solve again?
> > 
> > The one actual security problem that I've seen related to this is
> > CVE-2019-5736. There is a write-up of it at
> > <https://blog.dragonsector.pl/2019/02/cve-2019-5736-escape-from-docker-and.html>
> > under "Successful approach", but it goes more or less as follows:
> > 
> > A container is running that doesn't use user namespaces (because for
> > some reason I don't understand, apparently some people still do that).
> > An evil process is running inside the container with UID 0 (as in,
> > GLOBAL_ROOT_UID); so if the evil process inside the container was able
> > to reach root-owned files on the host filesystem, it could write into
> > them.
> > 
> > The container engine wants to spawn a new process inside the container.
> > It forks off a child that joins the container's namespaces (including
> > PID and mount namespaces), and then the child calls execve() on some
> > path in the container.
> 
> I think that, at this point, the task should be considered owned by
> the container.  Maybe we should have a better API than execve() to
> execute a program in a safer way, but fiddling with dumpability seems
> like a band-aid.  In fact, the process is arguably pwned even *before*
> execve.

Yeah, execve is just the vector (though in this case it's done in order
to clear mm->dumpable). An earlier CVE (CVE-2016-9962) was very similar
but attacked a dirfd that runc had open into the container (LXC had a
very similar bug too) -- setting !mm->dumpable was one of the
workarounds we had for this.
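
For readers less familiar with the dumpable flag: from userspace it is
read and changed with prctl(2), using PR_GET_DUMPABLE and
PR_SET_DUMPABLE. The following is only a minimal sketch of that call
via Python's ctypes, not the code runc or LXC actually use; the helper
names get_dumpable() and set_dumpable() are made up for illustration.

```python
import ctypes

# Option numbers from <linux/prctl.h>.
PR_GET_DUMPABLE = 3
PR_SET_DUMPABLE = 4

# CDLL(None) exposes the symbols already loaded into this process,
# which includes libc's prctl() on Linux.
_libc = ctypes.CDLL(None, use_errno=True)


def get_dumpable():
    """Return the current dumpable value of this process (0, 1 or 2)."""
    value = _libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)
    if value < 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_GET_DUMPABLE) failed")
    return value


def set_dumpable(enabled):
    """Set the dumpable flag of this process to 1 (True) or 0 (False)."""
    if _libc.prctl(PR_SET_DUMPABLE, 1 if enabled else 0, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_DUMPABLE) failed")
```

Note that a successful execve() of an ordinary (non-setuid) binary
resets the flag to 1, which is why execve ends up being the vector
here.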

> A better “spawn” API should fix this.  In the mean time, I think it
> should be assumed that, if you join a container’s namespaces, you are
> at its mercy.

This is generally how we treat containers as runtime authors, but it's
not a trivial thing to get right. In many cases the kernel APIs are
working against you -- Christian and I have written a fair few patches
to fix holes in the kernel APIs so we can avoid these kinds of
assumptions.

But yes, one of the riskiest parts of a container runtime is when
you're attaching to a running container, because all of the helpful
introspection APIs in /proc/ suddenly become a security nightmare. A
better "spawn a process in these namespaces" API might help improve the
situation (or at least, I hope it would).

> > - You can use /proc/*/exe to get a writable fd.
> 
> This is IMO the real bug.

I will try to send an RFC of the patchset I have for this in the next
week or so. Funnily enough, currently /proc/*/exe has the write bit set
in its "mode" (my series fixes this).
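
To see the "mode" in question, lstat() the link itself rather than its
target. A rough sketch, assuming that the mode reported by lstat() is
the one meant here; link_mode() and has_write_bit() are hypothetical
helper names:

```python
import os
import stat


def link_mode(path):
    """Return the permission bits lstat() reports for the link itself."""
    return stat.S_IMODE(os.lstat(path).st_mode)


def has_write_bit(mode):
    """True if any of the user/group/other write bits is set in mode."""
    return bool(mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    path = "/proc/self/exe"
    if os.path.lexists(path):
        mode = link_mode(path)
        print(f"{path}: mode {mode:04o}, write bit set: {has_write_bit(mode)}")
```

This only inspects the bits that are displayed; it does not try to
open the link for writing.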

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
