Date:   Sat, 11 May 2019 10:00:47 -0700
From:   Andy Lutomirski <>
To:     Jann Horn <>
Cc:     Andy Lutomirski <>,
        Aleksa Sarai <>,
        Al Viro <>,
        Jeff Layton <>,
        "J. Bruce Fields" <>,
        Arnd Bergmann <>,
        David Howells <>,
        Eric Biederman <>,
        Andrew Morton <>,
        Alexei Starovoitov <>,
        Kees Cook <>,
        Christian Brauner <>,
        Tycho Andersen <>,
        David Drysdale <>,
        Chanho Min <>,
        Oleg Nesterov <>, Aleksa Sarai <>,
        Linus Torvalds <>,
        Linux Containers <>,
        linux-fsdevel <>,
        Linux API <>,
        kernel list <>,
        linux-arch <>
Subject: Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters

> On May 10, 2019, at 3:55 PM, Jann Horn <> wrote:
>> On Fri, May 10, 2019 at 02:20:23PM -0700, Andy Lutomirski wrote:
>>> On Fri, May 10, 2019 at 1:41 PM Jann Horn <> wrote:
>>>> On Tue, May 07, 2019 at 05:17:35AM +1000, Aleksa Sarai wrote:
>>>>> On 2019-05-06, Jann Horn <> wrote:
>>>>> In my opinion, CVE-2019-5736 points out two different problems:
>>>>> The big problem: The __ptrace_may_access() logic has a special-case
>>>>> short-circuit for "introspection" that you can't opt out of; this
>>>>> makes it possible to open things in procfs that are related to the
>>>>> current process even if the credentials of the process wouldn't permit
>>>>> accessing another process like it. I think the proper fix to deal with
>>>>> this would be to add a prctl() flag for "set whether introspection is
>>>>> allowed for this process", and if userspace has manually un-set that
>>>>> flag, any introspection special-case logic would be skipped.
>>>> We could do PR_SET_DUMPABLE=3 for this, I guess?
>>> Hmm... I'd make it a new prctl() command, since introspection is
>>> somewhat orthogonal to dumpability. Also, dumpability is per-mm, and I
>>> think the introspection flag should be per-thread.
>> I've lost track of the context here, but it seems to me that
>> mitigating attacks involving accidental following of /proc links
>> shouldn't depend on dumpability.  What's the actual problem this is
>> trying to solve again?
> The one actual security problem that I've seen related to this is
> CVE-2019-5736. There is a write-up of it at
> <>
> under "Successful approach", but it goes more or less as follows:
> A container is running that doesn't use user namespaces (because for
> some reason I don't understand, apparently some people still do that).
> An evil process is running inside the container with UID 0 (as in,
> GLOBAL_ROOT_UID); so if the evil process inside the container was able
> to reach root-owned files on the host filesystem, it could write into
> them.
> The container engine wants to spawn a new process inside the container.
> It forks off a child that joins the container's namespaces (including
> PID and mount namespaces), and then the child calls execve() on some
> path in the container.

I think that, at this point, the task should be considered owned by the container.  Maybe we should have a better API than execve() to execute a program in a safer way, but fiddling with dumpability seems like a band-aid.  In fact, the process is arguably pwned even *before* execve.

A better “spawn” API should fix this.  In the meantime, I think it should be assumed that, if you join a container’s namespaces, you are at its mercy.

> The attacker replaces the executable in the container with a symlink
> to /proc/self/exe and replaces a library inside the container with a
> malicious one.


> When the container engine calls execve(), intending to run an executable
> inside the container, it instead goes through ptrace_may_access() using
> the introspection short-circuit and re-executes its own executable
> through the jumped symlink /proc/self/exe (which is normally unreachable
> for the container). After the execve(), the process loads an evil
> library from inside the container and is under the control of the
> container.
> Now the container controls a process whose /proc/self/exe is a jumped
> symlink to a host executable, and the container can write into it.
> Some container engines are now using an extremely ugly hack to work
> around this - whenever they want to enter a container, they copy the
> host binary into a new memfd and execute that to avoid exposing the
> original host binary to containers:
> <>
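
For reference, the memfd workaround described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code any particular container engine ships: the helper name clone_self_to_memfd() and the memfd name "self-copy" are hypothetical, and sealing the copy is an extra precaution assumed here rather than something stated in this thread.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the running binary into an anonymous memfd
 * and return the memfd, or -1 on error. The caller would then re-exec
 * from the memfd so the on-disk host binary is never what
 * /proc/<pid>/exe points at.
 */
int clone_self_to_memfd(void)
{
    struct stat st;
    off_t left;
    int src, mfd;

    src = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
    if (src < 0)
        return -1;

    /* "self-copy" is only a debugging label, not a path. */
    mfd = memfd_create("self-copy", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (mfd < 0) {
        close(src);
        return -1;
    }

    if (fstat(src, &st) < 0)
        goto fail;

    /* Copy the whole file; sendfile() may transfer less than requested. */
    for (left = st.st_size; left > 0; ) {
        ssize_t n = sendfile(mfd, src, NULL, (size_t)left);
        if (n <= 0)
            goto fail;
        left -= n;
    }
    close(src);

    /* Assumption: seal the copy so nobody can modify it afterwards. */
    if (fcntl(mfd, F_ADD_SEALS,
              F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0) {
        close(mfd);
        return -1;
    }
    return mfd;

fail:
    close(src);
    close(mfd);
    return -1;
}
```

A caller would typically follow this with fexecve(mfd, argv, environ) before joining the container's namespaces. After that, /proc/self/exe refers to the throwaway in-memory copy instead of the host file.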
> In my opinion, the problems here are:
> - Apparently some people run untrusted containers without user
>   namespaces. It would be really nice if people could not do that.
>   (Probably the biggest problem here.)

> - ptrace_may_access() has a short-circuit that permits a process to
>   unintentionally look at itself even if it has dropped privileges -
>   here, it permits the execve("/proc/self/exe", ...) that would
>   normally be blocked by the check for CAP_SYS_PTRACE if the process
>   is nondumpable.

I don’t see this as a problem.  Dumpable is about protecting a task from others, not about protecting a task against itself.

> - You can use /proc/*/exe to get a writable fd.

This is IMO the real bug.
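
To make that last point concrete, here is a small probe, offered as a sketch rather than a reproducer. The helper name probe_self_exe() is hypothetical. It only shows that /proc/self/exe can be opened like a regular path and that the requested access mode is checked against the underlying file, not against the symlink.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical probe: try to open /proc/self/exe with the given flags.
 * Returns 0 if the open succeeded, otherwise the errno value.
 */
int probe_self_exe(int flags)
{
    int fd = open("/proc/self/exe", flags | O_CLOEXEC);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

While the binary is still being executed, probe_self_exe(O_RDONLY) should succeed and probe_self_exe(O_WRONLY) should fail, typically with ETXTBSY (or EACCES if the caller lacks write permission on the file). As far as I understand the attack, the ETXTBSY protection only lasts while some process is running the image, so a read-only or O_PATH fd obtained this way can be reopened for writing once the image is no longer being executed.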
