linux-kernel - Re: setns() && PR_SET_CHILD_SUBREAPER

Message-ID: <87tw8p8wo8.fsf@xmission.com>
Date:   Tue, 24 Jan 2017 07:21:11 +1300
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     Pavel Tikhomirov <ptikhomirov@...tuozzo.com>,
        Lennart Poettering <lennart@...ttering.net>,
        Kay Sievers <kay.sievers@...y.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Cyrill Gorcunov <gorcunov@...nvz.org>,
        John Stultz <john.stultz@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Nicolas Pitre <nicolas.pitre@...aro.org>,
        Michal Hocko <mhocko@...e.com>,
        Stanislav Kinsburskiy <skinsbursky@...tuozzo.com>,
        Mateusz Guzik <mguzik@...hat.com>,
        linux-kernel@...r.kernel.org,
        Pavel Emelyanov <xemul@...tuozzo.com>,
        Konstantin Khorenko <khorenko@...tuozzo.com>
Subject: Re: setns() && PR_SET_CHILD_SUBREAPER

Oleg Nesterov <oleg@...hat.com> writes:

> And this discussion reminds me again that I do not understand how setns()
> and PR_SET_CHILD_SUBREAPER should play together... Add cc's.

I agree that they are currently playing together incorrectly.

> Suppose we have a process P in the root namespace and another namespace X.
>
> P does setns() and enters the X namespace.
> P forks a child C.
>
> C forks a grandchild G.
> C exits.
>
> The question is, where should we reparent the grandchild G? In the normal
> case it will be reparented to X->child_reaper and this looks correct.
>
> But let's suppose that P runs with the ->has_child_subreaper bit set. In
> this case it will be reparented to P's sub-reaper or a global init, and
> given that P can't control its ->has_child_subreaper flag this does not
> look right to me.
>
> I can make a simple patch but perhaps I missed something or we actually
> want this (imo strange) behaviour?

We definitely do not want a child to be reparented out of a pid namespace
when the pid namespace has a perfectly fine child_reaper.

The special case for the init_task in find_new_reaper appears to be the
instance of this problem that was considered in the code.

Given the semantics described and asked for of PR_SET_CHILD_SUBREAPER, I
believe has_child_subreaper needs to be considered strictly an
implementation detail, and any way that userspace can observe it is a bug
in the code.

Semantically what we want to do is walk up the parents in the process
tree.  If a parent has is_child_subreaper we stop at it.  If, in the
transition from one parent to the next, we are switching pid namespaces,
we want the reaper from the pid namespace.
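
As a rough userspace sketch of that walk (this is not kernel code;
struct proc, struct pid_ns, find_reaper and the two scenario helpers are
made-up names for illustration, and it assumes the exiting process is not
itself the namespace's reaper), something like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct proc;

/* Hypothetical stand-in for a pid namespace: only its reaper matters here. */
struct pid_ns {
	struct proc *child_reaper;
};

/* Hypothetical stand-in for a process; not the kernel's task_struct. */
struct proc {
	struct proc *parent;		/* NULL at the root of the tree */
	struct pid_ns *ns;		/* pid namespace the process lives in */
	bool is_child_subreaper;	/* as set by PR_SET_CHILD_SUBREAPER */
};

/*
 * Pick the new parent for the children of the exiting process `father`.
 * Walk up the ancestors and stop at the first subreaper, but fall back to
 * the namespace's own reaper as soon as the next ancestor lives in a
 * different pid namespace.
 */
static struct proc *find_reaper(const struct proc *father)
{
	assert(father != NULL && father->ns != NULL);

	for (struct proc *p = father->parent; p != NULL; p = p->parent) {
		if (p->ns != father->ns)
			break;		/* never reparent out of the namespace */
		if (p->is_child_subreaper)
			return p;
	}
	return father->ns->child_reaper;
}

/* Oleg's scenario: P (root ns) enters X and forks C; C forks G and exits. */
static bool orphan_stays_in_namespace(bool p_is_subreaper)
{
	struct pid_ns root_ns = { NULL }, x_ns = { NULL };
	struct proc init = { NULL, &root_ns, false };
	struct proc x_init = { &init, &x_ns, false };
	struct proc p = { &init, &root_ns, p_is_subreaper };
	struct proc c = { &p, &x_ns, false };

	root_ns.child_reaper = &init;
	x_ns.child_reaper = &x_init;

	/* G's new parent must be X's reaper whatever P's flag says. */
	return find_reaper(&c) == &x_init;
}

/* A subreaper in the same namespace as the exiting process does adopt. */
static bool same_namespace_subreaper_adopts(void)
{
	struct pid_ns ns = { NULL };
	struct proc init = { NULL, &ns, false };
	struct proc s = { &init, &ns, true };
	struct proc c = { &s, &ns, false };

	ns.child_reaper = &init;
	return find_reaper(&c) == &s;
}
```

The namespace check comes before the subreaper check on purpose, so that
a subreaper sitting on the far side of a namespace boundary is never
picked.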

As I recall has_child_subreaper was just supposed to be an optimization
so the common case would not have to walk up the process tree when
finding its parent.
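
To illustrate what "just an optimization" could mean, here is a small
userspace model (again not kernel code; all names are invented for the
example, and it assumes the hint is conservative, i.e. it may be set
with no subreaper above but is never clear when one exists).  The hint
only decides whether to walk at all, so the answer matches the full walk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct proc;

struct pid_ns {
	struct proc *child_reaper;
};

/* Hypothetical process model; not the kernel's task_struct. */
struct proc {
	struct proc *parent;
	struct pid_ns *ns;
	bool is_child_subreaper;	/* the userspace-visible setting */
	bool has_child_subreaper;	/* hint: an ancestor may be a subreaper */
};

/* The full walk, ignoring the hint. */
static struct proc *find_reaper_walk(const struct proc *father)
{
	for (struct proc *p = father->parent; p != NULL; p = p->parent) {
		if (p->ns != father->ns)
			break;
		if (p->is_child_subreaper)
			return p;
	}
	return father->ns->child_reaper;
}

/* Fast path: the hint only gates the walk, it never picks the reaper. */
static struct proc *find_reaper(const struct proc *father)
{
	assert(father != NULL && father->ns != NULL);

	if (!father->has_child_subreaper)
		return father->ns->child_reaper;
	return find_reaper_walk(father);
}

/*
 * P (root ns) forks C into namespace X and C inherits P's hint.  Returns
 * true when the fast path and the full walk agree on the reaper for C's
 * children.
 */
static bool hint_is_unobservable(bool hint)
{
	struct pid_ns root_ns = { NULL }, x_ns = { NULL };
	struct proc init = { NULL, &root_ns, false, false };
	struct proc x_init = { &init, &x_ns, false, false };
	struct proc p = { &init, &root_ns, false, hint };
	struct proc c = { &p, &x_ns, false, hint };

	root_ns.child_reaper = &init;
	x_ns.child_reaper = &x_init;

	return find_reaper(&c) == find_reaper_walk(&c);
}
```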

If we retain any optimizations such as has_child_subreaper please
consider the case where a process with is_child_subreaper set exits,
and what happens to its children.

Eric