Message-ID: <20170124140738.GA21034@redhat.com>
Date: Tue, 24 Jan 2017 15:07:38 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Pavel Tikhomirov <ptikhomirov@...tuozzo.com>,
Lennart Poettering <lennart@...ttering.net>,
Kay Sievers <kay.sievers@...y.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Cyrill Gorcunov <gorcunov@...nvz.org>,
John Stultz <john.stultz@...aro.org>,
Thomas Gleixner <tglx@...utronix.de>,
Nicolas Pitre <nicolas.pitre@...aro.org>,
Michal Hocko <mhocko@...e.com>,
Stanislav Kinsburskiy <skinsbursky@...tuozzo.com>,
Mateusz Guzik <mguzik@...hat.com>,
linux-kernel@...r.kernel.org,
Pavel Emelyanov <xemul@...tuozzo.com>,
Konstantin Khorenko <khorenko@...tuozzo.com>
Subject: Re: setns() && PR_SET_CHILD_SUBREAPER
On 01/24, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@...hat.com> writes:
>
> > Suppose we have a process P in the root namespace and another namespace X.
> >
> > P does setns() and enters the X namespace.
> > P forks a child C.
> >
> > C forks a grandchild G.
> > C exits.
> >
> > The question is, where should we reparent the grandchild G? In the normal
> > case it will be reparented to X->child_reaper and this looks correct.
> >
> > But let's suppose that P runs with the ->has_child_subreaper bit set. In
> > this case it will be reparented to P's sub-reaper or a global init, and
> > given that P can't control its ->has_child_subreaper flag this does not
> > look right to me.
> >
> > I can make a simple patch but perhaps I missed something or we actually
> > want this (imo strange) behaviour?
>
> We definitely do not want a child to be reparented out of a pid namespace
> when the pid namespace has a perfectly fine child_reaper.
>
> The special case for the init_task in find_new_reaper appears to be the
> instance of this problem that was considered in the code.
Actually we should blame the same_thread_group(reaper, child_reaper) check.
It should have ensured that we could not cross namespaces, but it is not
enough, because this logic predates setns().
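To make the failure concrete, here is a small userspace model of the pre-patch loop. Everything in it is hypothetical: `struct task` keeps only the fields the loop reads, tasks are single-threaded (so same_thread_group() becomes pointer equality), every ancestor is assumed alive, and the has_child_subreaper fast path is left out. If P's parent S is a subreaper, the walk that starts at C never meets X's child_reaper and stops at S instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for task_struct/signal_struct. */
struct task {
	const char *name;
	struct task *real_parent;
	bool is_child_subreaper;
	bool is_idle;		/* models &init_task (pid 0) */
};

/* Model of fork(): the child starts with no subreaper bit of its own. */
static void model_fork(struct task *child, const char *name,
		       struct task *parent)
{
	assert(parent != NULL);
	child->name = name;
	child->real_parent = parent;
	child->is_child_subreaper = false;
	child->is_idle = false;
}

/*
 * Model of the pre-patch search: start at the exiting father and walk up
 * until we meet the namespace's child_reaper. With single-threaded tasks,
 * same_thread_group(reaper, child_reaper) is just pointer equality.
 */
static struct task *old_find_new_reaper(struct task *father,
					struct task *child_reaper)
{
	struct task *reaper;

	for (reaper = father; reaper != child_reaper;
	     reaper = reaper->real_parent) {
		if (reaper->is_idle)
			break;
		/* The father itself has no alive threads, so skip it. */
		if (reaper != father && reaper->is_child_subreaper)
			return reaper;
	}
	return child_reaper;
}
```

For the scenario above: fork init from an idle task, S from init (with is_child_subreaper set), P from S, X's init from init, and C from P. old_find_new_reaper(&c, &xinit) then returns S, a task outside X, because X's child_reaper is not an ancestor of C.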
> Semantically what we want to do is walk up the parents in the process
> tree. If a parent has is_child_subreaper set, we stop at it. If in the
> transition from one parent to the next we are switching pid namespaces,
> we want the reaper from the pid namespace.
Yes, this is what I have in mind; see the patch below. I need to re-check
it and update the comment to explain why we can't simply check child_reaper
as we currently do.
This way we can start the search from father->real_parent, but the comment
above the "reaper == &init_task" check is no longer correct: we always need
this check, although perhaps is_idle_task(reaper) would be better.
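The walk Eric describes, in the form the patch below implements it, can be modelled the same way. Again `struct task` is a hypothetical stand-in: `level` plays the role of task_pid(t)->level and `is_idle` the role of `reaper == &init_task`. Because P keeps its root-namespace pid after setns(), C's parent P sits at a lower level, so the walk from C ends immediately and X's child_reaper is returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: only the fields the patched loop reads. */
struct task {
	const char *name;
	struct task *real_parent;
	unsigned int level;	/* models task_pid(t)->level */
	bool is_child_subreaper;
	bool is_idle;		/* models reaper == &init_task */
};

/* Model of fork(); `level` is the pid-namespace level of the child. */
static void model_fork(struct task *child, const char *name,
		       struct task *parent, unsigned int level)
{
	assert(parent != NULL);
	child->name = name;
	child->real_parent = parent;
	child->level = level;
	child->is_child_subreaper = false;
	child->is_idle = false;
}

/*
 * Model of the patched search: start at father->real_parent and stop as
 * soon as the pid level changes, i.e. when the walk would leave the pid
 * namespace of the exiting father. The idle check is still needed: for a
 * father in the root namespace every ancestor, pid 0 included, is level 0.
 * Every ancestor is assumed to have an alive thread.
 */
static struct task *new_find_new_reaper(struct task *father,
					struct task *child_reaper)
{
	unsigned int level = father->level;
	struct task *reaper;

	for (reaper = father->real_parent;
	     reaper->level == level;
	     reaper = reaper->real_parent) {
		if (reaper->is_idle)
			break;
		if (reaper->is_child_subreaper)
			return reaper;
	}
	return child_reaper;
}
```

Starting from father->real_parent is safe because the father itself has no alive threads. A root-namespace father with a subreaper ancestor still finds that subreaper, while C from the scenario above falls back to X's child_reaper.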
> As I recall has_child_subreaper was just supposed to be an optimization
> so the common case would not have to walk up the process tree when
> finding its parent.
Yep.
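A rough model of that optimization, assuming it works as in copy_signal() in kernel/fork.c, where a child inherits has_child_subreaper if its parent has the flag or is itself a subreaper. The `struct task` type and the `ancestors_visited()` helper are hypothetical; the point is only that a task with the flag clear never walks its ancestors at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two signal_struct flags. */
struct task {
	struct task *real_parent;
	bool is_child_subreaper;	/* set by prctl(PR_SET_CHILD_SUBREAPER) */
	bool has_child_subreaper;	/* computed once, at fork time */
};

/* Model of fork(), assuming the copy_signal() inheritance rule. */
static void model_fork(struct task *child, struct task *parent)
{
	assert(parent != NULL);
	child->real_parent = parent;
	child->is_child_subreaper = false;
	child->has_child_subreaper = parent->has_child_subreaper ||
				     parent->is_child_subreaper;
}

/* How many ancestors the exit path would look at for this father. */
static unsigned int ancestors_visited(const struct task *father)
{
	const struct task *t;
	unsigned int n = 0;

	if (!father->has_child_subreaper)
		return 0;	/* common case: no walk at all */
	for (t = father->real_parent; t != NULL; t = t->real_parent) {
		n++;
		if (t->is_child_subreaper)
			break;
	}
	return n;
}
```

Since the flag is only computed at fork time, a task forked before its parent called prctl(PR_SET_CHILD_SUBREAPER) keeps has_child_subreaper clear in this model.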
> If we retain any optimizations such as has_child_subreaper please
> consider the case where a process with is_child_subreaper set exits,
> and what happens to its children.
Yes, in this case it should not have any effect. Well, there is another
corner case; perhaps we should turn

	if (!reaper->signal->is_child_subreaper)
		continue;

into

	if (!reaper->signal->is_child_subreaper) {
		if (!reaper->signal->has_child_subreaper)
			break;
		continue;
	}

This looks a bit more correct if the exited "is_child_subreaper" process
was forked, and only after that did its parent call
prctl(PR_SET_CHILD_SUBREAPER).
But I think we do not care, and Pavel is going to eliminate the case where
a child of an is_child_subreaper task can run without the
has_child_subreaper flag set.
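The suggested loop change can be sketched in the same kind of hypothetical model: the fields mirror the two signal_struct flags, while `level` and `is_idle` stand in for the pid-level and init_task checks from the patch. A task with neither flag set ends the walk early, on the assumption that a clear has_child_subreaper means there is no subreaper further up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task/signal fields the loop reads. */
struct task {
	struct task *real_parent;
	unsigned int level;		/* models task_pid(t)->level */
	bool is_child_subreaper;	/* set by prctl(PR_SET_CHILD_SUBREAPER) */
	bool has_child_subreaper;	/* inherited at fork time */
	bool is_idle;			/* models reaper == &init_task */
};

/* Model of fork(), assuming the copy_signal() inheritance rule. */
static void model_fork(struct task *child, struct task *parent,
		       unsigned int level)
{
	assert(parent != NULL);
	child->real_parent = parent;
	child->level = level;
	child->is_child_subreaper = false;
	child->has_child_subreaper = parent->has_child_subreaper ||
				     parent->is_child_subreaper;
	child->is_idle = false;
}

/* The patched walk plus the suggested early break (hypothetical name). */
static struct task *find_new_reaper_early_exit(struct task *father,
						struct task *child_reaper)
{
	unsigned int level = father->level;
	struct task *reaper;

	if (!father->has_child_subreaper)
		return child_reaper;

	for (reaper = father->real_parent;
	     reaper->level == level;
	     reaper = reaper->real_parent) {
		if (reaper->is_idle)
			break;
		if (!reaper->is_child_subreaper) {
			/* No subreaper recorded above this task: stop here. */
			if (!reaper->has_child_subreaper)
				break;
			continue;
		}
		return reaper;	/* the model treats every ancestor as alive */
	}
	return child_reaper;
}
```

If that assumption does not hold, for instance because an intermediate task's has_child_subreaper is stale, the early break returns child_reaper even though a subreaper exists higher up. That is why it matters whether a child of an is_child_subreaper task can run without has_child_subreaper set.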
So what do you think about the patch below?
Oleg.
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -569,15 +569,15 @@ static struct task_struct *find_new_reaper(struct task_struct *father,
 		return thread;
 
 	if (father->signal->has_child_subreaper) {
+		unsigned int level = task_pid(father)->level;
 		/*
 		 * Find the first ->is_child_subreaper ancestor in our pid_ns.
-		 * We start from father to ensure we can not look into another
-		 * namespace, this is safe because all its threads are dead.
+		 * We check pid->level, this is slightly more efficient than
+		 * task_active_pid_ns(reaper) != task_active_pid_ns(father).
 		 */
-		for (reaper = father;
-		     !same_thread_group(reaper, child_reaper);
+		for (reaper = father->real_parent;
+		     task_pid(reaper)->level == level;
 		     reaper = reaper->real_parent) {
-			/* call_usermodehelper() descendants need this check */
 			if (reaper == &init_task)
 				break;
 			if (!reaper->signal->is_child_subreaper)