[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120523145239.GA20378@redhat.com>
Date: Wed, 23 May 2012 16:52:39 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
LKML <linux-kernel@...r.kernel.org>,
Pavel Emelyanov <xemul@...allels.com>,
Cyrill Gorcunov <gorcunov@...nvz.org>,
Louis Rilling <louis.rilling@...labs.com>,
Mike Galbraith <efault@....de>
Subject: Re: [PATCH] pidns: Guarantee that the pidns init will be the last
pidns process reaped. v2
On 05/22, Andrew Morton wrote:
>
> On Mon, 21 May 2012 18:20:31 -0600
> ebiederm@...ssion.com (Eric W. Biederman) wrote:
>
> > + /* If we are the last child process in a pid namespace
>
> like this:
> /*
> * If ...
>
> > + * to be reaped notify the child_reaper.
>
> s/reaped/reaped,/
>
> More seriously, it isn't a very good comment. It tells us "what" the
> code is doing (which is pretty obvious from reading it), but it didn't
> tell us "why" it is doing this. Why do PID namespaces need special
> handling here? What's the backstory??
Well, this is documented in the changelog but I agree, this needs some
documentation.
Perhaps zap_pid_ns_processes() should document that it waits for the
stealth EXIT_DEAD tasks, and __unhash_process() can simply say
"see zap_pid_ns_processes()".
> > @@ -189,6 +189,17 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> > rc = sys_wait4(-1, NULL, __WALL, NULL);
> > } while (rc != -ECHILD);
> >
> > + read_lock(&tasklist_lock);
> > + for (;;) {
> > + __set_current_state(TASK_UNINTERRUPTIBLE);
> > + if (list_empty(¤t->children))
> > + break;
> > + read_unlock(&tasklist_lock);
> > + schedule();
> > + read_lock(&tasklist_lock);
> > + }
> > + read_unlock(&tasklist_lock);
>
> Well.
>
> a) This loop can leave the thread in state TASK_UNINTERRUPTIBLE,
> which looks wrong.
OOPS. You fixed this in *-fix.patch, thanks.
> b) Given that the waking side is also testing list_empty(), I think
> you might need set_current_state() here, with the barrier. So that
> this thread gets the correct view of the list_head wrt the waker's
> view.
We rely on tasklist_lock, note that the waking side takes it for writing.
> c) Did we really need to bang on tasklist_lock so many times? There
> doesn't seem a nicer way of structuring this :(
See above.
We do not really need tasklist_lock to wait for list_empty(children),
we could add a couple of barriers or use wait_event(wait_chldexit).
But the explicit usage of tasklist makes the code more understandable,
this this way we obviously can't race with the child doing release_task(),
say, we can't return before the last detach_pid(PIDTYPE_PID).
As for re-structuring, I'd suggest
for (;;) {
bool need_wait = false;
read_lock(&tasklist_lock);
if (!list_empty(¤t->children)) {
__set_current_state(TASK_UNINTERRUPTIBLE);
need_wait = true;
}
read_unlock(&tasklist_lock);
if (!need_wait)
break;
schedule();
}
but this is subjective.
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists