linux-kernel - Re: [PATCH] pidns: Guarantee that the pidns init will be the last pidns process reaped. v2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20120523145239.GA20378@redhat.com>
Date:	Wed, 23 May 2012 16:52:39 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Pavel Emelyanov <xemul@...allels.com>,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Louis Rilling <louis.rilling@...labs.com>,
	Mike Galbraith <efault@....de>
Subject: Re: [PATCH] pidns: Guarantee that the pidns init will be the last
	pidns process reaped. v2

On 05/22, Andrew Morton wrote:
>
> On Mon, 21 May 2012 18:20:31 -0600
> ebiederm@...ssion.com (Eric W. Biederman) wrote:
>
> > +		/* If we are the last child process in a pid namespace
>
> like this:
> 		/*
> 		 * If ...
>
> > +		 * to be reaped notify the child_reaper.
>
> s/reaped/reaped,/
>
> More seriously, it isn't a very good comment.  It tells us "what" the
> code is doing (which is pretty obvious from reading it), but it didn't
> tell us "why" it is doing this.  Why do PID namespaces need special
> handling here?  What's the backstory??

Well, this is documented in the changelog but I agree, this needs some
documentation.

Perhaps zap_pid_ns_processes() should document that it waits for the
stealth EXIT_DEAD tasks, and __unhash_process() can simply say
"see zap_pid_ns_processes()".

> > @@ -189,6 +189,17 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> >  		rc = sys_wait4(-1, NULL, __WALL, NULL);
> >  	} while (rc != -ECHILD);
> >
> > +	read_lock(&tasklist_lock);
> > +	for (;;) {
> > +		__set_current_state(TASK_UNINTERRUPTIBLE);
> > +		if (list_empty(&current->children))
> > +			break;
> > +		read_unlock(&tasklist_lock);
> > +		schedule();
> > +		read_lock(&tasklist_lock);
> > +	}
> > +	read_unlock(&tasklist_lock);
>
> Well.
>
> a) This loop can leave the thread in state TASK_UNINTERRUPTIBLE,
>    which looks wrong.

OOPS. You fixed this in *-fix.patch, thanks.

> b) Given that the waking side is also testing list_empty(), I think
>    you might need set_current_state() here, with the barrier.  So that
>    this thread gets the correct view of the list_head wrt the waker's
>    view.

We rely on tasklist_lock, note that the waking side takes it for writing.

> c) Did we really need to bang on tasklist_lock so many times?  There
>    doesn't seem a nicer way of structuring this :(

See above.

We do not really need tasklist_lock to wait for list_empty(children),
we could add a couple of barriers or use wait_event(wait_chldexit).
But the explicit usage of tasklist makes the code more understandable,
this this way we obviously can't race with the child doing release_task(),
say, we can't return before the last detach_pid(PIDTYPE_PID).

As for re-structuring, I'd suggest

	for (;;) {
		bool need_wait = false;

		read_lock(&tasklist_lock);
		if (!list_empty(&current->children)) {
			__set_current_state(TASK_UNINTERRUPTIBLE);
			need_wait = true;
		}
		read_unlock(&tasklist_lock);

		if (!need_wait)
			break;
		schedule();
	}

but this is subjective.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/