linux-kernel - Re: [PATCH 07/11] pidns: Wait in zap_pid_ns_processes until pid_ns->nr

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87licrl0gg.fsf@xmission.com>
Date:	Fri, 21 Dec 2012 10:33:19 -0800
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Linux Containers <containers@...ts.linux-foundation.org>,
	linux-kernel@...r.kernel.org, Serge Hallyn <serge@...lyn.com>,
	Gao feng <gaofeng@...fujitsu.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 07/11] pidns: Wait in zap_pid_ns_processes until pid_ns->nr_hashed == 1

Oleg Nesterov <oleg@...hat.com> writes:

> On 12/20, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@...hat.com> writes:
>>
>> > On 11/16, Eric W. Biederman wrote:
>> >>
>> >> @@ -216,22 +216,15 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>> >>
>> >>  	/*
>> >>  	 * sys_wait4() above can't reap the TASK_DEAD children.
>> >> -	 * Make sure they all go away, see __unhash_process().
>> >> +	 * Make sure they all go away, see free_pid().
>> >>  	 */
>> >>  	for (;;) {
>> >> -		bool need_wait = false;
>> >> -
>> >> -		read_lock(&tasklist_lock);
>> >> -		if (!list_empty(&current->children)) {
>> >> -			__set_current_state(TASK_UNINTERRUPTIBLE);
>> >> -			need_wait = true;
>> >> -		}
>> >> -		read_unlock(&tasklist_lock);
>> >> -
>> >> -		if (!need_wait)
>> >> +		set_current_state(TASK_UNINTERRUPTIBLE);
>> >> +		if (pid_ns->nr_hashed == 1)
>> >>  			break;
>> >>  		schedule();
>> >>  	}
>> >
>> > I agree, the patch itself looks fine.
>> >
>> > But, with all other changes I do not understand this part at all.
>> >
>> > A task from the parent namespace can do setns + fork at any time
>> > (until nr_hashed >= 0). So ->nr_hashed can be incremented again
>> > after zap_pid_ns_processes() returns.
>
> XXX: this creates the new pid P in this ns. Please see below...
>
>> I want to talk about how alloc_pid and free_pid prevent nr_hashed
>> from increasing once the last processes has exited the pid namespace
>> but that doesn't apply here.
>
> Not sure I understand, but it seems you agree this can happen.
>
>> > Or, we can sleep in TASK_UNINTERRUPTIBLE "forever" if this happens
>> > after kill-them-all.
>>
>> Sleeping forever should be prevented by this chunk in free_pid:
>
> Note that I said "forever", not forever ;)
>
>>
>> 		switch(--ns->nr_hashed) {
>> 		case 1:
>> 			/* When all that is left in the pid namespace
>> 			 * is the reaper wake up the reaper.  The reaper
>> 			 * may be sleeping in zap_pid_ns_processes().
>> 			 */
>> 			wake_up_process(ns->child_reaper);
>>
>>
>> I admit it continues to be true that if an injected process or a
>> debugged process does not exit we can block waiting for all of the
>> processes to be reaped indefinitely.
>
> Yes, I meant until the injected process exits.


>> > Could you explain why do we need to wait at all? I can be easily
>> > wrong, but at first glance the original reason for this wait has
>> > gone away?
>>
>> It is very nice to know that when you do waitpid for the init process of
>> a pid namespace that there are no other processes in the pid namespace.
>
> OK, and I agree. But my point was, at least this _looks_ strange, because
> ns->nr_hashed == 1 is not stable.
>
> And in fact I think this is not strange, but simply wrong.
>
> Please consider the XXX case above. Suppose that free_pid(P) happens
> after ns->child_reaper exits and thus this pointer points to nowhere.
> Suppose also that there is another injected pid so nr_hashed == 2.
> In this case wake_up_process(ns->child_reaper) means use-after-free,
> no?

That is a bug in two dimensions.
- The use after free.
- The fact that a process can persist in the pid namespace after the
  init process has been exited and waited on.

Without the setns support this code is fine, but with setns we have
an process injection bug.

I suspect zap_pid_ns_processes just needs to set a flag that prevents
the allocation of new pids.  I am going to have to step back and think
of something clean.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/