lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 13 Jun 2012 15:17:34 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Oleg Nesterov <oleg@...hat.com>,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Louis Rilling <louis.rilling@...labs.com>,
	Mike Galbraith <efault@....de>,
	Pavel Emelyanov <xemul@...allels.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] pidns: guarantee that the pidns init will be the last pidns process reaped

Andrew Morton <akpm@...ux-foundation.org> writes:

>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -64,7 +64,6 @@ static void exit_mm(struct task_struct * tsk);
>>  static void __unhash_process(struct task_struct *p, bool group_dead)
>>  {
>>  	nr_threads--;
>> -	detach_pid(p, PIDTYPE_PID);
>>  	if (group_dead) {
>>  		detach_pid(p, PIDTYPE_PGID);
>>  		detach_pid(p, PIDTYPE_SID);
>> @@ -72,7 +71,20 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
>>  		list_del_rcu(&p->tasks);
>>  		list_del_init(&p->sibling);
>>  		__this_cpu_dec(process_counts);
>> +		/*
>> +		 * If we are the last child process in a pid namespace to be
>> +		 * reaped, notify the reaper sleeping zap_pid_ns_processes().
>> +		 */
>> +		if (IS_ENABLED(CONFIG_PID_NS)) {
>> +			struct task_struct *parent = p->real_parent;
>> +
>> +			if ((task_active_pid_ns(p)->child_reaper == parent) &&
>> +			    list_empty(&parent->children) &&
>> +			    (parent->flags & PF_EXITING))
>> +				wake_up_process(parent);
>> +		}
>>  	}
>> +	detach_pid(p, PIDTYPE_PID);
>>  	list_del_rcu(&p->thread_group);
>>  }
>>  
>> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
>> index 57bc1fd..41ed867 100644
>> --- a/kernel/pid_namespace.c
>> +++ b/kernel/pid_namespace.c
>> @@ -179,11 +179,31 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>>  	}
>>  	read_unlock(&tasklist_lock);
>>  
>> +	/* Firstly reap the EXIT_ZOMBIE children we may have. */
>>  	do {
>>  		clear_thread_flag(TIF_SIGPENDING);
>>  		rc = sys_wait4(-1, NULL, __WALL, NULL);
>>  	} while (rc != -ECHILD);
>>  
>> +	/*
>> +	 * sys_wait4() above can't reap the TASK_DEAD children.
>> +	 * Make sure they all go away, see __unhash_process().
>> +	 */
>> +	for (;;) {
>> +		bool need_wait = false;
>> +
>> +		read_lock(&tasklist_lock);
>> +		if (!list_empty(&current->children)) {
>> +			__set_current_state(TASK_UNINTERRUPTIBLE);
>> +			need_wait = true;
>> +		}
>> +		read_unlock(&tasklist_lock);
>> +
>> +		if (!need_wait)
>> +			break;
>> +		schedule();
>
> This sleep is terminated by the above wake_up_process(), yes?

Yes.

> But that wake_up_process() only happens if CONFIG_PID_NS.  It's
> unobvious (to me) that we can't get stuck in D state if
> CONFIG_PID_NS=n.  If bug, please fix.  If not bug, please make obvious
> to me ;)

The file pid_namespace.c that holds zap_pid_ns_processes is only
compiled with CONFIG_PID_NS set.  So we can't possibly get stuck with
CONFIG_PID_NS=n.

> <looks at the locking a bit>
>
> <gets distracted>
>
> That tty_kref_put() in __exit_signal() is running with tasklist_lock
> held, yes?  It does a ton of work and calls out to random drivers and
> none of this needs tasklist_lock.  Seems risky.

Interesting.  That tty_kref_put does sound like an area where the
locking can be simplified.  At the same time tty_kref_put does make
sense from exit signal.  As ttys and signals are intimately intertwined.

Thank you for taking the time looking at this and applying this change.

I have to agree that the tasklist_lock is pretty horrible, and that if
we can somehow figure out how to remove it we would be in a much better
situation with lock contention.  Unfortunately that is an alligator and
I am working to drain the swamp.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ