lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 12 May 2017 17:40:04 +0200
From:   Oleg Nesterov <oleg@...hat.com>
To:     Kirill Tkhai <ktkhai@...tuozzo.com>
Cc:     mhocko@...e.com, avagin@...nvz.org, peterz@...radead.org,
        linux-kernel@...r.kernel.org, rppt@...ux.vnet.ibm.com,
        ebiederm@...ssion.com, luto@...nel.org, gorcunov@...nvz.org,
        akpm@...ux-foundation.org, mingo@...nel.org, serge@...lyn.com
Subject: Re: [PATCH] pid_ns: Fix race between setns'ed fork() and
        zap_pid_ns_processes()

On 05/12, Kirill Tkhai wrote:
>
> Task from parent pid_ns             Child reaper
> copy_process()                      ..
>   alloc_pid()                       ..
>   ..                                zap_pid_ns_processes()
>   ..                                  disable_pid_allocation()
>   ..                                  read_lock(&tasklist_lock)
>   ..                                  iterate over pids in pid_ns
>   ..                                    kill tasks linked to pids
>   ..                                  read_unlock(&tasklist_lock)
>   write_lock_irq(&tasklist_lock);   ..
>   attach_pid(p, PIDTYPE_PID);       ..
>   ..                                ..
>
> So, just created task p won't receive SIGKILL signal,
> and the pid namespace will be in contradictory state.
> Only manual kill will help there, but does the userspace
> care about this? I suppose, the most users just inject
> a task into a pid namespace and wait a SIGCHLD from it.

OK.

> The patch fixes the problem. It moves disable_pid_allocation()
> into find_child_reaper() where tasklist_lock is held,

This looks unnecessary,

> and this allows to simply check for (pid_ns->nr_hashed & PIDNS_HASH_ADDING)
> in copy_process(). If allocation is disabled, we just
> return -ENOMEM like it's made for such cases in alloc_pid().

Yes, but note that zap_pid_ns_processes() does disable_pid_allocation()
and then takes tasklist_lock to kill the whole namespace. Given that
copy_process() checks PIDNS_HASH_ADDING under write_lock(tasklist) they
can't race; if copy_process() takes this lock first, the new child will
be killed, otherwise copy_process() can't miss the change in ->nr_hashed.

So I think you can safely remove the changes in exit.c and pid_namespace.c.

> @@ -1523,6 +1523,7 @@ static __latent_entropy struct task_struct *copy_process(
>  					unsigned long tls,
>  					int node)
>  {
> +	struct pid_namespace *pid_ns;
>  	int retval;
>  	struct task_struct *p;
>  
> @@ -1735,8 +1736,9 @@ static __latent_entropy struct task_struct *copy_process(
>  	if (retval)
>  		goto bad_fork_cleanup_io;
>  
> +	pid_ns = p->nsproxy->pid_ns_for_children;
>  	if (pid != &init_struct_pid) {
> -		pid = alloc_pid(p->nsproxy->pid_ns_for_children);
> +		pid = alloc_pid(pid_ns);
>  		if (IS_ERR(pid)) {
>  			retval = PTR_ERR(pid);
>  			goto bad_fork_cleanup_thread;
> @@ -1845,10 +1847,11 @@ static __latent_entropy struct task_struct *copy_process(
>  	*/
>  	recalc_sigpending();
>  	if (signal_pending(current)) {
> -		spin_unlock(&current->sighand->siglock);
> -		write_unlock_irq(&tasklist_lock);
>  		retval = -ERESTARTNOINTR;
> -		goto bad_fork_cancel_cgroup;
> +		goto bad_fork_unlock_siglock;
> +	} else if (unlikely(!(pid_ns->nr_hashed & PIDNS_HASH_ADDING))) {
> +		retval = -ENOMEM;
> +		goto bad_fork_unlock_siglock;

I won't insist, feel free to ignore... But I don't really like the fact
you add the new pid_ns var, copy_process() is already huge and complex.
Can't you simply use ns_of_pid(pid_ns)->nr_hashed ? Yes, this will add
a couple of additional insns, but imo readability is more important.

And why "else if"? Imho this looks less readable and a bit confusing
compared to 2 subsequent if()'s.

And probably a helper which checks PIDNS_HASH_ADDING makes some sense,
but we can do this separately.

> -bad_fork_cancel_cgroup:
> +bad_fork_unlock_siglock:
> +	spin_unlock(&current->sighand->siglock);
> +	write_unlock_irq(&tasklist_lock);
>  	cgroup_cancel_fork(p);

OK, agreed. Except the new name doesn't match the code ;)

Oleg.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ