lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <f7c774ef-7b93-c2b9-f257-f538cce09730@virtuozzo.com>
Date:   Thu, 8 Jun 2017 15:17:25 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     mhocko@...e.com, avagin@...nvz.org, peterz@...radead.org,
        linux-kernel@...r.kernel.org, oleg@...hat.com,
        rppt@...ux.vnet.ibm.com, luto@...nel.org, gorcunov@...nvz.org,
        akpm@...ux-foundation.org, mingo@...nel.org, serge@...lyn.com
Subject: Re: [PATCH] pid_ns: Allow to get pid_for_children ns before
 child_reaper is created

ping

On 29.05.2017 13:49, Kirill Tkhai wrote:
> On 27.05.2017 14:01, Eric W. Biederman wrote:
>> Kirill Tkhai <ktkhai@...tuozzo.com> writes:
>>
>>> This patch prohibits pid allocation till child_reaper
>>> of pid namespace is set, and it makes possible and safe
>>> to get just unshared pid_ns from "/proc/[pid]/ns/pid_for_children"
>>> file. This may be useful to determine user_ns of such a created
>>> pid_ns, which is not possible now.
>>>
>>> It was prohibited till now, because the architecture of pid namespaces
>>> assumes child reaper is the firstly created process of the namespace,
>>> and it initializes pid_namespace::proc_mnt. Child reaper creation
>>> mustn't race with creation of another processes from this namespace,
>>> otherwise a process with pid > 1 may die before pid_namespace::proc_mnt
>>> is populated and it will get a null pointer dereference in proc_flush_task().
>>> Also, child reaper mustn't die before processes from the namespace.
>>
>> This patch introduces the possibility that two or more processes may
>> have the same pid namespace (with no processes) as pid_ns_for_children.
>>
>> Which means you can now have a race for the first pid in alloc_pid.
>> Making it indeterminant who allocates the init process.  Which is not
>> acceptable.
>>
>> It is not acceptable on two grounds.
>> 1) It is a bogus user space semantic.  Because userspace needs to
>>    know who allocates init.
>> 2) It is horrible for maintenance becuase now the code has to be very
>>    clever to deal with a case that no one cares about.  Which is
>>    a general formula for buggy code.
> 
> We may disallow setns() if there is no child reaper created, and
> this solves all above issues. Please see v2 below, it has no problems
> you pointed.
> 
> [PATCH v2]pid_ns: Allow to get pid_for_children ns before child_reaper is created
> 
> This patch prohibits setns() on a pid namespace till its child_reaper
> is set, and it makes possible and safe to get just unshared pid_ns
> from "/proc/[pid]/ns/pid_for_children" file. This may be useful
> to determine user_ns of such a created pid_ns, which is not possible now.
> 
> It was not possible till now, because the architecture of pid namespaces
> assumes child reaper is the first created process of the namespace,
> and it initializes pid_namespace::proc_mnt. Child reaper creation
> mustn't race with creation of another processes from this namespace,
> otherwise a process with pid > 1 may die before pid_namespace::proc_mnt
> is populated and it will get a null pointer dereference in proc_flush_task().
> Also, child reaper mustn't die before processes from the namespace.
> 
> The patch prevents such races. It allows to setns() on a pid namespace
> only if ns->child_reaper is already set, and this guarantees, that
> only pid namespace creator may establish child reaper.
> So, we can safely allow to get "/proc/[pid]/ns/pid_for_children"
> since it's created, and to analyse it.
> 
> v2: Don't race for child reaper creation.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@...tuozzo.com>
> CC: Andrew Morton <akpm@...ux-foundation.org>
> CC: "Eric W. Biederman" <ebiederm@...ssion.com>
> CC: Oleg Nesterov <oleg@...hat.com>
> CC: Andy Lutomirski <luto@...nel.org>
> CC: Serge Hallyn <serge@...lyn.com>
> CC: Michal Hocko <mhocko@...e.com>
> CC: Andrei Vagin <avagin@...nvz.org>
> CC: Cyrill Gorcunov <gorcunov@...nvz.org>
> CC: Mike Rapoport <rppt@...ux.vnet.ibm.com>
> CC: Ingo Molnar <mingo@...nel.org>
> CC: Peter Zijlstra <peterz@...radead.org>
> ---
>  kernel/pid_namespace.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 74a5a7255b4d..5e7b3fd0d4c2 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -385,15 +385,6 @@ static struct ns_common *pidns_for_children_get(struct task_struct *task)
>  	}
>  	task_unlock(task);
>  
> -	if (ns) {
> -		read_lock(&tasklist_lock);
> -		if (!ns->child_reaper) {
> -			put_pid_ns(ns);
> -			ns = NULL;
> -		}
> -		read_unlock(&tasklist_lock);
> -	}
> -
>  	return ns ? &ns->ns : NULL;
>  }
>  
> @@ -428,6 +419,15 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
>  	if (ancestor != active)
>  		return -EINVAL;
>  
> +	/*
> +	 * Disallow processes to use pid namespace till its
> +	 * creator makes child reaper. Otherwise, several
> +	 * processes race for that, and it's not clear who
> +	 * establishes init.
> +	 */
> +	if (!new->child_reaper)
> +		return -ESRCH;
> +
>  	put_pid_ns(nsproxy->pid_ns_for_children);
>  	nsproxy->pid_ns_for_children = get_pid_ns(new);
>  	return 0;
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ