lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8de7d233-b7d3-a63e-4980-eb32e8761c30@virtuozzo.com>
Date:   Mon, 29 May 2017 13:49:40 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
CC:     <mhocko@...e.com>, <avagin@...nvz.org>, <peterz@...radead.org>,
        <linux-kernel@...r.kernel.org>, <oleg@...hat.com>,
        <rppt@...ux.vnet.ibm.com>, <luto@...nel.org>,
        <gorcunov@...nvz.org>, <akpm@...ux-foundation.org>,
        <mingo@...nel.org>, <serge@...lyn.com>
Subject: Re: [PATCH] pid_ns: Allow to get pid_for_children ns before
 child_reaper is created

On 27.05.2017 14:01, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@...tuozzo.com> writes:
> 
>> This patch prohibits pid allocation till child_reaper
>> of pid namespace is set, and it makes possible and safe
>> to get just unshared pid_ns from "/proc/[pid]/ns/pid_for_children"
>> file. This may be useful to determine user_ns of such a created
>> pid_ns, which is not possible now.
>>
>> It was prohibited till now, because the architecture of pid namespaces
>> assumes child reaper is the firstly created process of the namespace,
>> and it initializes pid_namespace::proc_mnt. Child reaper creation
>> mustn't race with creation of another processes from this namespace,
>> otherwise a process with pid > 1 may die before pid_namespace::proc_mnt
>> is populated and it will get a null pointer dereference in proc_flush_task().
>> Also, child reaper mustn't die before processes from the namespace.
> 
> This patch introduces the possibility that two or more processes may
> have the same pid namespace (with no processes) as pid_ns_for_children.
> 
> Which means you can now have a race for the first pid in alloc_pid.
> Making it indeterminant who allocates the init process.  Which is not
> acceptable.
> 
> It is not acceptable on two grounds.
> 1) It is a bogus user space semantic.  Because userspace needs to
>    know who allocates init.
> 2) It is horrible for maintenance becuase now the code has to be very
>    clever to deal with a case that no one cares about.  Which is
>    a general formula for buggy code.

We may disallow setns() if there is no child reaper created, and
this solves all above issues. Please see v2 below, it has no problems
you pointed.

[PATCH v2]pid_ns: Allow to get pid_for_children ns before child_reaper is created

This patch prohibits setns() on a pid namespace till its child_reaper
is set, and it makes possible and safe to get just unshared pid_ns
from "/proc/[pid]/ns/pid_for_children" file. This may be useful
to determine user_ns of such a created pid_ns, which is not possible now.

It was not possible till now, because the architecture of pid namespaces
assumes child reaper is the first created process of the namespace,
and it initializes pid_namespace::proc_mnt. Child reaper creation
mustn't race with creation of another processes from this namespace,
otherwise a process with pid > 1 may die before pid_namespace::proc_mnt
is populated and it will get a null pointer dereference in proc_flush_task().
Also, child reaper mustn't die before processes from the namespace.

The patch prevents such races. It allows to setns() on a pid namespace
only if ns->child_reaper is already set, and this guarantees, that
only pid namespace creator may establish child reaper.
So, we can safely allow to get "/proc/[pid]/ns/pid_for_children"
since it's created, and to analyse it.

v2: Don't race for child reaper creation.

Signed-off-by: Kirill Tkhai <ktkhai@...tuozzo.com>
CC: Andrew Morton <akpm@...ux-foundation.org>
CC: "Eric W. Biederman" <ebiederm@...ssion.com>
CC: Oleg Nesterov <oleg@...hat.com>
CC: Andy Lutomirski <luto@...nel.org>
CC: Serge Hallyn <serge@...lyn.com>
CC: Michal Hocko <mhocko@...e.com>
CC: Andrei Vagin <avagin@...nvz.org>
CC: Cyrill Gorcunov <gorcunov@...nvz.org>
CC: Mike Rapoport <rppt@...ux.vnet.ibm.com>
CC: Ingo Molnar <mingo@...nel.org>
CC: Peter Zijlstra <peterz@...radead.org>
---
 kernel/pid_namespace.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 74a5a7255b4d..5e7b3fd0d4c2 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -385,15 +385,6 @@ static struct ns_common *pidns_for_children_get(struct task_struct *task)
 	}
 	task_unlock(task);
 
-	if (ns) {
-		read_lock(&tasklist_lock);
-		if (!ns->child_reaper) {
-			put_pid_ns(ns);
-			ns = NULL;
-		}
-		read_unlock(&tasklist_lock);
-	}
-
 	return ns ? &ns->ns : NULL;
 }
 
@@ -428,6 +419,15 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	if (ancestor != active)
 		return -EINVAL;
 
+	/*
+	 * Disallow processes to use pid namespace till its
+	 * creator makes child reaper. Otherwise, several
+	 * processes race for that, and it's not clear who
+	 * establishes init.
+	 */
+	if (!new->child_reaper)
+		return -ESRCH;
+
 	put_pid_ns(nsproxy->pid_ns_for_children);
 	nsproxy->pid_ns_for_children = get_pid_ns(new);
 	return 0;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ