Message-ID: <4AF5ECFD.3000509@librato.com>
Date:	Sat, 07 Nov 2009 16:56:13 -0500
From:	Oren Laadan <orenl@...rato.com>
To:	Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>
CC:	Matt Helsley <matthltc@...ibm.com>, arnd@...db.de,
	Containers <containers@...ts.linux-foundation.org>,
	linux-kernel@...r.kernel.org,
	"Eric W. Biederman" <ebiederm@...ssion.com>, hpa@...or.com,
	Alexey Dobriyan <adobriyan@...il.com>, roland@...hat.com,
	Pavel Emelyanov <xemul@...nvz.org>
Subject: Re: [v11][PATCH 9/9] Document clone_with_pids() syscall



Sukadev Bhattiprolu wrote:
> Matt Helsley [matthltc@...ibm.com] wrote:
> | > If userspace passes an array with n pids and there are k namespace levels
> | > then clone_with_pids() makes sure that the kernel sees a pid array like:
> | > 
> | > index	  0     ... k - (n + 1)        ...          k - 1
> | > 	+-----------------------+-------------------------+
> | > pid_t	| 0 ..................0 | <copied from userspace> |
> | > 	+-----------------------+-------------------------+
> | 
> | (diagram assumes n != k. If n == k then pids[0] is the pid desired
> | in the initial namespace..)
> 
> True.
> 
> Also, I was not sure if we should prevent choosing pids in ancestor
> containers, since a process is not even supposed to know of ancestor
> namespaces. Is there a need for choosing pids in those namespaces?

IMHO this is a bit confusing.

A process observes a single namespace - the one in which it "lives".
There is no such thing as descendant namespaces for that process.
There may be ancestor namespaces.

The clone occurs in the context of the process. So the process that
is forking _must_ indicate pids in _ancestor_ namespaces if it wishes
to select pids in those (as is the case in c/r).

> 
> | 
> | > 
> | > So even though the order is different from choosepid() the calling
> | > task still doesn't need to know its pidns level. Of course, just
> | > like choosepid(), n <= k or userspace will get EINVAL.
> | 
> | Forgot to mention that I prefer the way choosepid orders the pids.
> | It's not inspired by the way that the kernel implements pid namespaces
> | and has more to do with the way userspace sees things (IMHO).
> 
> Hmm, in general we C/R a descendant container. So the way userspace
> sees it at that point is "what are the pids of this process in my current
> and in any descendant namespaces". IOW, the pid in the container from which
> we checkpoint seems more interesting first - right? If so, the pids[]
> are better ordered from older namespace to younger namespace?

When we checkpoint, we use an external process to record the state of
(current or) descendant namespaces.

When we restart, we run in the context of the restarting process, so
we select a pid in the current and _ancestor_ namespaces.

So the order of pids as it (will) appear in the checkpoint image for
a given process will be from an ancestor down to descendant namespaces.
And this is how we (will) hand it over to eclone().
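
To make the ordering concrete, here is a rough sketch of the padding
shown in the diagram quoted above, with index 0 being the oldest
namespace and index k - 1 the namespace of the caller. This is only an
illustration and not the code from the patch: the helper name
pad_target_pids() and its signature are made up, and it assumes that a
0 entry means "let the kernel choose the pid at that level".

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the patch): expand the n pids supplied
 * by userspace into the k-entry array used for k namespace levels.
 * Entries are ordered from the oldest ancestor (index 0) down to the
 * caller's own namespace (index k - 1).
 * Assumption: a 0 entry means "no specific pid requested".
 */
static int pad_target_pids(const pid_t *upids, int n, pid_t *kpids, int k)
{
	/* As with choosepid(), asking for more pids than levels is invalid. */
	if (n < 0 || k <= 0 || n > k)
		return -EINVAL;

	/* Leading k - n entries: ancestor levels with no requested pid. */
	memset(kpids, 0, (size_t)(k - n) * sizeof(*kpids));

	/* Trailing n entries: copied as given, ancestor first. */
	if (n > 0)
		memcpy(kpids + (k - n), upids, (size_t)n * sizeof(*upids));

	return 0;
}
```

With n == 2 and k == 4, an input of { 7, 42 } would become
{ 0, 0, 7, 42 }: pid 42 in the caller's namespace, pid 7 in its parent
namespace, and no preference in the two oldest ones. With n == k there
is no padding, and pids[0] is the pid requested in the initial namespace.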

> 
> | I don't know if it makes more sense to change clone_with_pids() or have
> | [e]glibc wrappers swap the array contents.

I prefer to decide now on an order and stick to it in the kernel and
in glibc.

Oren

