Message-ID: <20090924165548.GA16586@us.ibm.com>
Date:	Thu, 24 Sep 2009 09:55:48 -0700
From:	Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>
To:	linux-kernel@...r.kernel.org
Cc:	Oren Laadan <orenl@...columbia.edu>, serue@...ibm.com,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Alexey Dobriyan <adobriyan@...il.com>,
	Pavel Emelyanov <xemul@...nvz.org>,
	Andrew Morton <akpm@...l.org>, torvalds@...ux-foundation.org,
	mikew@...gle.com, mingo@...e.hu, hpa@...or.com,
	Nathan Lynch <nathanl@...tin.ibm.com>, arnd@...db.de,
	peterz@...radead.org,
	Containers <containers@...ts.linux-foundation.org>,
	sukadev@...ibm.com
Subject: [RFC][v7][PATCH 0/9] Implement clone2() system call


=== NEW CLONE() SYSTEM CALL:

To support application checkpoint/restart, a task must have the same pid it
had when it was checkpointed.  When containers are nested, the tasks within
the containers exist in multiple pid namespaces and hence have multiple pids
to specify during restart.

This patchset implements a new system call, clone2(), that lets a process
specify the pids of the child process.

Patches 1 through 6 are helper patches, needed for choosing a pid for the
child process.

Patch 8 defines a prototype of the new system call. Patch 9 adds some
documentation on the new system call, some/all of which will eventually
go into a man page.

Changelog[v7]:
	- [Peter Zijlstra, Arnd Bergmann]
	  Rename clone_with_pids() to clone2(). Also group the arguments to
	  clone2() into a 'struct clone_struct' to work around the issue of
	  exceeding 6 arguments to the system call. Also define clone-flags
	  as u64 to allow additional clone-flags.

Changelog[v6]:
	- [Nathan Lynch, Arnd Bergmann, H. Peter Anvin, Linus Torvalds]
	  Change 'pid_set.pids' to 'pid_t pids[]' so sizeof(struct pid_set) is
	  constant across architectures (Patches 7, 8).
	- (Nathan Lynch) Change pid_set.num_pids to unsigned and remove
	  'unum_pids < 0' check (Patches 7,8)
	- (Pavel Machek) New patch (Patch 9) to add some documentation.

Changelog[v5]:
	- Make 'pid_max' a property of pid_ns (Integrated Serge Hallyn's patch
	  into this set)
	- (Eric Biederman): Avoid the new function, set_pidmap() - added
	  a couple of checks on 'target_pid' in alloc_pidmap() itself.

=== IMPORTANT TODO:

The clone() system call has another limitation: all available bits in
clone-flags are in use, and any new clone-flag will need a variant of the
clone() system call.

It appears to make sense to try and extend this new system call to address
this limitation as well. The requirements of a new clone system call could
then be summarized as:

	- do everything clone() does today, and
	- give the application the ability to choose pids for the child process
	  in all ancestor pid namespaces, and
	- allow more clone_flags
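
To illustrate the third requirement, here is a minimal userspace sketch of why
a 64-bit flags field leaves room for new flags. It assumes the existing flags
are confined to a 32-bit word. EXAMPLE_CLONE_NEWFLAG and
flags_fit_in_32_bits() are hypothetical names used only for this illustration;
they are not part of this patchset or of any kernel.

```c
#include <stdint.h>

/* Hypothetical flag occupying the first bit above a 32-bit flags word. */
#define EXAMPLE_CLONE_NEWFLAG (1ULL << 32)

/*
 * Return 1 if 'flags' survives truncation to 32 bits unchanged,
 * i.e. it could be passed through a 32-bit flags argument; 0 otherwise.
 */
int flags_fit_in_32_bits(uint64_t flags)
{
	return (uint64_t)(uint32_t)flags == flags;
}
```

A flag such as EXAMPLE_CLONE_NEWFLAG cannot be expressed in a 32-bit word, but
fits in a u64 field without any change to the calling convention.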

Constraints:

	- System calls are restricted to 6 parameters and clone() already
	  takes 5, so any extension to the clone() interface would require
	  one or more copy_from_user() calls.  (Not sure whether a
	  copy_from_user() of ~40 bytes would have a significant impact on
	  the performance of clone() on any architecture.)

Based on these requirements and constraints, we explored a couple of system
call interfaces (in earlier versions of this patchset) and currently define
the system call as:

	struct clone_struct {
		u64 flags;
		u64 child_stack;
		u32 nr_pids;
		u32 parent_tid;
		u32 child_tid;
		u32 reserved1;
		u64 reserved2;
	};

	sys_clone2(struct clone_struct __user *cs, pid_t __user *pids)
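
As a sanity check on the layout above, the following userspace sketch mirrors
the structure with fixed-width types and asserts at compile time that it
occupies 40 bytes, in line with the ~40 byte copy_from_user() mentioned in the
constraints. Mapping u64/u32 to uint64_t/uint32_t is an assumption made for
userspace, and clone_struct_init() is a hypothetical helper, not part of the
patchset. The sketch does not invoke the system call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the proposed layout (u64/u32 mapped to stdint types). */
struct clone_struct {
	uint64_t flags;
	uint64_t child_stack;
	uint32_t nr_pids;
	uint32_t parent_tid;
	uint32_t child_tid;
	uint32_t reserved1;
	uint64_t reserved2;
};

/* The four u32 fields pack into 16 bytes, so no padding is expected. */
_Static_assert(sizeof(struct clone_struct) == 40,
	       "clone_struct should be 40 bytes");
_Static_assert(offsetof(struct clone_struct, reserved2) == 32,
	       "reserved2 should start at byte 32");

/*
 * Hypothetical helper: zero the whole structure (including the reserved
 * fields) and then fill in the fields the caller supplies.
 */
void clone_struct_init(struct clone_struct *cs, uint64_t flags,
		       uint64_t child_stack, uint32_t nr_pids)
{
	memset(cs, 0, sizeof(*cs));
	cs->flags = flags;
	cs->child_stack = child_stack;
	cs->nr_pids = nr_pids;
}
```

Zeroing the reserved fields before use is a conservative choice in this
sketch, so that a later definition of those fields would see a known value.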

Signed-off-by: Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>