linux-kernel - Re: [ckrm-tech] [PATCH 00/10] Containers(V10): Generic Process Containers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070608194247.GA12671@vino.hallyn.com>
Date:	Fri, 8 Jun 2007 14:42:47 -0500
From:	"Serge E. Hallyn" <serge@...lyn.com>
To:	Paul Menage <menage@...gle.com>
Cc:	"Serge E. Hallyn" <serue@...ibm.com>,
	"Serge E. Hallyn" <serge@...lyn.com>, Paul Jackson <pj@....com>,
	vatsa@...ibm.com, ckrm-tech@...ts.sourceforge.net,
	balbir@...ibm.com, rohitseth@...gle.com, haveblue@...ibm.com,
	xemul@...ru, dev@...ru, containers@...ts.osdl.org,
	devel@...nvz.org, ebiederm@...ssion.com, mbligh@...gle.com,
	cpw@....com, svaidy@...ux.vnet.ibm.com, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: Re: [ckrm-tech] [PATCH 00/10] Containers(V10): Generic Process Containers

Quoting Paul Menage (menage@...gle.com):
> On 6/8/07, Serge E. Hallyn <serue@...ibm.com> wrote:
> >
> >I do fear that that could become a maintenance nightmare.  For instance
> >right now there's the call to fsnotify_mkdir().  Other such hooks might
> >be placed at vfs_mkdir, which we'd then likely want to have placed in
> >our container_mkdir() and container_clone() fns.  And of course
> >may_create() is static inline in fs/namei.c.  It's trivial, but still if
> >it changes we'd want to change the version in kernel/container.c as
> >well.
> 
> Do we need to actually need to respect may_create() in
> container_clone()? I guess it would provide a way for root to control
> which processes could unshare namespaces.
> 
> >
> >What would be the main advantage of doing it this way?  Do you consider
> >the extra subys->auto_setup() hook to be avoidable bloat?
> >
> 
> I was thinking that it would be nice to be able to atomically set up
> the resources in the new container at the point when it's created
> rather than later. But I guess this way can work too. Can we call it
> something like "clone()" rather than "auto_setup()"?
> 
> Paul

clone() implies it does the actual cloning, so how about post_clone()
as in the patch below?

I'm still not saying I'm entirely opposed to moving the vfs_mkdir logic
straight into container_clone() - it's more that I would expect other
people to object when they saw that.  So if you decide you don't like
the end result with this patch, let me know and I'll give that a shot.

Paul (Jackson), is this comment added in cpusets close enough to what
you were asking for?

thanks,
-serge

>From c2f1a39b231f06cb524c6e95d74de6ddee286f25 Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <serue@...ibm.com>
Date: Fri, 8 Jun 2007 15:36:59 -0400
Subject: [PATCH 4/4] containers: minor clone cleanup

rename auto_setup() to post_clone(), and comment the cpusets version.

Signed-off-by: Serge E. Hallyn <serue@...ibm.com>
---
 Documentation/containers.txt |   10 +++++-----
 include/linux/container.h    |    2 +-
 kernel/container.c           |    4 ++--
 kernel/cpuset.c              |   20 ++++++++++++++++++--
 4 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/Documentation/containers.txt b/Documentation/containers.txt
index 28c9e10..9fdb808 100644
--- a/Documentation/containers.txt
+++ b/Documentation/containers.txt
@@ -514,12 +514,12 @@ include/linux/container.h for details).  Note that although this
 method can return an error code, the error code is currently not
 always handled well.
 
-void auto_setup(struct container_subsys *ss, struct container *cont)
+void post_clone(struct container_subsys *ss, struct container *cont)
 
-Called at container_clone() to do any paramater initialization
-which might be required before a task could attach.  For example
-in cpusets, no task may attach before 'cpus' and 'mems' are
-set up.
+Called at the end of container_clone() to do any paramater
+initialization which might be required before a task could attach.  For
+example in cpusets, no task may attach before 'cpus' and 'mems' are set
+up.
 
 void bind(struct container_subsys *ss, struct container *root)
 LL=callback_mutex
diff --git a/include/linux/container.h b/include/linux/container.h
index d809b41..1a83913 100644
--- a/include/linux/container.h
+++ b/include/linux/container.h
@@ -213,7 +213,7 @@ struct container_subsys {
 	void (*exit)(struct container_subsys *ss, struct task_struct *task);
 	int (*populate)(struct container_subsys *ss,
 			struct container *cont);
-	void (*auto_setup)(struct container_subsys *ss, struct container *cont);
+	void (*post_clone)(struct container_subsys *ss, struct container *cont);
 	void (*bind)(struct container_subsys *ss, struct container *root);
 	int subsys_id;
 	int active;
diff --git a/kernel/container.c b/kernel/container.c
index e0793f4..11e326a 100644
--- a/kernel/container.c
+++ b/kernel/container.c
@@ -2400,8 +2400,8 @@ int container_clone(struct task_struct *tsk, struct container_subsys *subsys)
 
 	/* do any required auto-setup */
 	for_each_subsys(root, ss) {
-		if (ss->auto_setup)
-			ss->auto_setup(ss, child);
+		if (ss->post_clone)
+			ss->post_clone(ss, child);
 	}
 
 	/* All seems fine. Finish by moving the task into the new container */
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index ff01aaa..ecefb1d 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1189,7 +1189,23 @@ int cpuset_populate(struct container_subsys *ss, struct container *cont)
 	return 0;
 }
 
-void cpuset_auto_setup(struct container_subsys *ss,
+/*
+ * post_clone() is called at the end of container_clone().
+ * 'container' was just created automatically as a result of
+ * a container_clone(), and the current task is about to
+ * be moved into 'container'.
+ *
+ * Currently we refuse to set up the container - thereby
+ * refusing the task to be entered, and as a result refusing
+ * the sys_unshare() or clone() which initiated it - if any
+ * sibling cpusets have exclusive cpus or mem.
+ *
+ * If this becomes a problem for some users who wish to
+ * allow that scenario, then cpuset_post_clone() could be
+ * changed to grant parent->cpus_allowed-sibling_cpus_exclusive
+ * (and likewise for mems) to the new container.
+ */
+void cpuset_post_clone(struct container_subsys *ss,
 		struct container *container)
 {
 	struct container *parent, *child;
@@ -1269,7 +1285,7 @@ struct container_subsys cpuset_subsys = {
 	.can_attach = cpuset_can_attach,
 	.attach = cpuset_attach,
 	.populate = cpuset_populate,
-	.auto_setup = cpuset_auto_setup,
+	.post_clone = cpuset_post_clone,
 	.subsys_id = cpuset_subsys_id,
 	.early_init = 1,
 };
-- 
1.5.1.1.GIT

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/