linux-kernel - Re: [ 032/173] cgroup: cgroup_subsys->fork() should be called after the task is added to css

lists.openwall.net
Open Source and information security mailing list archives
Date:	Tue, 01 Jan 2013 22:31:55 +0900
From:	Satoru Takeuchi <satoru.takeuchi@...il.com>
To:	Ben Hutchings <ben@...adent.org.uk>
Cc:	linux-kernel@...r.kernel.org, stable@...r.kernel.org,
	akpm@...ux-foundation.org, alan@...rguk.ukuu.org.uk,
	Tejun Heo <tj@...nel.org>, Oleg Nesterov <oleg@...hat.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>
Subject: Re: [ 032/173] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set

Hi Ben,

At Fri, 28 Dec 2012 20:04:02 +0100,
Ben Hutchings wrote:
> 
> 3.2-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Tejun Heo <tj@...nel.org>
> 
> commit 5edee61edeaaebafe584f8fb7074c1ef4658596b upstream.
> 
> cgroup core has a bug which violates a basic rule about event
> notifications - when a new entity needs to be added, you add that to
> the notification list first and then make the new entity conform to
> the current state.  If done in the reverse order, an event happening
> in between will be lost.
> 
> cgroup_subsys->fork() is invoked way before the new task is added to
> the css_set.  Currently, cgroup_freezer is the only user of ->fork()
> and uses it to make new tasks conform to the current state of the
> freezer.  If FROZEN state is requested while fork is in progress
> between cgroup_fork_callbacks() and cgroup_post_fork(), the child
> could escape freezing - the cgroup isn't frozen when ->fork() is
> called and the freezer couldn't see the new task on the css_set.
> 
> This patch moves cgroup_subsys->fork() invocation to
> cgroup_post_fork() after the new task is added to the css_set.
> cgroup_fork_callbacks() is removed.
> 
> Because now a task may be migrated during cgroup_subsys->fork(),
> freezer_fork() is updated so that it adheres to the usual RCU locking
> and the rather pointless comment on why locking can be different there
> is removed (if it doesn't make anything simpler, why even bother?).
> 
> Signed-off-by: Tejun Heo <tj@...nel.org>
> Cc: Oleg Nesterov <oleg@...hat.com>
> Cc: Rafael J. Wysocki <rjw@...k.pl>
> [bwh: Backported to 3.2:
>  - Adjust context
>  - Iterate over first CGROUP_BUILTIN_SUBSYS_COUNT elements of subsys
>  - cgroup_subsys::fork takes cgroup_subsys pointer as first parameter]
> Signed-off-by: Ben Hutchings <ben@...adent.org.uk>

I failed to compile 3.2.36-rc1 on my x86_64 box with cgroup enabled.

build log:
===============================================================================
...
  CC      kernel/cgroup.o
kernel/cgroup.c: In function ‘cgroup_post_fork’:
kernel/cgroup.c:4540:5: warning: passing argument 1 of ‘ss->fork’ from incompatible pointer type [enabled by default]
kernel/cgroup.c:4540:5: note: expected ‘struct cgroup_subsys *’ but argument is of type ‘struct task_struct *’
kernel/cgroup.c:4540:5: error: too few arguments to function ‘ss->fork’
make[2]: *** [kernel/cgroup.o] Error 1
make[1]: *** [kernel] Error 2
make[1]: Leaving directory `/home/sat/src/linux-stable'
make: *** [debian/stamp/build/kernel] Error 2
===============================================================================

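It comes from the ss->fork() API change introduced by commit 761b3ef50e1c2: upstream, ->fork() no longer takes the cgroup_subsys pointer, but in 3.2 it still does, as the compiler note above shows.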

> @@ -4551,7 +4529,21 @@ void cgroup_post_fork(struct task_struct
>  		task_unlock(child);
>  		write_unlock(&css_set_lock);
>  	}
> +
> +	/*
> +	 * Call ss->fork().  This must happen after @child is linked on
> +	 * css_set; otherwise, @child might change state between ->fork()
> +	 * and addition to css_set.
> +	 */
> +	if (need_forkexit_callback) {
> +		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
> +			struct cgroup_subsys *ss = subsys[i];
> +			if (ss->fork)
> +				ss->fork(child);

I see you mentioned this difference in the description:

>  - cgroup_subsys::fork takes cgroup_subsys pointer as first parameter]

I guess either you attached the wrong patch or you forgot to modify the
original patch. In the latter case, here is the corrected patch. It just
changes "ss->fork(child)" to "ss->fork(ss, child)", as your description says.

Thanks,
Satoru

---
From: Tejun Heo <tj@...nel.org>
Date: Tue, 16 Oct 2012 15:03:14 -0700
Subject: cgroup: cgroup_subsys->fork() should be called after the task is
 added to css_set

commit 5edee61edeaaebafe584f8fb7074c1ef4658596b upstream.

cgroup core has a bug which violates a basic rule about event
notifications - when a new entity needs to be added, you add that to
the notification list first and then make the new entity conform to
the current state.  If done in the reverse order, an event happening
in between will be lost.

cgroup_subsys->fork() is invoked way before the new task is added to
the css_set.  Currently, cgroup_freezer is the only user of ->fork()
and uses it to make new tasks conform to the current state of the
freezer.  If FROZEN state is requested while fork is in progress
between cgroup_fork_callbacks() and cgroup_post_fork(), the child
could escape freezing - the cgroup isn't frozen when ->fork() is
called and the freezer couldn't see the new task on the css_set.

This patch moves cgroup_subsys->fork() invocation to
cgroup_post_fork() after the new task is added to the css_set.
cgroup_fork_callbacks() is removed.

Because now a task may be migrated during cgroup_subsys->fork(),
freezer_fork() is updated so that it adheres to the usual RCU locking
and the rather pointless comment on why locking can be different there
is removed (if it doesn't make anything simpler, why even bother?).

Signed-off-by: Tejun Heo <tj@...nel.org>
Cc: Oleg Nesterov <oleg@...hat.com>
Cc: Rafael J. Wysocki <rjw@...k.pl>
[bwh: Backported to 3.2:
 - Adjust context
 - Iterate over first CGROUP_BUILTIN_SUBSYS_COUNT elements of subsys
 - cgroup_subsys::fork takes cgroup_subsys pointer as first parameter]
Signed-off-by: Ben Hutchings <ben@...adent.org.uk>
Signed-off-by: Satoru Takeuchi <satoru.takeuchi@...il.com>
---
 include/linux/cgroup.h  |    1 -
 kernel/cgroup.c         |   62 +++++++++++++++++++++++------------------------
 kernel/cgroup_freezer.c |   13 +++-------
 kernel/fork.c           |    9 +------
 4 files changed, 35 insertions(+), 50 deletions(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -32,7 +32,6 @@ extern int cgroup_lock_is_held(void);
 extern bool cgroup_lock_live_group(struct cgroup *cgrp);
 extern void cgroup_unlock(void);
 extern void cgroup_fork(struct task_struct *p);
-extern void cgroup_fork_callbacks(struct task_struct *p);
 extern void cgroup_post_fork(struct task_struct *p);
 extern void cgroup_exit(struct task_struct *p, int run_callbacks);
 extern int cgroupstats_build(struct cgroupstats *stats,
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4508,41 +4508,19 @@ void cgroup_fork(struct task_struct *chi
 }
 
 /**
- * cgroup_fork_callbacks - run fork callbacks
- * @child: the new task
- *
- * Called on a new task very soon before adding it to the
- * tasklist. No need to take any locks since no-one can
- * be operating on this task.
- */
-void cgroup_fork_callbacks(struct task_struct *child)
-{
-	if (need_forkexit_callback) {
-		int i;
-		/*
-		 * forkexit callbacks are only supported for builtin
-		 * subsystems, and the builtin section of the subsys array is
-		 * immutable, so we don't need to lock the subsys array here.
-		 */
-		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
-			struct cgroup_subsys *ss = subsys[i];
-			if (ss->fork)
-				ss->fork(ss, child);
-		}
-	}
-}
-
-/**
  * cgroup_post_fork - called on a new task after adding it to the task list
  * @child: the task in question
  *
- * Adds the task to the list running through its css_set if necessary.
- * Has to be after the task is visible on the task list in case we race
- * with the first call to cgroup_iter_start() - to guarantee that the
- * new task ends up on its list.
+ * Adds the task to the list running through its css_set if necessary and
+ * calls the subsystem fork() callbacks.  Has to be after the task is
+ * visible on the task list in case we race with the first call to
+ * cgroup_iter_start() - to guarantee that the new task ends up on its
+ * list.
  */
 void cgroup_post_fork(struct task_struct *child)
 {
+	int i;
+
 	if (use_task_css_set_links) {
 		write_lock(&css_set_lock);
 		task_lock(child);
@@ -4551,7 +4529,21 @@ void cgroup_post_fork(struct task_struct
 		task_unlock(child);
 		write_unlock(&css_set_lock);
 	}
+
+	/*
+	 * Call ss->fork().  This must happen after @child is linked on
+	 * css_set; otherwise, @child might change state between ->fork()
+	 * and addition to css_set.
+	 */
+	if (need_forkexit_callback) {
+		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+			struct cgroup_subsys *ss = subsys[i];
+			if (ss->fork)
+				ss->fork(ss, child);
+		}
+	}
 }
+
 /**
  * cgroup_exit - detach cgroup from exiting task
  * @tsk: pointer to task_struct of exiting process
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -197,23 +197,15 @@ static void freezer_fork(struct cgroup_s
 {
 	struct freezer *freezer;
 
-	/*
-	 * No lock is needed, since the task isn't on tasklist yet,
-	 * so it can't be moved to another cgroup, which means the
-	 * freezer won't be removed and will be valid during this
-	 * function call.  Nevertheless, apply RCU read-side critical
-	 * section to suppress RCU lockdep false positives.
-	 */
 	rcu_read_lock();
 	freezer = task_freezer(task);
-	rcu_read_unlock();
 
 	/*
 	 * The root cgroup is non-freezable, so we can skip the
 	 * following check.
 	 */
 	if (!freezer->css.cgroup->parent)
-		return;
+		goto out;
 
 	spin_lock_irq(&freezer->lock);
 	BUG_ON(freezer->state == CGROUP_FROZEN);
@@ -221,7 +213,10 @@ static void freezer_fork(struct cgroup_s
 	/* Locking avoids race with FREEZING -> THAWED transitions. */
 	if (freezer->state == CGROUP_FREEZING)
 		freeze_task(task, true);
+
 	spin_unlock_irq(&freezer->lock);
+out:
+	rcu_read_unlock();
 }
 
 /*
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1057,7 +1057,6 @@ static struct task_struct *copy_process(
 {
 	int retval;
 	struct task_struct *p;
-	int cgroup_callbacks_done = 0;
 
 	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
 		return ERR_PTR(-EINVAL);
@@ -1312,12 +1311,6 @@ static struct task_struct *copy_process(
 	p->group_leader = p;
 	INIT_LIST_HEAD(&p->thread_group);
 
-	/* Now that the task is set up, run cgroup callbacks if
-	 * necessary. We need to run them before the task is visible
-	 * on the tasklist. */
-	cgroup_fork_callbacks(p);
-	cgroup_callbacks_done = 1;
-
 	/* Need tasklist lock for parent etc handling! */
 	write_lock_irq(&tasklist_lock);
 
@@ -1419,7 +1412,7 @@ bad_fork_cleanup_cgroup:
 #endif
 	if (clone_flags & CLONE_THREAD)
 		threadgroup_fork_read_unlock(current);
-	cgroup_exit(p, cgroup_callbacks_done);
+	cgroup_exit(p, 0);
 	delayacct_tsk_free(p);
 	module_put(task_thread_info(p)->exec_domain->module);
 bad_fork_cleanup_count: