linux-kernel - Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20110825093958.75b95bd8.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Thu, 25 Aug 2011 09:39:58 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	rjw@...k.pl, paul@...lmenage.org, lizf@...fujitsu.com,
	linux-pm@...ts.linux-foundation.org, linux-kernel@...r.kernel.org,
	containers@...ts.linux-foundation.org,
	Balbir Singh <bsingharora@...il.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	James Morris <jmorris@...ei.org>
Subject: Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in
 subsys->can_attach(), cancel_attach() and attach()

On Wed, 24 Aug 2011 00:19:57 +0200
Tejun Heo <tj@...nel.org> wrote:

> Currently, there's no way to pass multiple tasks to cgroup_subsys
> methods necessitating the need for separate per-process and per-task
> methods.  This patch introduces cgroup_taskset which can be used to
> pass multiple tasks and their associated cgroups to cgroup_subsys
> methods.
> 
> Three methods - can_attach(), cancel_attach() and attach() - are
> converted to use cgroup_taskset.  This unifies passed parameters so
> that all methods have access to all information.  Conversions in this
> patchset are identical and don't introduce any behavior change.
> 
> Signed-off-by: Tejun Heo <tj@...nel.org>
> Cc: Paul Menage <paul@...lmenage.org>
> Cc: Li Zefan <lizf@...fujitsu.com>
> Cc: Balbir Singh <bsingharora@...il.com>
> Cc: Daisuke Nishimura <nishimura@....nes.nec.co.jp>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> Cc: James Morris <jmorris@...ei.org>

Thank you for your work. I welcome this !

Some comments around memcg.

> ---
>  Documentation/cgroups/cgroups.txt |   26 ++++++----
>  include/linux/cgroup.h            |   28 +++++++++-
>  kernel/cgroup.c                   |   99 +++++++++++++++++++++++++++++++++----
>  kernel/cgroup_freezer.c           |    2 +-
>  kernel/cpuset.c                   |   18 ++++---
>  mm/memcontrol.c                   |   16 +++---
>  security/device_cgroup.c          |    7 ++-
>  7 files changed, 153 insertions(+), 43 deletions(-)
> 
> diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
> index cd67e90..2eee7cf 100644
> --- a/Documentation/cgroups/cgroups.txt
> +++ b/Documentation/cgroups/cgroups.txt
> @@ -594,16 +594,21 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be
>  called multiple times against a cgroup.
>  
>  int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> -	       struct task_struct *task)
> +	       struct cgroup_taskset *tset)
>  (cgroup_mutex held by caller)
>  
> -Called prior to moving a task into a cgroup; if the subsystem
> -returns an error, this will abort the attach operation.  If a NULL
> -task is passed, then a successful result indicates that *any*
> -unspecified task can be moved into the cgroup. Note that this isn't
> +Called prior to moving one or more tasks into a cgroup; if the
> +subsystem returns an error, this will abort the attach operation.
> +@...t contains the tasks to be attached and is guaranteed to have at
> +least one task in it. If there are multiple, it's guaranteed that all
> +are from the same thread group,


Do this, "If there are multiple, it's guaranteed that all
are from the same thread group ", means the 'tset' contains
only one mm_struct ?

And is it guaranteed that any task in tset will not be freed while
subsystem routine runs ?

> @tset contains all tasks from the
> +group whether they're actually switching cgroup or not, and the first
> +task is the leader. Each @tset entry also contains the task's old
> +cgroup and tasks which aren't switching cgroup can be skipped easily
> +using the cgroup_taskset_for_each() iterator. Note that this isn't
>  called on a fork. If this method returns 0 (success) then this should
> -remain valid while the caller holds cgroup_mutex and it is ensured that either
> -attach() or cancel_attach() will be called in future.
> +remain valid while the caller holds cgroup_mutex and it is ensured
> +that either attach() or cancel_attach() will be called in future.
>  
>  int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk);
>  (cgroup_mutex held by caller)
> @@ -613,14 +618,14 @@ attached (possibly many when using cgroup_attach_proc). Called after
>  can_attach.
>  
>  void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> -	       struct task_struct *task, bool threadgroup)
> +		   struct cgroup_taskset *tset)
>  (cgroup_mutex held by caller)
>  
>  Called when a task attach operation has failed after can_attach() has succeeded.
>  A subsystem whose can_attach() has some side-effects should provide this
>  function, so that the subsystem can implement a rollback. If not, not necessary.
>  This will be called only about subsystems whose can_attach() operation have
> -succeeded.
> +succeeded. The parameters are identical to can_attach().
>  
>  void pre_attach(struct cgroup *cgrp);
>  (cgroup_mutex held by caller)
> @@ -629,11 +634,12 @@ For any non-per-thread attachment work that needs to happen before
>  attach_task. Needed by cpuset.
>  
>  void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> -	    struct cgroup *old_cgrp, struct task_struct *task)
> +	    struct cgroup_taskset *tset)
>  (cgroup_mutex held by caller)
>  
>  Called after the task has been attached to the cgroup, to allow any
>  post-attachment activity that requires memory allocations or blocking.
> +The parameters are identical to can_attach().
>  
>  void attach_task(struct cgroup *cgrp, struct task_struct *tsk);
>  (cgroup_mutex held by caller)
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index da7e4bc..2470c8e 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -457,6 +457,28 @@ void cgroup_exclude_rmdir(struct cgroup_subsys_state *css);
>  void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css);
>  
>  /*
> + * Control Group taskset, used to pass around set of tasks to cgroup_subsys
> + * methods.
> + */
> +struct cgroup_taskset;
> +struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset);
> +struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset);
> +struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset);
> +int cgroup_taskset_size(struct cgroup_taskset *tset);
> +
> +/**
> + * cgroup_taskset_for_each - iterate cgroup_taskset
> + * @task: the loop cursor
> + * @skip_cgrp: skip if task's cgroup matches this, %NULL to iterate through all
> + * @tset: taskset to iterate
> + */
> +#define cgroup_taskset_for_each(task, skip_cgrp, tset)			\
> +	for ((task) = cgroup_taskset_first((tset)); (task);		\
> +	     (task) = cgroup_taskset_next((tset)))			\
> +		if (!(skip_cgrp) ||					\
> +		    cgroup_taskset_cur_cgroup((tset)) != (skip_cgrp))
> +
> +/*
>   * Control Group subsystem type.
>   * See Documentation/cgroups/cgroups.txt for details
>   */
> @@ -467,14 +489,14 @@ struct cgroup_subsys {
>  	int (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
>  	void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
>  	int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
> -			  struct task_struct *tsk);
> +			  struct cgroup_taskset *tset);
>  	int (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
>  	void (*cancel_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
> -			      struct task_struct *tsk);
> +			      struct cgroup_taskset *tset);
>  	void (*pre_attach)(struct cgroup *cgrp);
>  	void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
>  	void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
> -		       struct cgroup *old_cgrp, struct task_struct *tsk);
> +		       struct cgroup_taskset *tset);
>  	void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
>  	void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp,
>  			struct cgroup *old_cgrp, struct task_struct *task);
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index cf5f3e3..474674b 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1739,11 +1739,85 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
>  }
>  EXPORT_SYMBOL_GPL(cgroup_path);
>  
> +/*
> + * Control Group taskset
> + */
>  struct task_and_cgroup {
>  	struct task_struct	*task;
>  	struct cgroup		*cgrp;
>  };
>  
> +struct cgroup_taskset {
> +	struct task_and_cgroup	single;
> +	struct flex_array	*tc_array;
> +	int			tc_array_len;
> +	int			idx;
> +	struct cgroup		*cur_cgrp;
> +};
> +
> +/**
> + * cgroup_taskset_first - reset taskset and return the first task
> + * @tset: taskset of interest
> + *
> + * @tset iteration is initialized and the first task is returned.
> + */
> +struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset)
> +{
> +	if (tset->tc_array) {
> +		tset->idx = 0;
> +		return cgroup_taskset_next(tset);
> +	} else {
> +		tset->cur_cgrp = tset->single.cgrp;
> +		return tset->single.task;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(cgroup_taskset_first);
> +
> +/**
> + * cgroup_taskset_next - iterate to the next task in taskset
> + * @tset: taskset of interest
> + *
> + * Return the next task in @tset.  Iteration must have been initialized
> + * with cgroup_taskset_first().
> + */
> +struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset)
> +{
> +	struct task_and_cgroup *tc;
> +
> +	if (!tset->tc_array || tset->idx >= tset->tc_array_len)
> +		return NULL;
> +
> +	tc = flex_array_get(tset->tc_array, tset->idx++);
> +	tset->cur_cgrp = tc->cgrp;
> +	return tc->task;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_taskset_next);
> +
> +/**
> + * cgroup_taskset_cur_cgroup - return the matching cgroup for the current task
> + * @tset: taskset of interest
> + *
> + * Return the cgroup for the current (last returned) task of @tset.  This
> + * function must be preceded by either cgroup_taskset_first() or
> + * cgroup_taskset_next().
> + */
> +struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset)
> +{
> +	return tset->cur_cgrp;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_taskset_cur_cgroup);
> +
> +/**
> + * cgroup_taskset_size - return the number of tasks in taskset
> + * @tset: taskset of interest
> + */
> +int cgroup_taskset_size(struct cgroup_taskset *tset)
> +{
> +	return tset->tc_array ? tset->tc_array_len : 1;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_taskset_size);
> +
> +
>  /*
>   * cgroup_task_migrate - move a task from one cgroup to another.
>   *
> @@ -1828,15 +1902,19 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
>  	struct cgroup_subsys *ss, *failed_ss = NULL;
>  	struct cgroup *oldcgrp;
>  	struct cgroupfs_root *root = cgrp->root;
> +	struct cgroup_taskset tset = { };
>  
>  	/* Nothing to do if the task is already in that cgroup */
>  	oldcgrp = task_cgroup_from_root(tsk, root);
>  	if (cgrp == oldcgrp)
>  		return 0;
>  
> +	tset.single.task = tsk;
> +	tset.single.cgrp = oldcgrp;
> +
>  	for_each_subsys(root, ss) {
>  		if (ss->can_attach) {
> -			retval = ss->can_attach(ss, cgrp, tsk);
> +			retval = ss->can_attach(ss, cgrp, &tset);
>  			if (retval) {
>  				/*
>  				 * Remember on which subsystem the can_attach()
> @@ -1867,7 +1945,7 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
>  		if (ss->attach_task)
>  			ss->attach_task(cgrp, tsk);
>  		if (ss->attach)
> -			ss->attach(ss, cgrp, oldcgrp, tsk);
> +			ss->attach(ss, cgrp, &tset);
>  	}
>  
>  	synchronize_rcu();
> @@ -1889,7 +1967,7 @@ out:
>  				 */
>  				break;
>  			if (ss->cancel_attach)
> -				ss->cancel_attach(ss, cgrp, tsk);
> +				ss->cancel_attach(ss, cgrp, &tset);
>  		}
>  	}
>  	return retval;
> @@ -2005,6 +2083,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
>  	struct task_struct *tsk;
>  	struct task_and_cgroup *tc;
>  	struct flex_array *group;
> +	struct cgroup_taskset tset = { };
>  	/*
>  	 * we need to make sure we have css_sets for all the tasks we're
>  	 * going to move -before- we actually start moving them, so that in
> @@ -2067,6 +2146,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
>  	} while_each_thread(leader, tsk);
>  	/* remember the number of threads in the array for later. */
>  	group_size = i;
> +	tset.tc_array = group;
> +	tset.tc_array_len = group_size;
>  	rcu_read_unlock();
>  
>  	/* methods shouldn't be called if no task is actually migrating */
> @@ -2079,7 +2160,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
>  	 */
>  	for_each_subsys(root, ss) {
>  		if (ss->can_attach) {
> -			retval = ss->can_attach(ss, cgrp, leader);
> +			retval = ss->can_attach(ss, cgrp, &tset);
>  			if (retval) {
>  				failed_ss = ss;
>  				goto out_cancel_attach;
> @@ -2169,10 +2250,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
>  	 * being moved, this call will need to be reworked to communicate that.
>  	 */
>  	for_each_subsys(root, ss) {
> -		if (ss->attach) {
> -			tc = flex_array_get(group, 0);
> -			ss->attach(ss, cgrp, tc->cgrp, tc->task);
> -		}
> +		if (ss->attach)
> +			ss->attach(ss, cgrp, &tset);
>  	}
>  
>  	/*
> @@ -2194,11 +2273,11 @@ out_cancel_attach:
>  		for_each_subsys(root, ss) {
>  			if (ss == failed_ss) {
>  				if (cancel_failed_ss && ss->cancel_attach)
> -					ss->cancel_attach(ss, cgrp, leader);
> +					ss->cancel_attach(ss, cgrp, &tset);
>  				break;
>  			}
>  			if (ss->cancel_attach)
> -				ss->cancel_attach(ss, cgrp, leader);
> +				ss->cancel_attach(ss, cgrp, &tset);
>  		}
>  	}
>  out_put_tasks:
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 4e82525..a2b0082 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -159,7 +159,7 @@ static void freezer_destroy(struct cgroup_subsys *ss,
>   */
>  static int freezer_can_attach(struct cgroup_subsys *ss,
>  			      struct cgroup *new_cgroup,
> -			      struct task_struct *task)
> +			      struct cgroup_taskset *tset)
>  {
>  	struct freezer *freezer;
>  
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 10131fd..2e5825b 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1368,10 +1368,10 @@ static int fmeter_getrate(struct fmeter *fmp)
>  }
>  
>  /* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */
> -static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
> -			     struct task_struct *tsk)
> +static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> +			     struct cgroup_taskset *tset)
>  {
> -	struct cpuset *cs = cgroup_cs(cont);
> +	struct cpuset *cs = cgroup_cs(cgrp);
>  
>  	if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))
>  		return -ENOSPC;
> @@ -1384,7 +1384,7 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
>  	 * set_cpus_allowed_ptr() on all attached tasks before cpus_allowed may
>  	 * be changed.
>  	 */
> -	if (tsk->flags & PF_THREAD_BOUND)
> +	if (cgroup_taskset_first(tset)->flags & PF_THREAD_BOUND)
>  		return -EINVAL;
>  
>  	return 0;
> @@ -1434,12 +1434,14 @@ static void cpuset_attach_task(struct cgroup *cont, struct task_struct *tsk)
>  	cpuset_update_task_spread_flag(cs, tsk);
>  }
>  
> -static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cont,
> -			  struct cgroup *oldcont, struct task_struct *tsk)
> +static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> +			  struct cgroup_taskset *tset)
>  {
>  	struct mm_struct *mm;
> -	struct cpuset *cs = cgroup_cs(cont);
> -	struct cpuset *oldcs = cgroup_cs(oldcont);
> +	struct task_struct *tsk = cgroup_taskset_first(tset);
> +	struct cgroup *oldcgrp = cgroup_taskset_cur_cgroup(tset);
> +	struct cpuset *cs = cgroup_cs(cgrp);
> +	struct cpuset *oldcs = cgroup_cs(oldcgrp);
>  
>  	/*
>  	 * Change mm, possibly for multiple threads in a threadgroup. This is
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 930de94..b2802cc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5460,8 +5460,9 @@ static void mem_cgroup_clear_mc(void)
>  
>  static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
>  				struct cgroup *cgroup,
> -				struct task_struct *p)
> +				struct cgroup_taskset *tset)
>  {
> +	struct task_struct *p = cgroup_taskset_first(tset);
>  	int ret = 0;
>  	struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
>  

Ah..hmm. I think this doesn't work as expected for memcg.
Maybe code like this will be required.

{
	struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
        struct mem_cgroup *from = NULL;
	struct task_struct *p;
	int ret = 0;
	/*
 	 * memcg just works against mm-owner. Check mm-owner is in this cgroup.
	 * Because tset contains only one thread-group, we'll find a task of
	 * mm->owner, at most.
 	 */
	for_cgroup_taskset_for_each(task, NULL, tset) {
		struct mm_struct *mm;

		mm = get_task_mm(task);
		if (!mm)
			continue;
		if (mm->owner == task) {
			p = task;
			break;
		}
		mmput(mm);
	}
	if (!p)
		return ret;
        from = mem_cgroup_from_task(p);
        mem_cgroup_start_move(from);
        spin_lock(&mc.lock);
	mc.from = from;
        mc.to = mem;
        spin_unlock(&mc.lock);
        /* We set mc.moving_task later */

        ret = mem_cgroup_precharge_mc(mm);
        if (ret)
             mem_cgroup_clear_mc();
	mm_put(mm);
        return ret;
} 



> @@ -5499,7 +5500,7 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
>  
>  static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
>  				struct cgroup *cgroup,
> -				struct task_struct *p)
> +				struct cgroup_taskset *tset)
>  {
>  	mem_cgroup_clear_mc();
>  }
> @@ -5616,9 +5617,9 @@ retry:
>  
>  static void mem_cgroup_move_task(struct cgroup_subsys *ss,
>  				struct cgroup *cont,
> -				struct cgroup *old_cont,
> -				struct task_struct *p)
> +				struct cgroup_taskset *tset)
>  {
> +	struct task_struct *p = cgroup_taskset_first(tset);
>  	struct mm_struct *mm = get_task_mm(p);
>  

Similar code with can_attach() will be required.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/