[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151125163427.GA8139@redhat.com>
Date: Wed, 25 Nov 2015 17:34:27 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Aleksa Sarai <cyphar@...har.com>,
Johannes Weiner <hannes@...xchg.org>,
Li Zefan <lizefan@...wei.com>, Tejun Heo <tj@...nel.org>
Cc: linux-kernel@...r.kernel.org
Subject: [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate()
Hello,
I am trying to backport cgroup_pids.c and I fail to understand pids_fork()
which does
css = task_get_css(task, pids_cgrp_id);
pids = css_pids(css);
/*
* If the association has changed, we have to revert and reapply the
* charge/uncharge on the wrong hierarchy to the current one. Since
* the association can only change due to an organisation event, its
* okay for us to ignore the limit in this case.
*/
if (pids != old_pids) {
pids_uncharge(old_pids, 1);
pids_charge(pids, 1);
}
But if the association has changed, pids_can_attach() which moved the child
into another cgrp has called pids_uncharge(old_pids) too?
IOW. Suppose that the new child is moved right before cgroup_post_fork() does
for_each_subsys_which(...)
ss->fork(child);
doesn't this mean that after ss->fork() we do the same sequence
pids_uncharge(old_pids, 1);
pids_charge(pids, 1);
twice? Note that threadgroup_change_begin/end depends on CLONE_THREAD.
So we can actually hit WARN_ON() in pids_cancel().
However, we can't simply remove this uncharge/charge afaics. We need this in
case when the parent was moved to another cgroup before cgroup_post_fork(),
and then css_set_move_task() moves the child.
I know almost nothing about cgroups, perhaps I missed something, please
correct me.
If am right. How about the patch below? percpu_down_read() is cheap. And
we can simplify cgroup_pids after this change.
And. We can probably unify cgroup_threadgroup_rwsem and dup_mmap_sem.
Note that if we take cgroup_threadgroup_rwsem for reading if CLONE_THREAD,
otherwise we take another percpu-rwsem in dup_mmap(), dup_mmap_sem.
Or I am totally confused?
Oleg.
--- x/kernel/fork.c
+++ x/kernel/fork.c
@@ -1368,8 +1368,7 @@ static struct task_struct *copy_process(
p->real_start_time = ktime_get_boot_ns();
p->io_context = NULL;
p->audit_context = NULL;
- if (clone_flags & CLONE_THREAD)
- threadgroup_change_begin(current);
+ threadgroup_change_begin(current);
cgroup_fork(p);
#ifdef CONFIG_NUMA
p->mempolicy = mpol_dup(p->mempolicy);
@@ -1610,8 +1609,7 @@ static struct task_struct *copy_process(
proc_fork_connector(p);
cgroup_post_fork(p, cgrp_ss_priv);
- if (clone_flags & CLONE_THREAD)
- threadgroup_change_end(current);
+ threadgroup_change_end(current);
perf_event_fork(p);
trace_task_newtask(p, clone_flags);
--- x/kernel/cgroup_pids.c
+++ x/kernel/cgroup_pids.c
@@ -243,27 +243,10 @@ static void pids_cancel_fork(struct task
static void pids_fork(struct task_struct *task, void *priv)
{
- struct cgroup_subsys_state *css;
- struct cgroup_subsys_state *old_css = priv;
- struct pids_cgroup *pids;
- struct pids_cgroup *old_pids = css_pids(old_css);
-
- css = task_get_css(task, pids_cgrp_id);
- pids = css_pids(css);
-
- /*
- * If the association has changed, we have to revert and reapply the
- * charge/uncharge on the wrong hierarchy to the current one. Since
- * the association can only change due to an organisation event, its
- * okay for us to ignore the limit in this case.
- */
- if (pids != old_pids) {
- pids_uncharge(old_pids, 1);
- pids_charge(pids, 1);
- }
+ struct cgroup_subsys_state *css = priv;
+ WARN_ON(task_css(task, pids_cgrp_id) != css);
css_put(css);
- css_put(old_css);
}
static void pids_free(struct task_struct *task)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists