Message-ID: <20120428020003.GA26573@mtj.dyndns.org>
Date: Fri, 27 Apr 2012 19:00:03 -0700
From: Tejun Heo <tj@...nel.org>
To: Hiroyuki Kamezawa <kamezawa.hiroyuki@...il.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
Michal Hocko <mhocko@...e.cz>,
Johannes Weiner <hannes@...xchg.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Glauber Costa <glommer@...allels.com>,
Han Ying <yinghan@...gle.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC][PATCH 8/9 v2] cgroup: avoid creating new cgroup under a
cgroup being destroyed
Hi, KAME.
On Sat, Apr 28, 2012 at 09:20:52AM +0900, Hiroyuki Kamezawa wrote:
> What I thought was...
> Assume a memory cgroup A with use_hierarchy==1.
>
> 1. thread:0 starts calling ->pre_destroy() of cgroup A.
> 2. thread:0 sometimes calls cond_resched() or other sleeping functions.
> 3. thread:1 creates a cgroup B under "A".
> 4. thread:1 attaches a thread X to cgroup A/B.
> 5. A's res_counter is charged, but pre_destroy() can't see what happened
> because it scans only A's LRU.
>
> So, we have -EBUSY now. I considered some options to fix this.
>
> option 1) just return 0 instead of -EBUSY when pre_destroy() finds a
> task or a child.
>
> There is a race... even if we return 0 here and expect the cgroup code
> to catch it, the thread or child we found may be moved to another
> cgroup before the cgroup's final check.
> In that case, the cgroup will be freed before pre_destroy() has fully
> completed, and the charges will be lost.
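To make the quoted scenario concrete, here is a minimal user-space sketch. It assumes a hypothetical `struct toy_memcg` in which `lru_pages` stands for a group's own LRU and `usage` for its hierarchical res_counter; none of these names are kernel APIs, and the steps are replayed sequentially instead of on two real threads.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of a memcg with use_hierarchy == 1.
 * lru_pages: pages on this group's own LRU.
 * usage:     hierarchical res_counter, which also counts descendants.
 */
struct toy_memcg {
	struct toy_memcg *parent;
	long lru_pages;
	long usage;
};

/* Charge one page: it goes on @cg's own LRU, but the counters of @cg
 * and every ancestor are raised. */
static void toy_charge(struct toy_memcg *cg)
{
	cg->lru_pages++;
	for (struct toy_memcg *c = cg; c != NULL; c = c->parent)
		c->usage++;
}

/* What pre_destroy() does in this model: move @cg's own LRU pages to
 * its parent. Ancestors already count those pages, so only @cg's own
 * counter drops. @cg must have a parent. */
static void toy_drain_own_lru(struct toy_memcg *cg)
{
	assert(cg->parent != NULL);
	cg->usage -= cg->lru_pages;
	cg->parent->lru_pages += cg->lru_pages;
	cg->lru_pages = 0;
}

/* pre_destroy()'s final check: only an empty counter counts as done. */
static int toy_pre_destroy_check(const struct toy_memcg *cg)
{
	return cg->usage == 0 ? 0 : -EBUSY;
}

/* Replays steps 1-5 sequentially (no real threads) and returns the
 * result of A's final check. */
static int toy_replay_race(void)
{
	struct toy_memcg root = { NULL, 0, 0 };
	struct toy_memcg a = { &root, 0, 0 };
	struct toy_memcg b = { &a, 0, 0 };

	toy_charge(&a);        /* A holds one page of its own            */
	toy_drain_own_lru(&a); /* steps 1-2: pre_destroy() drains A       */
	toy_charge(&b);        /* steps 3-4: X, now in A/B, charges a page */

	/* Step 5: A's own LRU is empty, yet A's res_counter is not. */
	assert(a.lru_pages == 0 && a.usage == 1);
	return toy_pre_destroy_check(&a);
}
```

Under these assumptions `toy_replay_race()` returns `-EBUSY`: draining A's own LRU cannot account for a page that was charged through the child B.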
So, the cgroup code won't proceed with rmdir if children are created
in between. Also note that the lost-charge race you described above
existed before this change, i.e. a new cgroup could be created after
pre_destroy() had completed.
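A minimal sketch of that final check, assuming a simplified cgroup that tracks only child and task counts. `struct toy_cgroup`, `toy_rmdir()` and the hook type are hypothetical, not the real cgroup code; a mkdir racing with pre_destroy() is simulated by a callback invoked where pre_destroy() would sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cgroup; field names are illustrative. */
struct toy_cgroup {
	int nr_children;
	int nr_tasks;
	bool removed;
};

/* Called while pre_destroy() "sleeps", to simulate another thread. */
typedef void (*toy_hook_fn)(struct toy_cgroup *cgrp);

static bool toy_busy(const struct toy_cgroup *cgrp)
{
	return cgrp->nr_children > 0 || cgrp->nr_tasks > 0;
}

static int toy_rmdir(struct toy_cgroup *cgrp, toy_hook_fn concurrent)
{
	assert(!cgrp->removed); /* rmdir of a removed cgroup is a bug here */
	if (toy_busy(cgrp))
		return -EBUSY;

	/* pre_destroy() runs here and may sleep; others can act now. */
	if (concurrent != NULL)
		concurrent(cgrp);

	/* Final check: a child or task that appeared in between makes
	 * rmdir fail instead of freeing the cgroup. */
	if (toy_busy(cgrp))
		return -EBUSY;

	cgrp->removed = true;
	return 0;
}

/* Simulates a mkdir under @cgrp racing with its rmdir. */
static void toy_racing_mkdir(struct toy_cgroup *cgrp)
{
	cgrp->nr_children++;
}
```

Here `toy_rmdir(&cg, toy_racing_mkdir)` returns `-EBUSY` and leaves the cgroup in place, while `toy_rmdir(&cg, NULL)` on an empty cgroup succeeds.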
The current cgroup rmdir code is transitional. It has to support both
retrying and non-retrying pre_destroy()s, which means we can't mark
the cgroup DEAD before invoking pre_destroy(). However, we can do that
once memcg's pre_destroy() is converted, which will also remove the
whole WAIT_ON_RMDIR mechanism along with the race described above.
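The end state described above could look roughly like the following sketch: rmdir marks the cgroup dead before pre_destroy() runs, so a mkdir racing underneath it simply fails. Everything here is hypothetical, including the names and the choice of `-ENODEV` as mkdir's error code; it only illustrates the ordering.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; "dead" stands in for the DEAD marking. */
struct toy_cgroup {
	struct toy_cgroup *parent;
	int nr_children;
	bool dead;
};

/* mkdir refuses to create a child under a cgroup being destroyed.
 * -ENODEV is an assumed error code for this sketch. */
static int toy_mkdir(struct toy_cgroup *parent, struct toy_cgroup *child)
{
	if (parent->dead)
		return -ENODEV;
	child->parent = parent;
	child->nr_children = 0;
	child->dead = false;
	parent->nr_children++;
	return 0;
}

/* rmdir once every pre_destroy() is non-retrying: mark the cgroup
 * dead first, then run pre_destroy(), which can no longer fail.
 * If @racer is non-NULL, a concurrent mkdir of @racer under @cgrp is
 * attempted while pre_destroy() runs; its result goes to *@racer_ret. */
static int toy_rmdir(struct toy_cgroup *cgrp, struct toy_cgroup *racer,
		     int *racer_ret)
{
	if (cgrp->nr_children > 0)
		return -EBUSY;

	cgrp->dead = true;

	/* pre_destroy() would run here; it may sleep, but nothing new
	 * can be created underneath anymore. */
	if (racer != NULL)
		*racer_ret = toy_mkdir(cgrp, racer);

	/* No retry loop and no WAIT_ON_RMDIR-style waiting needed. */
	assert(cgrp->nr_children == 0);
	return 0;
}
```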
There really isn't much point in trying to make the current cgroup
rmdir behave perfectly when the next step is removing all the fixed up
parts.
So, IMHO, just making pre_destroy() clean up its own charges and
always return 0 is enough. There's no need to fix up old, non-critical
race conditions at this point in the patch stream; the cgroup rmdir
simplification will make them disappear anyway.
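A minimal sketch of such a pre_destroy(), assuming a hypothetical memcg model with use_hierarchy == 1 in which `lru_pages` is a group's own LRU and `usage` its hierarchical counter. It further assumes the cgroup has no children by the time it runs, and it reparents its own pages instead of waiting on anything, so it can always return 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memcg model with use_hierarchy == 1:
 * lru_pages = pages on this group's own LRU,
 * usage     = hierarchical counter including descendants. */
struct toy_memcg {
	struct toy_memcg *parent;
	long lru_pages;
	long usage;
};

/* Charge one page to @cg's LRU and to @cg's and all ancestors' counters. */
static void toy_charge(struct toy_memcg *cg)
{
	cg->lru_pages++;
	for (struct toy_memcg *c = cg; c != NULL; c = c->parent)
		c->usage++;
}

/* pre_destroy() that only cleans up its own charges and never fails:
 * with the cgroup childless, all of @cg's usage sits on its own LRU,
 * so moving that LRU to the parent empties it. */
static int toy_pre_destroy(struct toy_memcg *cg)
{
	assert(cg->parent != NULL);
	assert(cg->usage == cg->lru_pages); /* no descendants' charges */

	/* The parent's hierarchical counter already includes these
	 * pages, so only its LRU grows. */
	cg->parent->lru_pages += cg->lru_pages;
	cg->usage -= cg->lru_pages;
	cg->lru_pages = 0;
	return 0;
}
```

Because the parent's counter already includes the reparented pages, only the parent's LRU changes, and the function has no failure path for rmdir to retry on.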
Thanks.
--
tejun