Message-Id: <20090121193248.94aecb10.kamezawa.hiroyu@jp.fujitsu.com>
Date: Wed, 21 Jan 2009 19:32:48 +0900
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To: Paul Menage <menage@...gle.com>
Cc: "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
"nishimura@....nes.nec.co.jp" <nishimura@....nes.nec.co.jp>,
"lizf@...fujitsu.com" <lizf@...fujitsu.com>
Subject: Re: [RFC][PATCH 4/4] cgroup-memcg fix frequent EBUSY at rmdir v2
On Wed, 21 Jan 2009 02:00:56 -0800
Paul Menage <menage@...gle.com> wrote:
> On Tue, Jan 20, 2009 at 2:47 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@...fujitsu.com> wrote:
> > CGRP_NOTIFY_ON_RELEASE,
> > + /* Someone calls rmdir() and is wating for this cgroup is released */
>
> /* A thread is in rmdir() waiting to destroy this cgroup */
>
> Also document that it can only be set/cleared when you're holding the
> inode_sem for the cgroup directory. And we should probably move this
> enum inside cgroup.c, since nothing in the header file uses it.
>
> > + CGRP_WAIT_ON_RMDIR,
> > };
Hmm, ok. Shall I move the whole enum into cgroup.c?
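Something like this, maybe? (Just a sketch of where the enum would live and
the locking comment, not the final patch.)

	/* in kernel/cgroup.c -- bit numbers for cgroup->flags */
	enum {
		/* ... existing bits such as CGRP_NOTIFY_ON_RELEASE ... */
		/*
		 * A thread is in rmdir() waiting to destroy this cgroup.
		 * Set/cleared only while holding the cgroup directory's
		 * inode i_mutex.
		 */
		CGRP_WAIT_ON_RMDIR,
	};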
>
> >
> > struct cgroup {
> > @@ -350,7 +352,7 @@ int cgroup_is_descendant(const struct cg
> > struct cgroup_subsys {
> > struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss,
> > struct cgroup *cgrp);
> > - void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> > + int (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
>
> Can you update the documentation to indicate what an error result from
> pre_destroy indicates? Can pre_destroy() be called multiple times for
> the same subsystem/cgroup?
>
Yes, after this change, memcg will return -EBUSY in some special cases
(those patches are in my queue). The -EBUSY case shows up especially on
swap-less systems.
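For illustration only (the helper below is a stand-in, not the real memcg
code), an int return lets pre_destroy() report the failure instead of giving
up silently:

	static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
					  struct cgroup *cgrp)
	{
		struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);

		/* try_to_empty_charges() is a placeholder for the real logic */
		if (!try_to_empty_charges(mem))
			return -EBUSY;	/* charges remain, e.g. on swap-less */
		return 0;
	}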
> > +
> > + /* wake up rmdir() waiter....it should fail.*/
>
> /* Wake up rmdir() waiter - the rmdir should fail since the cgroup is
> no longer empty */
>
> But is this safe? If we do a pre-destroy, is it OK to let new tasks
> into the cgroup?
>
Current memcg allows it. (That's why I removed the "obsolete" flag from
memcg and asked you to add css_tryget().)
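Roughly, the idea is that a new charge takes a reference with css_tryget()
and just fails once the css is being destroyed, so memcg no longer needs the
"obsolete" flag (illustrative fragment, not actual memcg code):

	rcu_read_lock();
	mem = mem_cgroup_from_task(current);
	if (!css_tryget(&mem->css)) {
		/* the group is going away; the caller retries or fails */
		rcu_read_unlock();
		return -ENOMEM;
	}
	rcu_read_unlock();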
> > @@ -2446,6 +2461,8 @@ static long cgroup_create(struct cgroup
> >
> > mutex_unlock(&cgroup_mutex);
> > mutex_unlock(&cgrp->dentry->d_inode->i_mutex);
> > + if (wakeup_on_rmdir(parent))
> > + cgroup_rmdir_wakeup_waiters();
>
> I don't think that there can be a waiter, since rmdir() would hold the
> parent's inode semaphore, which would block this thread before it gets
> to cgroup_create()
>
Oh, I see. I missed that. I'll remove this.
> > +DECLARE_WAIT_QUEUE_HEAD(cgroup_rmdir_waitq);
> > +
> > +static void cgroup_rmdir_wakeup_waiters(void)
> > +{
> > + wake_up_all(&cgroup_rmdir_waitq);
> > +}
> > +
>
> I think you can merge wakeup_on_rmdir() and
> cgroup_rmdir_wakeup_waiters() into a single function,
> cgroup_wakeup_rmdir(struct cgroup *)
>
will try.
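Something along these lines, perhaps (sketch only, the exact check is not
final):

	static void cgroup_wakeup_rmdir(struct cgroup *cgrp)
	{
		/* a waiter can only exist if the flag was set under i_mutex */
		if (!test_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags))
			return;
		wake_up_all(&cgroup_rmdir_waitq);
	}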
>
> >
> > + if (signal_pending(current))
> > + return -EINTR;
>
> I think it would be better to move this check to after we've already
> failed on cgroup_clear_css_refs(). That way we can't fail with an
> EINTR just because we raced with a signal on the way into rmdir() - we
> have to actually hit the EBUSY and try to sleep.
Ok, will move.
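So the retry path would look roughly like this (simplified sketch, not the
actual cgroup_rmdir()):

	DEFINE_WAIT(wait);

	while (!cgroup_clear_css_refs(cgrp)) {
		/* busy - we are about to sleep, so a signal matters now */
		if (signal_pending(current))
			return -EINTR;
		set_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
		prepare_to_wait(&cgroup_rmdir_waitq, &wait, TASK_INTERRUPTIBLE);
		/* drop the mutexes here, wait for the refs to go away, retry */
		schedule();
		finish_wait(&cgroup_rmdir_waitq, &wait);
		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
	}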
> > + ret = cgroup_call_pre_destroy(cgrp);
> > + if (ret == -EBUSY)
> > + return -EBUSY;
>
> What about other potential error codes? If the subsystem's only
> allowed to return 0 or EBUSY, then we should check for that.
>
Hmm, a subsystem may return -EPERM or some other error.
I'll change this to

	if (ret)
		return ret;

so that any error from pre_destroy() is propagated.
Thank you for the review; it was very helpful. I'll think about this some more.
-Kame