Message-Id: <20081210131718.1210360e.kamezawa.hiroyu@jp.fujitsu.com>
Date: Wed, 10 Dec 2008 13:17:18 +0900
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To: Daisuke Nishimura <nishimura@....nes.nec.co.jp>
Cc: "linux-mm@...ck.org" <linux-mm@...ck.org>,
"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
"lizf@...fujitsu.com" <lizf@...fujitsu.com>,
"menage@...gle.com" <menage@...gle.com>,
"kosaki.motohiro@...fujitsu.com" <kosaki.motohiro@...fujitsu.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC][PATCH 1/6] memcg: fix pre_destroy handler
On Wed, 10 Dec 2008 12:03:40 +0900
Daisuke Nishimura <nishimura@....nes.nec.co.jp> wrote:
> On Wed, 10 Dec 2008 11:58:30 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> wrote:
> > On Wed, 10 Dec 2008 11:28:15 +0900
> > Daisuke Nishimura <nishimura@....nes.nec.co.jp> wrote:
> >
> > > On Tue, 9 Dec 2008 20:06:47 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> wrote:
> > > > A better name for the new flag is welcome.
> > > >
> > > > ==
> > > > Because the pre_destroy() handler was moved out of cgroup_lock() to
> > > > avoid a deadlock, cgroup's rmdir() now does the following sequence.
> > > >
> > > > cgroup_lock()
> > > > check children and tasks.
> > > > (A)
> > > > cgroup_unlock()
> > > > (B)
> > > > pre_destroy() for subsys;-----(1)
> > > > (C)
> > > > cgroup_lock();
> > > > (D)
> > > > Second check: check for -EBUSY again because we released the lock.
> > > > (E)
> > > > mark cgroup as removed.
> > > > (F)
> > > > unlink from lists.
> > > > cgroup_unlock();
> > > > dput()
> > > > => when dentry's refcnt goes down to 0
> > > > destroy() handlers for subsys
> > > >
> > > > memcg marks itself as "obsolete" when pre_destroy() is called at (1).
> > > > But rmdir() can fail after pre_destroy(), so marking it "obsolete" there is a bug.
> > > > I'd like to make pre_destroy() sane in the cgroup layer.
> > > >
> > > > Considering the above sequence:
> > > >  - new tasks can be added between (B) and (C)
> > > >  - a swap-in record can be charged back to a cgroup after pre_destroy(),
> > > >    at (C), (D), and (E)
> > > > (This means cgrp's refcnt can come not from tasks but from other
> > > > persistent objects.)
> > > >
> > > > This patch adds a "cgroup_is_being_removed()" check. (A better name is welcome.)
> > > > After this,
> > > >
> > > > - cgroup is marked as CGRP_PRE_REMOVAL at (A)
> > > > - If the second check fails, the CGRP_PRE_REMOVAL flag is cleared.
> > > > - memcg's own obsolete flag is removed.
> > > > - While CGRP_PRE_REMOVAL is set, task attach will fail with -EBUSY.
> > > >   (Task attach via clone() will not hit this case.)
> > > >
> > > > With this, we can trust pre_destroy()'s result.
> > > >
> > > >
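For reference, a minimal sketch of the flag and helper described above. This is
illustrative only, not the posted hunks: CGRP_PRE_REMOVAL and
cgroup_is_being_removed() are the names used in this patch, but the bit
placement and the exact -EBUSY condition below are assumptions.
==
/* include/linux/cgroup.h (sketch) */
enum {
	/* ... existing CGRP_* flags ... */
	CGRP_PRE_REMOVAL,	/* set at (A), cleared if the second check fails */
};

static inline int cgroup_is_being_removed(struct cgroup *cgrp)
{
	return test_bit(CGRP_PRE_REMOVAL, &cgrp->flags);
}

/* kernel/cgroup.c, cgroup_rmdir(): failure path after re-taking the lock (D) */
	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
		clear_bit(CGRP_PRE_REMOVAL, &cgrp->flags);	/* second check failed (E) */
		mutex_unlock(&cgroup_mutex);
		return -EBUSY;
	}
==
With something like this, memcg could key off cgroup_is_being_removed(cgrp)
instead of keeping its own "obsolete" flag.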
> > > I agree with the direction of this patch, but I think it would be better
> > > to split it into cgroup and memcg parts.
> > >
> > Hmm, but the "showing usage" part is necessary for this kind of patch.
> >
> I see.
>
> (snip)
> > > > + if (cgroup_is_being_removed(cgrp))
> > > > + return -EBUSY;
> > > > +
> > > > for_each_subsys(root, ss) {
> > > > if (ss->can_attach) {
> > > > retval = ss->can_attach(ss, cgrp, tsk);
> > > > @@ -2469,12 +2481,14 @@ static int cgroup_rmdir(struct inode *un
> > > > mutex_unlock(&cgroup_mutex);
> > > > return -EBUSY;
> > > > }
> > > > - mutex_unlock(&cgroup_mutex);
> > > >
> > > > /*
> > > > * Call pre_destroy handlers of subsys. Notify subsystems
> > > > * that rmdir() request comes.
> > > > */
> > > > + set_bit(CGRP_PRE_REMOVAL, &cgrp->flags);
> > > > + mutex_unlock(&cgroup_mutex);
> > > > +
> > > > cgroup_call_pre_destroy(cgrp);
> > > >
> > > Is there any case where pre_destroy is called simultaneously?
> > >
> > I can't quite catch what your concern is.
> >
> > AFAIK, vfs_rmdir() is called with the parent directory's i_mutex held:
> > ==
> > mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
> > ...
> > vfs_rmdir()
> > ==
> > And calls to cgroup_rmdir() will be serialized.
> >
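For reference, a rough, paraphrased sketch of that VFS caller (do_rmdir() in
fs/namei.c; error handling and details elided), which is why two rmdir(2)
calls on the same cgroup directory cannot reach cgroup_rmdir(), and hence
pre_destroy(), at the same time:
==
	/* the parent directory's i_mutex is held across vfs_rmdir(), and
	 * vfs_rmdir() also takes the victim dentry's i_mutex before calling
	 * the filesystem's ->rmdir(), i.e. cgroup_rmdir() */
	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
	dentry = lookup_hash(&nd);
	if (!IS_ERR(dentry)) {
		error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
		dput(dentry);
	}
	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
==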
> You're right. I missed that.
> Thank you for your clarification.
>
>
I'll post an updated one; it should be much clearer.
Thanks,
-Kame
> Thanks,
> Daisuke Nishimura.
>