Date:	Wed, 14 Mar 2012 09:28:28 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Michal Hocko <mhocko@...e.cz>,
	Johannes Weiner <hannes@...xchg.org>, gthelen@...gle.com,
	Hugh Dickins <hughd@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Vivek Goyal <vgoyal@...hat.com>,
	Jens Axboe <axboe@...nel.dk>, Li Zefan <lizf@...fujitsu.com>,
	containers@...ts.linux-foundation.org, cgroups@...r.kernel.org
Subject: Re: [RFC REPOST] cgroup: removing css reference drain wait during
 cgroup removal

On Tue, 13 Mar 2012 09:39:14 -0700
Tejun Heo <tj@...nel.org> wrote:

> Hello, KAMEZAWA.
> 
> On Tue, Mar 13, 2012 at 03:11:48PM +0900, KAMEZAWA Hiroyuki wrote:
> > The trouble for pre_destroy() is _not_ refcount. Memory cgroup has its own refcnt
> > and uses it internally. The problem is 'charges'. It's not related to refcnt.
> 
> Hmmm.... yeah, I'm not familiar with memcg internals at all.  For
> blkcg, refcnt matters but if it doesn't for memcg, great.
> 
> > Cgroup is designed to exist with 'tasks'. But memory may not be related to any
> > task...just related to a cgroup.
> > 
> > But ok, pre_destroy() & rmdir() is complicated, I agree.
> > 
> > Now, we prevent rmdir() if we can't move charges to its parent. If pre_destroy()
> > shouldn't fail, I can think of some alternatives.
> > 
> >  * move all charges to the parent and if it fails...move all charges to
> >    root cgroup.
> >    (drop_from_memory may not work well in swapless system.)
> 
> I think this one is better and this shouldn't fail if hierarchical
> mode is in use, right?
> 

Right.


> > I think.. if pre_destroy() never fails, we don't need pre_destroy().
> 
> For memcg maybe, blkcg still needs it.
> 
> > >   The last one seems more tricky.  On destruction of cgroup, the
> > >   charges are transferred to its parent and the parent may not have
> > >   enough room for that.  Greg told me that this should only be a
> > >   problem for !hierarchical case.  I think this can be dealt with by
> > >   dumping what's left over to root cgroup with a warning message.
> > 
> > I don't like warning ;) 
> 
> I agree this isn't perfect but then again failing rmdir isn't perfect
> either and given that the condition can be wholly avoided in
> hierarchical mode, which should be the default anyway (is there any
> reason to keep flat mode except for backward compatibility?), I don't
> think the trade off is too bad.
> 

One reason is 'performance'. You can see performance problems when you
create a deep tree of memcgs in hierarchy mode. The deeper the memcg tree,
the more res_counters are shared.

For example, libvirt creates cgroup tree as

	/cgroup/memory/libvirt/qemu/GuestXXX/....
	/cgroup/memory/libvirt/lxc/GuestXXX/...

No one wants to count up 4 res_counters, which is very heavy, just to
handle the independent workloads of each "Guest".
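
To illustrate the cost, here is a toy model of the charge path. It is only
a sketch, not the kernel's res_counter code: the ResCounter class and the
charge() and build_chain() helpers are made-up names, and "root" stands for
the memcg at the mount point. It assumes that a hierarchical charge updates
the counter of the cgroup and of every ancestor, while a flat charge
updates only the cgroup's own counter.

```python
class ResCounter:
    """Toy stand-in for a per-cgroup resource counter (hypothetical)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.usage = 0


def build_chain(names):
    """Build a single-path tree; return the deepest counter."""
    node = None
    for name in names:
        node = ResCounter(name, parent=node)
    return node


def charge(counter, nbytes, hierarchical):
    """Charge nbytes and return how many counters were updated."""
    touched = 0
    node = counter
    while node is not None:
        node.usage += nbytes
        touched += 1
        if not hierarchical:
            break  # flat mode: only the cgroup's own counter is updated
        node = node.parent
    return touched


# Mirrors the libvirt layout: root -> libvirt -> qemu -> GuestXXX
guest = build_chain(["root", "libvirt", "qemu", "GuestXXX"])
print(charge(guest, 4096, hierarchical=True))
print(charge(guest, 4096, hierarchical=False))
```

In this model each extra level of the tree adds one more counter update per
charge, and sibling guests all update the same ancestor counters.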

IIUC, in general, even if the processes are in a tree, in the majority
of server cases their workloads are independent.
I think FLAT mode is the default. 'hierarchical' is a crazy thing which
cannot be managed.


Thanks,
-Kame
