Message-ID: <20120914135830.GB6221@redhat.com>
Date:	Fri, 14 Sep 2012 09:58:30 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	"Daniel P. Berrange" <berrange@...hat.com>
Cc:	Tejun Heo <tj@...nel.org>, containers@...ts.linux-foundation.org,
	cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
	Neil Horman <nhorman@...driver.com>,
	Michal Hocko <mhocko@...e.cz>,
	Paul Mackerras <paulus@...ba.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Johannes Weiner <hannes@...xchg.org>,
	Thomas Graf <tgraf@...g.ch>,
	"Serge E. Hallyn" <serue@...ibm.com>, Paul Turner <pjt@...gle.com>,
	Ingo Molnar <mingo@...hat.com>
Subject: Re: [RFC] cgroup TODOs

On Fri, Sep 14, 2012 at 10:10:32AM +0100, Daniel P. Berrange wrote:

[..]
> > 6. Multiple hierarchies
> > 
> >   Apart from the apparent wheeeeeeeeness of it (I think I talked about
> >   that enough the last time[1]), there's a basic problem when more
> >   than one controllers interact - it's impossible to define a resource
> >   group when more than two controllers are involved because the
> >   intersection of different controllers is only defined in terms of
> >   tasks.
> > 
> >   IOW, if an entity X is of interest to two controllers, there's no
> >   way to map X to the cgroups of the two controllers.  X may belong to
> >   A and B when viewed by one task but A' and B when viewed by another.
> >   This already is a head scratcher in writeback where blkcg and memcg
> >   have to interact.
> > 
> >   While I am pushing for unified hierarchy, I think it's necessary to
> >   have different levels of granularities depending on controllers
> >   given that nesting involves significant overhead and noticeable
> >   controller-dependent behavior changes.
> > 
> >   Solution:
> > 
> >   I think a unified hierarchy with the ability to ignore subtrees
> >   depending on controllers should work.  For example, let's assume the
> >   following hierarchy.
> > 
> >           R
> >         /   \
> >        A     B
> >       / \
> >      AA AB
> > 
> >   All controllers are co-mounted.  There is per-cgroup knob which
> >   controls which controllers nest beyond it.  If blkio doesn't want to
> >   distinguish AA and AB, the user can specify that blkio doesn't nest
> >   beyond A and blkio would see the tree as,
> > 
> >           R
> >         /   \
> >        A     B
> > 
> >   While other controllers keep seeing the original tree.  The exact
> >   form of interface, I don't know yet.  It could be a single file
> >   which the user echoes [-]controller name into it or per-controller
> >   boolean file.
> > 
> >   I think this level of flexibility should be enough for most use
> >   cases.  If someone disagrees, please voice your objections now.
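
To make the quoted proposal concrete, here is a minimal Python sketch of
what "blkio doesn't nest beyond A" would mean for the example tree. It is
only an illustration: the quoted text says the interface is undecided, so
the `no_nest` set and both function names are hypothetical stand-ins for
whatever per-cgroup knob ends up existing.

```python
# Hypothetical model of the quoted proposal; not a kernel interface.
# The example tree from the quoted text: parent -> list of children.
TREE = {"R": ["A", "B"], "A": ["AA", "AB"], "B": [], "AA": [], "AB": []}


def controller_view(tree, root, no_nest):
    """Return the tree as seen by a controller that does not nest
    beyond any cgroup listed in no_nest (assumed knob)."""
    view = {}
    stack = [root]
    while stack:
        node = stack.pop()
        # Children are hidden from the controller below a cutoff node.
        children = [] if node in no_nest else list(tree[node])
        view[node] = children
        stack.extend(children)
    return view


def effective_cgroup(tree, root, no_nest, target):
    """Return the cgroup the controller would treat `target` as:
    the topmost cutoff ancestor if there is one, else target itself."""
    parent = {c: p for p, kids in tree.items() for c in kids}
    path = [target]
    while path[-1] != root:
        path.append(parent[path[-1]])
    for node in reversed(path):  # walk from the root down to target
        if node in no_nest:
            return node
    return target


if __name__ == "__main__":
    print(controller_view(TREE, "R", {"A"}))
    print(effective_cgroup(TREE, "R", {"A"}, "AA"))
```

With the cutoff at A, tasks in AA and AB are both accounted to A by this
controller, while a controller with an empty `no_nest` set keeps seeing
the full tree.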

Tejun, Daniel,

I am a little concerned about the above and wonder how systemd and libvirt
will interact and behave out of the box.

Currently systemd does not create its own hierarchy under blkio, but
libvirt does. So putting everything together means there is no way to
avoid the overhead of the systemd-created hierarchy.

\
|
+- system
     |
     +- libvirtd.service
              |
              +- virt-machine1
              +- virt-machine2

So there is no way to avoid the overhead of the two levels of hierarchy
created by systemd. I really wish that systemd would get rid of the
"system" cgroup and put services directly in the top-level group.
Creating deeper hierarchies is expensive.
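
As a rough way to see how deep a task sits per controller, the sketch
below parses text in the /proc/<pid>/cgroup format (one line per
hierarchy: "ID:controller-list:path") and counts path components. The
function name and the sample text are made up for illustration; the
sample just mirrors the tree drawn above.

```python
def cgroup_depths(proc_cgroup_text):
    """Map each controller name to the depth of the task's cgroup,
    given text in the /proc/<pid>/cgroup format. The root cgroup "/"
    has depth 0."""
    depths = {}
    for line in proc_cgroup_text.splitlines():
        if not line.strip():
            continue
        # Split into at most 3 fields so a ':' in the path is preserved.
        _hier_id, controllers, path = line.split(":", 2)
        depth = len([part for part in path.split("/") if part])
        for name in controllers.split(","):
            depths[name] = depth
    return depths


# Hypothetical sample, matching the tree in this mail.
SAMPLE = (
    "3:blkio:/system/libvirtd.service/virt-machine1\n"
    "2:cpu,cpuacct:/\n"
)

if __name__ == "__main__":
    print(cgroup_depths(SAMPLE))
```

For the sample, blkio comes out at depth 3: two levels from systemd
("system" and "libvirtd.service") plus one from libvirt.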

I just want to state clearly that with the above model, it will not
be possible for libvirt to avoid the hierarchy levels created by systemd.
So the solution would be to keep the hierarchy as shallow as possible and
to keep controller overhead as low as possible.

Now, I know that with blkio, idling kills performance. So one solution
could be not to use CFQ on anything fast. Use deadline instead; then the
group idling overhead goes away, and tools like systemd and libvirt don't
have to keep track of disks and which scheduler is running on them.
They don't want to do that and expect the kernel to get it right.

But getting that right out of the box does not happen today, as CFQ
is the default on everything. Distributions can carry their own patches
to do some approximation, but it would be better to have a mechanism in
the kernel that selects a suitable IO scheduler out of the box for a
storage LUN. This is more important now than ever, since the blkio
controller has come into the picture.
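
The kind of userspace approximation a distribution might carry could look
roughly like the sketch below. It works on the contents of
/sys/block/<dev>/queue/scheduler (available schedulers, with the active
one in brackets) and the queue's rotational flag. Treating
"non-rotational" as "fast" is an assumption made here for illustration,
and both function names are hypothetical; this is not the in-kernel
mechanism asked for above.

```python
def parse_scheduler(text):
    """Parse sysfs scheduler text such as "noop deadline [cfq]".
    Returns (active, available); active is None if nothing is bracketed."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def choose_scheduler(rotational, scheduler_text, preferred="deadline"):
    """Pick a scheduler for one device.
    Assumption: a non-rotational device counts as fast storage, so
    prefer `preferred` there if offered; otherwise keep the active one."""
    active, available = parse_scheduler(scheduler_text)
    if not rotational and preferred in available:
        return preferred
    return active


if __name__ == "__main__":
    text = "noop deadline [cfq]\n"
    print(choose_scheduler(rotational=False, scheduler_text=text))
    print(choose_scheduler(rotational=True, scheduler_text=text))
```

The functions only decide; actually applying the choice would mean
writing the name back to the device's scheduler file as root.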

The above is the scenario I am most worried about: CFQ shows up by default
on all the LUNs, systemd and libvirt create 4-5 level deep hierarchies
by default, and IO performance sucks out of the box. CFQ already
underperforms on fast storage, and with group creation the problem
becomes worse.

Thanks
Vivek