linux-kernel - Re: [PATCH/RFC] Have sane default values for cpusets

Message-ID: <20100512190757.GA421@tango.0pointer.de>
Date:	Wed, 12 May 2010 21:07:57 +0200
From:	Lennart Poettering <mzxreary@...inter.de>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Dhaval Giani <dhaval.giani@...il.com>,
	James Kosin <jkosin@...comgrp.com>,
	linux-kernel@...r.kernel.org, menage@...gle.com,
	balbir@...ux.vnet.ibm.com, jsafrane@...hat.com, tglx@...utronix.de
Subject: Re: [PATCH/RFC] Have sane default values for cpusets

On Wed, 12.05.10 16:20, Peter Zijlstra (peterz@...radead.org) wrote:

> 
> On Wed, 2010-05-12 at 16:13 +0200, Dhaval Giani wrote:
> > What you are saying is that an application
> > programmer who wants to just use memory cgroups should also care about
> > cpusets and just about countless other cgroup subsystems that can
> > exist. 
> 
> That's exactly what he says if he mounts them together.

Well, this is not realistic.

See Dhaval's patch against the background of systemd
(http://0pointer.de/blog/projects/systemd.html). When systemd starts a
service, we create a cgroup for it; when the service ends, we remove
it. While systemd does that to keep track of processes, this has the
nice side effect that all services are properly (and without races)
sorted into different groups: if you start apache, it gets its own
group; if you start cups, cups gets its own group, without further
configuration. And while the main reason for doing this is process
tracking, it is also useful for actually enforcing limits and suchlike
on those groups, and hence on services. An admin can choose to enforce
limits on the groups systemd creates for him, since the grouping
systemd does along service lines is most likely the one that matters
most.
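The per-service lifecycle described above (create a group when the service starts, remove it when the service ends) can be sketched roughly as follows. This is a minimal illustration, not systemd's code: `service_cgroup` is a hypothetical helper, and `hierarchy_root` stands for wherever the admin mounted the hierarchy. It only shows the create/remove pairing; moving the service's processes into the group is left as a comment.

```python
import os
from contextlib import contextmanager


@contextmanager
def service_cgroup(hierarchy_root: str, service: str):
    """Hypothetical helper: one cgroup per service, created on start,
    removed when the service exits."""
    path = os.path.join(hierarchy_root, service)
    os.mkdir(path)  # on a mounted cgroup hierarchy, mkdir creates the group
    try:
        # A real manager would now move the service's processes into the
        # group (e.g. by writing PIDs to the v1 "tasks" file) before exec.
        yield path
    finally:
        os.rmdir(path)  # succeeds only once the group has no tasks left
```

On a real hierarchy the group directory is populated with the controllers' control files, which do not block `rmdir`; pointing `hierarchy_root` at an ordinary temporary directory only exercises the create/remove pairing.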

I am not interested in making systemd aware of each and every controller
that exists or will exist in the future, or in encoding specific
inheritance rules for them. That is simply not possible: we'd have to add
a lot of logic to systemd that I simply don't want to maintain there, and
I'd have to constantly play catch-up with every controller that is added
to the kernel. However, as things stand now, without that logic in
systemd, if an admin tells systemd to duplicate its group tree in the
cpuset hierarchy, systemd would fail to work. And that is not acceptable.
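A toy model of why a plain mkdir is not enough once cpuset is co-mounted. It assumes cgroup-v1 cpuset semantics: by default a new child's `cpuset.cpus` and `cpuset.mems` start out empty, and the kernel refuses to attach a task to a cpuset with no CPUs or memory nodes (ENOSPC). `CpusetGroup`, `mkdir` and `attach` are hypothetical stand-ins for the directory operations, not a kernel or systemd API.

```python
import errno


class CpusetGroup:
    """Toy model of a cgroup-v1 cpuset directory (not the kernel's code)."""

    def __init__(self, cpus=frozenset(), mems=frozenset()):
        self.cpus = set(cpus)  # models cpuset.cpus
        self.mems = set(mems)  # models cpuset.mems
        self.tasks = set()
        self.children = {}

    def mkdir(self, name):
        # A plain mkdir leaves cpus and mems empty in this model; nothing
        # copies the parent's values into the child.
        child = CpusetGroup()
        self.children[name] = child
        return child

    def attach(self, pid):
        # Attaching to a cpuset without CPUs or memory nodes is refused.
        if not self.cpus or not self.mems:
            raise OSError(errno.ENOSPC, "cpuset has no cpus or mems")
        self.tasks.add(pid)


root = CpusetGroup(cpus=range(4), mems={0})
apache = root.mkdir("apache")  # all a controller-unaware manager does
try:
    apache.attach(1234)
except OSError as e:
    print("attach failed:", errno.errorcode[e.errno])  # attach failed: ENOSPC
```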

So, just for once, look at this from the perspective of the people using
your code: if admins want to piggyback resource limiting onto the normal
systemd cgroup tree, you make that impossible by having weird
inheritance rules that systemd would first have to learn. (And I am
sorry, but I refuse to teach those rules to systemd anyway.)

What I am arguing here is basically that it is really important to allow
userspace code to create groups in hierarchies where controllers are
active that the userspace code does not know about.

Also, it's completely stupid anyway to ask userspace code to implement
inheritance rules for each cgroup controller when the same logic could,
with minimal work, just as well be implemented on the kernel side for
free, which would allow userspace to simply rely on "mkdir" resulting in
a working subgroup.
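One way to picture the kernel-side alternative: each controller owns the rule for initializing a new child, so a single mkdir yields a usable group whichever controllers are attached to the hierarchy. This is a hypothetical model; `Controller`, `child_defaults`, `Group` and `kernel_mkdir` are invented names, and the memory rule is a placeholder rather than the real memory controller's defaults.

```python
from dataclasses import dataclass, field


class Controller:
    """Hypothetical controller interface: each controller supplies its
    own defaults for a newly created child group."""

    name = "base"

    def child_defaults(self, parent_params: dict) -> dict:
        return {}


class Cpuset(Controller):
    name = "cpuset"

    def child_defaults(self, parent_params: dict) -> dict:
        # Start the child with copies of the parent's CPUs and memory nodes,
        # so tasks can be attached immediately after mkdir.
        return {"cpus": set(parent_params["cpus"]), "mems": set(parent_params["mems"])}


class Memory(Controller):
    name = "memory"

    def child_defaults(self, parent_params: dict) -> dict:
        # Placeholder rule: a new group starts without a limit of its own.
        return {"limit": None}


@dataclass
class Group:
    params: dict = field(default_factory=dict)  # controller name -> parameters


def kernel_mkdir(parent: Group, controllers: list) -> Group:
    """Model of mkdir with kernel-side defaults: every active controller
    initializes its own part of the new group; userspace does nothing else."""
    return Group({c.name: c.child_defaults(parent.params[c.name]) for c in controllers})


root = Group({"cpuset": {"cpus": {0, 1, 2, 3}, "mems": {0}}, "memory": {"limit": None}})
child = kernel_mkdir(root, [Cpuset(), Memory()])
print(sorted(child.params["cpuset"]["cpus"]))  # [0, 1, 2, 3]
```

Adding a controller in this model means adding one subclass next to the controller itself; the caller of `kernel_mkdir` does not change.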

Or, to put it another way: if an "mkdir" is not enough to create a
working sub-cgroup, then libcgroup would have to learn the necessary
inheritance rules and how to copy group parameters from the parent to
the child -- and that for each and every controller that exists and
will exist. Whenever a new controller is added, you'd have to patch both
libcgroup and the kernel and make sure they always stay in sync. That's
just crappy design, if you ask me, and it doesn't scale.
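For contrast, a sketch of the userspace approach argued against here: a tool would need a table of per-controller copy rules, and any controller missing from that table breaks group creation. `COPY_RULES` and `userspace_mkdir` are hypothetical and do not reflect libcgroup's actual code; "newctl" stands for any controller newer than the table.

```python
# Hypothetical userspace rule table: one copy rule per controller the tool knows.
COPY_RULES = {
    "cpuset": lambda p: {"cpus": set(p["cpus"]), "mems": set(p["mems"])},
    "memory": lambda p: {"limit": None},
}


def userspace_mkdir(parent_params: dict) -> dict:
    """Build the child's parameters in userspace, controller by controller."""
    child = {}
    for controller, params in parent_params.items():
        rule = COPY_RULES.get(controller)
        if rule is None:
            # A controller the table has never heard of, e.g. one added to
            # the kernel after this tool was released.
            raise NotImplementedError(f"no inheritance rule for {controller!r}")
        child[controller] = rule(params)
    return child


parent = {"cpuset": {"cpus": {0, 1}, "mems": {0}}, "memory": {"limit": None}}
print(userspace_mkdir(parent)["cpuset"])

try:
    userspace_mkdir({"newctl": {}})
except NotImplementedError as e:
    print(e)  # no inheritance rule for 'newctl'
```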

Lennart