linux-kernel - Re: cgroup: status-quo and userland efforts

Message-ID: <51D08976.6040005@redhat.com>
Date:	Sun, 30 Jun 2013 21:39:34 +0200
From:	Lennart Poettering <lpoetter@...hat.com>
To:	Tim Hockin <thockin@...kin.org>
CC:	Michal Hocko <mhocko@...e.cz>, Tejun Heo <tj@...nel.org>,
	Mike Galbraith <bitbucket@...ine.de>,
	Li Zefan <lizefan@...wei.com>,
	Containers <containers@...ts.linux-foundation.org>,
	Cgroups <cgroups@...r.kernel.org>,
	bsingharora <bsingharora@...il.com>,
	"dhaval.giani" <dhaval.giani@...il.com>,
	Kay Sievers <kay.sievers@...y.org>,
	jpoimboe <jpoimboe@...hat.com>,
	"Daniel P. Berrange" <berrange@...hat.com>,
	workman-devel <workman-devel@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: cgroup: status-quo and userland efforts

Heya,

On 29.06.2013 05:05, Tim Hockin wrote:
> Come on, now, Lennart.  You put a lot of words in my mouth.

>> I for sure am not going to make the PID 1 a client of another daemon. That's
>> just wrong. If you have a daemon that is both conceptually the manager of
>> another service and the client of that other service, then that's bad design
>> and you will easily run into deadlocks and such. Just think about it: if you
>> have some external daemon for managing cgroups, and you need cgroups for
>> running external daemons, how are you going to start the external daemon for
>> managing cgroups? Sure, you can hack around this, make that daemon special,
>> and magic, and stuff -- or you can just not do such nonsense. There's no
>> reason to repeat the fuckup that cgroup became in kernelspace a second time,
>> but this time in userspace, with multiple manager daemons all with different
>> and slightly incompatible definitions of what a unit to manage actually is...
>
> I forgot about the tautology of systemd.  systemd is monolithic.

systemd is certainly not monolithic by almost any definition of that
term. I am not sure where you are taking that from, and I am not sure I
want to discuss this at that level. This just sounds like FUD you picked
up somewhere and are repeating carelessly...

> But that's not my point.  It seems pretty easy to make this cgroup
> management (in "native mode") a library that can have either a thin
> veneer of a main() function, while also being usable by systemd.  The
> point is to solve all of the problems ONCE.  I'm trying to make the
> case that systemd itself should be focusing on features and policies
> and awesome APIs.

You know, getting this all right isn't easy. If you want to do things 
properly, then you need to propagate attribute changes between the units 
you manage. You also need something like a scheduler, since a number of 
controllers can only be configured under certain external conditions 
(for example: the blkio and devices controllers use major/minor parameters
for configuring per-device limits. Since major/minor assignments are 
pretty much unpredictable these days -- and users probably want to 
configure things with friendly and stable /dev/disk/by-id/* symlinks 
anyway -- this requires us to wait for devices to show up before we can 
configure the parameters.) Soo... you need a graph of units, where you 
can propagate things, and schedule things based on some execution/event 
queue. And the propagation and scheduling are closely intermingled.
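
To make that concrete, here is a minimal Python sketch of a unit graph
driven by an event queue. It is an illustration only, not systemd code:
the Unit and Manager classes, the event names and the policy of copying
an attribute to every child are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical unit: one node in the graph (not a systemd type)."""
    name: str
    children: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)
    # device path -> limit that cannot be applied until the device exists
    pending: dict = field(default_factory=dict)
    # (major, minor) -> limit that has actually been applied
    applied: dict = field(default_factory=dict)


class Manager:
    """Hypothetical manager: a unit graph plus a single event queue."""

    def __init__(self):
        self.units = {}
        self.queue = deque()
        self.devices = {}  # device path -> (major, minor) seen so far

    def add_unit(self, name, parent=None):
        unit = Unit(name)
        self.units[name] = unit
        if parent is not None:
            self.units[parent].children.append(unit)
        return unit

    def set_attr(self, name, key, value):
        self.queue.append(("attr", name, key, value))

    def limit_device(self, name, path, limit):
        # Remember the request; it is applied once the device is known.
        self.units[name].pending[path] = limit
        self.queue.append(("retry", name))

    def device_appeared(self, path, major, minor):
        self.queue.append(("device", path, major, minor))

    def run(self):
        while self.queue:
            event = self.queue.popleft()
            kind = event[0]
            if kind == "attr":
                _, name, key, value = event
                unit = self.units[name]
                unit.attrs[key] = value
                # Propagation is just more scheduled work on the same queue.
                for child in unit.children:
                    self.queue.append(("attr", child.name, key, value))
            elif kind == "device":
                _, path, major, minor = event
                self.devices[path] = (major, minor)
                for unit in self.units.values():
                    if path in unit.pending:
                        self.queue.append(("retry", unit.name))
            elif kind == "retry":
                unit = self.units[event[1]]
                for path in list(unit.pending):
                    if path in self.devices:
                        limit = unit.pending.pop(path)
                        unit.applied[self.devices[path]] = limit
```

A per-device limit requested through limit_device() stays pending until
device_appeared() reports the matching path, and attribute changes
travel down the graph through the same queue, which is why propagation
and scheduling end up intertwined.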

Now, that's pretty much exactly what systemd actually *is*. It 
implements a graph of units with a scheduler. And if you rip that part 
out of systemd to make this an "easy cgroup management library", then 
you simply turn what systemd is into a library without leaving anything. 
Which is just bogus.

So no, if you say "seems pretty easy to make this cgroup management a 
library" then well, I have to disagree with you.

>> We want to run fewer, simpler things on our systems, we want to reuse as
>
> Fewer and simpler are not compatible, unless you are losing
> functionality.  Systemd is fewer, but NOT simpler.

Oh, certainly it is. If we'd split up the cgroup fs access into a
separate daemon of some kind, then we'd need some kind of IPC for that,
and so you have more daemons and you have some complex IPC between the
processes. So yeah, the systemd approach is certainly both simpler and
uses fewer daemons than your hypothetical one.

>> much of the code as we can. You don't achieve that by running yet another
>> daemon that does worse what systemd can anyway do simpler, easier and
>> better.
>
> Considering this is all hypothetical, I find this to be a funny
> debate.  My hypothetical idea is better than your hypothetical idea.

Well, systemd is pretty real, and the code to do the unified cgroup 
management within systemd is pretty complete. systemd is certainly not 
hypothetical.

>> The least you could grant us is to have a look at the final APIs we will
>> have to offer before you already imply that systemd cannot be a valid
>> implementation of any API people could ever agree on.
>
> Whoah, don't get defensive.  I said nothing of the sort.  The fact of
> the matter is that we do not run systemd, at least in part because of
> the monolithic nature.  That's unlikely to change in this timescale.

Oh, my. I am not sure what makes you think it is monolithic.

> What I said was that it would be a shame if we had to invent our own
> low-level cgroup daemon just because the "upstream" daemon was too
> tightly coupled with systemd.

I have no interest in reimplementing systemd as a library, just to make
you happy... I am quite happy with what we already have...

> This is supposed to be collaborative, not combative.

It certainly sounds *very* different in what you are writing.

Lennart