Message-ID: <CAAAKZws0xqQ_RAWQX4876fybO3CNHhbiDmkOR5YV7nq5NF8X_Q@mail.gmail.com>
Date:	Sun, 30 Jun 2013 23:06:18 -0700
From:	Tim Hockin <thockin@...kin.org>
To:	Lennart Poettering <lpoetter@...hat.com>
Cc:	Michal Hocko <mhocko@...e.cz>, Tejun Heo <tj@...nel.org>,
	Mike Galbraith <bitbucket@...ine.de>,
	Li Zefan <lizefan@...wei.com>,
	Containers <containers@...ts.linux-foundation.org>,
	Cgroups <cgroups@...r.kernel.org>,
	bsingharora <bsingharora@...il.com>,
	"dhaval.giani" <dhaval.giani@...il.com>,
	Kay Sievers <kay.sievers@...y.org>,
	jpoimboe <jpoimboe@...hat.com>,
	"Daniel P. Berrange" <berrange@...hat.com>,
	workman-devel <workman-devel@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: cgroup: status-quo and userland efforts

On Sun, Jun 30, 2013 at 12:39 PM, Lennart Poettering
<lpoetter@...hat.com> wrote:
> Heya,
>
>
> On 29.06.2013 05:05, Tim Hockin wrote:
>>
>> Come on, now, Lennart.  You put a lot of words in my mouth.
>
>
>>> I for sure am not going to make the PID 1 a client of another daemon.
>>> That's
>>> just wrong. If you have a daemon that is both conceptually the manager of
>>> another service and the client of that other service, then that's bad
>>> design
>>> and you will easily run into deadlocks and such. Just think about it: if
>>> you
>>> have some external daemon for managing cgroups, and you need cgroups for
>>> running external daemons, how are you going to start the external daemon
>>> for
>>> managing cgroups? Sure, you can hack around this, make that daemon
>>> special,
>>> and magic, and stuff -- or you can just not do such nonsense. There's no
>>> reason to repeat the fuckup that cgroup became in kernelspace a second
>>> time,
>>> but this time in userspace, with multiple manager daemons all with
>>> different
>>> and slightly incompatible definitions of what a unit to manage actually is...
>>
>>
>> I forgot about the tautology of systemd.  systemd is monolithic.
>
>
> systemd is certainly not monolithic for almost any definition of that term.
> I am not sure where you are taking that from, and I am not sure I want to
> discuss on that level. This just sounds like FUD you picked up somewhere and
> are repeating carelessly...

It does a number of sort-of-related things.  Maybe it does them better
by doing them together.  I can't say, really.  We don't use it at
work, and I am on Ubuntu elsewhere, for now.

>> But that's not my point.  It seems pretty easy to make this cgroup
>> management (in "native mode") a library that can have either a thin
>> veneer of a main() function, while also being usable by systemd.  The
>> point is to solve all of the problems ONCE.  I'm trying to make the
>> case that systemd itself should be focusing on features and policies
>> and awesome APIs.
>
> You know, getting this all right isn't easy. If you want to do things
> properly, then you need to propagate attribute changes between the units you
> manage. You also need something like a scheduler, since a number of
> controllers can only be configured under certain external conditions (for
> example: the blkio or devices controller use major/minor parameters for
> configuring per-device limits. Since major/minor assignments are pretty much
> unpredictable these days -- and users probably want to configure things with
> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
> wait for devices to show up before we can configure the parameters.) Soo...
> you need a graph of units, where you can propagate things, and schedule
> things based on some execution/event queue. And the propagation and
> scheduling are closely intermingled.
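
(For concreteness, the per-device configuration being described boils down
to something like the sketch below: resolve a stable /dev/disk/by-id/*
symlink to a major:minor pair once the device exists, then write a throttle
rule into cgroupfs.  The device path, cgroup path and limit are made up,
and this is plain cgroup v1 as of today, not anything systemd actually
ships.)

/* Sketch: write a cgroup v1 blkio read-bandwidth limit for one device.
 * Only possible after the device node exists, which is exactly why a
 * manager has to wait for devices before applying such settings. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

static int set_read_bps_limit(const char *dev_path, const char *cgroup_dir,
                              unsigned long long bytes_per_sec)
{
        struct stat st;
        char path[4096];
        FILE *f;

        /* stat() follows the by-id symlink to the real block device node. */
        if (stat(dev_path, &st) < 0 || !S_ISBLK(st.st_mode))
                return -1;

        snprintf(path, sizeof(path), "%s/blkio.throttle.read_bps_device",
                 cgroup_dir);
        f = fopen(path, "w");
        if (!f)
                return -1;

        /* The kernel expects "MAJOR:MINOR BYTES_PER_SEC". */
        fprintf(f, "%u:%u %llu\n",
                major(st.st_rdev), minor(st.st_rdev), bytes_per_sec);
        return fclose(f);
}

int main(void)
{
        /* Hypothetical device and cgroup, 10 MiB/s read limit. */
        return set_read_bps_limit("/dev/disk/by-id/example-disk",
                                  "/sys/fs/cgroup/blkio/myservice",
                                  10ULL * 1024 * 1024);
}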

I'm really just talking about the most basic low-level substrate of
writing to cgroupfs.  Again, we don't use udev (yet?) so we don't have
these problems.  It seems to me that it's possible to formulate a
bottom layer that is usable by both systemd and non-systemd systems.
But, you know, maybe I am wrong and our internal universe is so much
simpler (and behind the times) than the rest of the world that
layering can work for us and not you.
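
To be concrete, the bottom layer I have in mind is not much more than this
kind of thing (a rough sketch with made-up paths and values, cgroup v1
style, not a proposed API):

/* Sketch: create a cgroup, set one attribute, move a task into it. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_string(const char *path, const char *value)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fputs(value, f);
        return fclose(f);
}

int main(void)
{
        const char *cg = "/sys/fs/cgroup/memory/mygroup";  /* hypothetical */
        char path[4096], pid[32];

        /* Creating the directory is what creates the cgroup. */
        if (mkdir(cg, 0755) < 0)
                perror("mkdir");

        /* Set a memory limit (cgroup v1 memory controller), 256 MiB. */
        snprintf(path, sizeof(path), "%s/memory.limit_in_bytes", cg);
        write_string(path, "268435456");

        /* Move the current process into the new cgroup. */
        snprintf(path, sizeof(path), "%s/tasks", cg);
        snprintf(pid, sizeof(pid), "%d\n", getpid());
        return write_string(path, pid);
}

Everything above that (unit graphs, scheduling, propagation) would stay in
systemd or in whatever manager sits on top.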

> Now, that's pretty much exactly what systemd actually *is*. It implements a
> graph of units with a scheduler. And if you rip that part out of systemd to
> make this an "easy cgroup management library", then you simply turn what
> systemd is into a library without leaving anything. Which is just bogus.
>
> So no, if you say "seems pretty easy to make this cgroup management a
> library" then well, I have to disagree with you.
>
>
>>> We want to run fewer, simpler things on our systems, we want to reuse as
>>
>>
>> Fewer and simpler are not compatible, unless you are losing
>> functionality.  Systemd is fewer, but NOT simpler.
>
>
> Oh, certainly it is. If we'd split up the cgroup fs access into a separate
> daemon of some kind, then we'd need some kind of IPC for that, and so you
> have more daemons and you have some complex IPC between the processes. So
> yeah, the systemd approach is certainly both simpler and uses fewer daemons
> than your hypothetical one.

Well, it SOUNDS like Serge is trying to develop this to demonstrate
that a standalone daemon works.  That's what I am keen to help with
(or else we have to invent it ourselves).  I am not really afraid of IPC
or of "more daemons".  I much prefer simple agents doing one thing and
interacting with each other in simple ways.  But that's me.

>>> much of the code as we can. You don't achieve that by running yet another
>>> daemon that does, worse, what systemd can anyway do more simply, easily,
>>> and better.
>>
>>
>> Considering this is all hypothetical, I find this to be a funny
>> debate.  My hypothetical idea is better than your hypothetical idea.
>
>
> Well, systemd is pretty real, and the code to do the unified cgroup
> management within systemd is pretty complete. systemd is certainly not
> hypothetical.

Fair enough - I did not realize you had already done all the work that
Serge is just starting out on.

>>> The least you could grant us is to have a look at the final APIs we will
>>> have to offer before you already imply that systemd cannot be a valid
>>> implementation of any API people could ever agree on.
>>
>>
>> Whoah, don't get defensive.  I said nothing of the sort.  The fact of
>> the matter is that we do not run systemd, at least in part because of
>> the monolithic nature.  That's unlikely to change in this timescale.
>
>
> Oh, my. I am not sure what makes you think it is monolithic.

It is not a replacement for any one thing.  It is a replacement for a
handful of things that we are not keen to change all at once.  That's
all.  I have not personally looked at which subsystems can be compiled
out so we could do an incremental changeover, though, so maybe it can
work in different modes?  I don't know.  I am not
pursuing this anyway, so I am not the person to convince, regardless.

>> What I said was that it would be a shame if we had to invent our own
>> low-level cgroup daemon just because the "upstream" daemons were too
>> tightly coupled with systemd.
>
>
> I have no interest to reimplement systemd as a library, just to make you
> happy... I am quite happy with what we already have....
>
>
>> This is supposed to be collaborative, not combative.
>
>
> It certainly sounds *very* different in what you are writing.

Sorry, then.  No offense intended.  I'm just looking for opportunities
to not-replicate work, if this whole model is going to be thrust upon
me.

Tim
