Message-ID: <CAPweEDwob2vy7yQhD-KkyLDYScMUH2VgqXdOscgrqKCFC-Z3sA@mail.gmail.com>
Date: Mon, 2 Mar 2015 01:48:34 +0000
From: Luke Kenneth Casson Leighton <lkcl@...l.net>
To: David Lang <david@...g.hm>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [proposal] delegating cgroup manager to non-PID1
On Mon, Mar 2, 2015 at 12:13 AM, David Lang <david@...g.hm> wrote:
> On Sun, 1 Mar 2015, Luke Kenneth Casson Leighton wrote:
>
>> in recent discussions about PID-1 alternatives (sysvinit, openrc,
>> systemd, depinit) i was alerted to the idea that PID1 is to become the
>> sole, exclusive process permitted to manage cgroups. given that, just
>> as one specific example, depinit is only around 2,300 lines of c code,
>> adding extra code to manage cgroups is of some concern especially in
>> light of the general UNIX philosophy "do one thing and do it well".
>>
>> to allow the general UNIX philosophy to be honoured, may i
>> respectfully propose an additional linux kernel system call which
>> permits delegation - solely and exclusively by PID1 - of the
>> management of cgroups to one (and only one) other process, and
>> furthermore that the process must be an immediate child of PID1?
>
>
> There is less agreement on the idea that PID1 will have exclusive control
> over cgroups than some of the posts make it seem. There are many people who
> use cgroups for things that PID1 (and systemd) aren't dealing with.

such as lxc, yes. i use it to run 5 virtual machines for clients,
having got absolutely fed up with xen crashing. i was using a
slightly different swap/VM arrangement from usual, and every time a
xen guest hit its maximum available memory it would go to 100% cpu
usage, lock up entirely, and sometimes take the entire xen host down
with it. that being bad, i ripped xen out and replaced it with lxc,
which worked very well despite the old xen guests living on LVM
partitions, once i worked out how to specify those LVM partitions in
the lxc cgroups device control file.

so... as an end-user i would be rather upset if lxc suddenly became
really awkward to use, or complex because lxc had to be part of PID1,
or if it were forced to become more complex by having (god forbid)
d-bus in it just so that it could communicate with PID1.

however, if PID1 had to be responsible for exec'ing a special
lxc-cgroup-management program, i would probably grumble and might
find it a bit weird, but would otherwise tolerate it.

where i _would_ get upset is if, say, PID1 decided "i'm
sorry, i control cgroups, i'm delegating to this executable, i can't
cope if you try to run a different one, so the fact that you're
running lxc is something i completely disregard, fuck off and die".
that would... err... be bad :)

> The issue is that the people working to revamp cgroups are saying that allowing
> other processes to affect cgroups brings up hard problems that they don't
> want to deal with right now, so they want to make cgroups exclusive to PID1
> as a 'temporary' measure, and then look at solving the problems that are
> needed to let other processes manage parts or all of the cgroups config.

well then, controlled and explicit delegation would appear to be
quite a reasonable compromise, and it would make SE/Linux advocates
happy as well. [SE/Linux advocates _really_ don't like the idea of
one process doing multiple jobs: such a process attracts attention as
a high-priority target for concerted efforts to exploit possible
weaknesses, on the basis that compromising it has a higher payoff].

the actual issue, i feel, is that the plans for cgroups (such as
solving the console vt issue [1], and using cgroups to label
processes and their children so that a parent and all of its
children may be killed atomically) are, regardless of whether they're
controlled exclusively by PID1, mutually incompatible with whatever
other uses cgroups are put to (such as lxc).

if you use cgroups for lxc, for example, chances are pretty high that
it'll massively interfere with *all* the other uses, not just PID1's,
or else require some careful (i.e. completely inappropriate)
negotiation with them.

the thought therefore occurs to me (and this is just a first,
immediate thought that i haven't had much time to mull over) that it
might make sense to follow the whole "VMS" idea, where within one
virtual machine you could create any number of VMS OS instances, any
one of which could create any number of VMS virtual machines, and so
on until resources run out.

a simpler expression of that concept: allow cgroups management of
devices and responsibilities to be hierarchical (just like a file
system).

sooo... you would saaay:

"i am PID1, i control all cgroups (by default). *but*, i delegate
responsibility for this this and this device node, and this this and
this functionality tooo.... PID1234"

in this way it would be possible to have PID1 delegate responsibility
for lxc-style (or other) device nodes etc. etc. out to lxc. it would
be possible to delegate the things that are needed for console
management out to a separate process, and to delegate the
responsibility for putting labels onto parents-plus-children out to
yet another.

i'd *recommend* that the delegation be permitted to be recursive
(i.e. that there be a permission "can-further-subdelegate").

the only other possible refinement would be to follow the SE/Linux
model instead of a "PID1 is god, can do anything, has everything, and
delegates parts out that fan out to progressively less and less"
model. SE/Linux has the wonderful (i.e. much better) security model
of "even God has limits", i.e. no process is *ever* given more than
it actually needs.

so, for example, using the SE/Linux style security model as
inspiration, PID1 would *NOT* be given the full set of cgroup
management rights: it would instead be allocated merely the *right to
delegate* management rights to other (specific) processes. so if
PID1 actually tried to manage cgroup permissions itself, those
system calls would *rightly fail*.
l.

[1] depinit makes the console vt issue (recovering a system where x11
has crashed) a non-issue by using SysRq as the means to open another
console login. whilst i "Get" why one might effectively create
"virtual machines" (similar to how lxc operates) to manage and
separate console sessions and virtual seats, i do see it as complete
overkill. if people *want* separate console sessions and separate
virtual seats, i feel that they should get them by deploying...
lxc!! "reinventing" what lxc has achieved and putting it directly
into, say, systemd seems to be going a bit too far, especially as the
only really serious problem is x11 crashing and taking out the
keyboard, and depinit shows a way to deal with that. but i digress...