linux-kernel - Re: [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces

Message-ID: <57280456.1090106@suse.de>
Date:	Tue, 3 May 2016 11:52:22 +1000
From:	Aleksa Sarai <asarai@...e.de>
To:	Tejun Heo <tj@...nel.org>
Cc:	Li Zefan <lizefan@...wei.com>,
	Johannes Weiner <hannes@...xchg.org>, cgroups@...r.kernel.org,
	linux-kernel@...r.kernel.org, dev@...ncontainers.org,
	Aleksa Sarai <cyphar@...har.com>,
	James Bottomley <James.Bottomley@...senPartnership.com>
Subject: Re: [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup
 namespaces

>> Change the mode of the cgroup directory for each cgroup association,
>> allowing the process to create subtrees and modify the limits of the
>> subtrees *without* allowing the process to modify its own limits. Due to
>> the cgroup core restrictions and unix permission model, this allows for
>> processes to create new subtrees without breaking the cgroup limits for
>> the process.
>
> I don't get why this is necessary.  What's wrong with the parent
> setting up permission correctly for the namespace?

The parent setting this up requires either:

1. A privileged process giving the process write access to the cgroup
directory it is currently in. No software does this by default, and it
might not always make sense (systemd doesn't like processes messing
around in their respective cgroups), so this needs a better solution.

2. The process itself being privileged, which is not the use case I'm
going for with rootless containers. If you have root, you can do
whatever you want in this regard and this feature doesn't affect you.
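
To make option 1 concrete, here is a minimal sketch (not part of the
patchset) of how an unprivileged process could locate its own cgroup
directory and check whether it would be allowed to create children
there. The helper names are hypothetical, and the mapping from a
hierarchy to a directory under a mount root such as /sys/fs/cgroup is
an assumption about a typical cgroup v1 layout; named hierarchies like
name=systemd are usually mounted elsewhere.

```python
import os


def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    Returns a list of (hierarchy_id, controllers, path) tuples.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice so a ':' inside the path is preserved.
        hier_id, controllers, path = line.split(":", 2)
        ctrl_list = controllers.split(",") if controllers else []
        entries.append((int(hier_id), ctrl_list, path))
    return entries


def cgroup_dir(mount_root, controllers, path):
    """Guess the directory backing a cgroup.

    Assumption: each hierarchy is mounted at
    <mount_root>/<comma-joined controller list>.
    """
    return os.path.join(mount_root, ",".join(controllers), path.lstrip("/"))


def can_create_subtree(directory):
    """mkdir(2) in a directory needs write and search permission on it."""
    return os.access(directory, os.W_OK | os.X_OK)
```

On a typical system, where the directory is owned by root with mode
0755, can_create_subtree() returns False for an unprivileged caller,
which is the situation described in option 1.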

The main reason for this patchset is that I would like to make sure
that unprivileged processes can take advantage of cgroup features (such
as the freezer cgroup, or just regular resource limiting). Since
cgroups are a hierarchy, I can see no fundamental reason why this is not 
possible. And the cgroup namespace appears to be the perfect way of 
doing it. I firmly believe there is a simple and safe way of allowing 
unprivileged processes to create subtrees of their current cgroup.

However, I agree with James that this patchset isn't ideal (it was my 
first rough attempt). I think I'll get to work on properly virtualising 
/sys/fs/cgroup, which will allow for a new cgroup namespace to modify 
subtrees (but without allowing for cgroup escape) -- by pinning which pid
namespace the cgroup was created under. We can use the same type of
virtualisation that /proc does (except instead of selectively showing
the dentries, we selectively show different owners of the dentries).
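
As a rough illustration of the idea only, here is a toy userspace model
of "showing different owners" depending on who is looking. Everything
in it is an assumption made for the sketch: the CgroupDir type, the
integer namespace ids, and the rule that a directory appears owned by
the viewer only when it was created under the viewer's pid namespace.
It is not kernel code and does not describe an existing interface.

```python
from dataclasses import dataclass

ROOT_UID = 0


@dataclass(frozen=True)
class CgroupDir:
    path: str
    created_in_ns: int  # hypothetical id of the pinned pid namespace


def displayed_owner(cgroup, viewer_ns, viewer_uid):
    """Owner uid the viewer would see for this directory in the toy model.

    Directories created under the viewer's own pid namespace appear
    owned by the viewer; everything else appears owned by root.
    """
    if cgroup.created_in_ns == viewer_ns:
        return viewer_uid
    return ROOT_UID


def may_modify(cgroup, viewer_ns, viewer_uid):
    """Plain owner check against the owner shown to this viewer."""
    if viewer_uid == ROOT_UID:
        return True
    return displayed_owner(cgroup, viewer_ns, viewer_uid) == viewer_uid
```

In this model a process in namespace 42 can manage a child cgroup it
created there, but not the cgroup it was placed in by its parent
(created under a different namespace), so it cannot change its own
limits.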

Would that be acceptable?

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/