linux-kernel - Re: [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <573098D5.3070109@suse.de>
Date:	Tue, 10 May 2016 00:04:05 +1000
From:	Aleksa Sarai <asarai@...e.de>
To:	Tejun Heo <tj@...nel.org>
Cc:	Li Zefan <lizefan@...wei.com>,
	Johannes Weiner <hannes@...xchg.org>, cgroups@...r.kernel.org,
	linux-kernel@...r.kernel.org, dev@...ncontainers.org,
	Aleksa Sarai <cyphar@...har.com>,
	James Bottomley <James.Bottomley@...senPartnership.com>
Subject: Re: [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup
 namespaces

>>> However, I agree with James that this patchset isn't ideal (it was my
>>> first
>>> rough attempt). I think I'll get to work on properly virtualising
>>> /sys/fs/cgroup, which will allow for a new cgroup namespace to modify
>>> subtrees (but without allowing for cgroup escape) -- by pinning what pid
>>> namespace the cgroup was created under. We can use the same type of
>>> virtualization that /proc does (except instead of selectively showing
>>> the
>>> dentries, we selectively show different owners of the dentries).
>>>
>>> Would that be acceptable?
>>
>> I'm still not sold on the idea.  For better or worse, the permission
>> model is mostly based on vfs and I don't want to deviate too much as
>> that's likely to become confusing pretty quickly.  If a sub-hierarchy
>> is to be delegated, that's upto whomever is controlling cgroup
>> hierarchy in the sub-domain.  We can expand the perm checks to
>> consider user namespaces but I'd like to avoid going beyond that.
>
> As I mentioned in the other thread, I had another idea for a way to do
> this (that was more complicated to implement, so I went with this
> simpler patch first):
>
> On unshare(), we create a new cgroup that is a child of the calling
> process's current cgroup association (in all of the hierarchies,
> obviously). The new cgroup directory (and contained files) are owned by
> current_fs_{u,g}id(). The process is then moved into the cgroup, and the
> root of the cgroup namespace is changed to be that cgroup. This way,
> there would be no disparity between the VFS and cgroup permission model
> -- there'll be a global view of the cgroup hierarchy that everyone
> agrees on.
>
> I had three concerns with this patch:
>
> 1. It would cause issues with the no internal process constraint of
> cgroupv2. I spent some time trying to figure out how cgroupv2 would act
> in this case (do all of the processes automatically get moved into new
> subdirectories?), but couldn't figure it out. If it does move all of the
> processes into the subdirectory, we'd have to make a sink cgroup as well
> as the one for the namespace -- which then just becomes inefficient (you
> have a cgroup that has no purpose from an administration perspective).
>
> 2. We'd have to come up with a way to make the name of the new cgroup
> resistent to clashes (especially with cgroups already created by other
> processes), which smacks of a suboptimal solution to the problem.
>
> 3. We'd be creating cgroups and attaching processes to the cgroups
> without explicitly going through the VFS layer. This presumably means
> that other parts of userspace might not get alerted properly to the
> changes. I'm not really sure how we should deal with that, but it sounds
> like it could cause problems for someone.

Does anyone have any opinions on this idea?

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/