Date:	Mon, 20 Oct 2014 17:20:44 -0700
From:	Andy Lutomirski <>
To:	"Eric W.Biederman" <>
Cc:	"Serge E. Hallyn" <>,
	Aditya Kali <>,
	Linux API <>,
	Linux Containers <>,
	Serge Hallyn <>,
	"" <>,
	Tejun Heo <>,,
	Ingo Molnar <>
Subject: Re: [PATCHv1 7/8] cgroup: cgroup namespace setns support

On Sun, Oct 19, 2014 at 9:55 PM, Eric W.Biederman <> wrote:
> On October 19, 2014 1:26:29 PM CDT, Andy Lutomirski <> wrote:
>>On Sat, Oct 18, 2014 at 10:23 PM, Eric W. Biederman
>><> wrote:
>>> "Serge E. Hallyn" <> writes:
>>>> Quoting Aditya Kali (
>>>>> On Thu, Oct 16, 2014 at 2:12 PM, Serge E. Hallyn <>
>>>>> > Quoting Aditya Kali (
>>>>> >> setns on a cgroup namespace is allowed only if
>>>>> >> * task has CAP_SYS_ADMIN in its current user-namespace and
>>>>> >>   over the user-namespace associated with target cgroupns.
>>>>> >> * task's current cgroup is descendent of the target
>>>>> >>   cgroup.
>>>>> >
>>>>> > What is the point of this?
>>>>> >
>>>>> > If I'm a user logged into
>>>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope and I start
>>>>> > a container which is in
>>>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope/x1
>>>>> > then I will want to be able to enter the container's cgroup.
>>>>> > The container's cgroup root is under my own (satisfying the
>>>>> > below condition) but my cgroup is not a descendent of the
>>>>> > container's cgroup.
>>>>> >
>>>>> This condition is there because we don't want to do implicit cgroup
>>>>> changes when a process attaches to another cgroupns. cgroupns tries
>>>>> to preserve the invariant that at any point, your current cgroup is
>>>>> always under the cgroupns-root of your cgroup namespace. But in your
>>>>> example, if we allow a process in "session-c12.scope" container to
>>>>> attach to cgroupns root'ed at "session-c12.scope/x1" container
>>>>> (without implicitly moving its cgroup), then this invariant won't
>>>>> hold.
>>>> Oh, I see.  Guess that should be workable.  Thanks.
>>> Which has me looking at what the rules are for moving through
>>> the cgroup hierarchy.
>>> As long as we have write access to cgroup.procs and are allowed
>>> to open the file for write, we can move any of our own tasks
>>> into the cgroup.  So the cgroup namespace rules don't seem
>>> to be a problem.
>>> Andy can you please take a look at the permission checks in
>>> __cgroup_procs_write.
>>The actual requirements for calling that function haven't changed,
>>right?  IOW, what does this have to do with cgroupns?
> Excluding user namespaces the requirements have not changed.
> The immediate correlation is that to enter a cgroupns you must first put your process in one of its cgroups.
> So I was examining what it would take to enter the cgroup of cgroupns.
>> Is the idea
>>that you want a privileged user wrt a cgroupns's userns to be able to
>>use this?  If so:
>>Yes, that current_cred() thing is bogus.  (Actually, this is probably
>>exploitable right now if any cgroup.procs inode anywhere on the system
>>lets non-root write.)  (Can we have some kernel debugging option that
>>makes any use of current_cred() in write(2) warn?)
>>We really need a weaker version of may_ptrace for this kind of stuff.
>>Maybe the existing may_ptrace stuff is okay, actually.  But this is
>>completely missing group checks, cap checks, capabilities wrt the
>>userns, etc.
>>Also, I think that, if this version of the patchset allows non-init
>>userns to unshare cgroupns, then the issue of what permission is
>>needed to lock the cgroup hierarchy like that needs to be addressed,
>>because unshare(CLONE_NEWUSER|CLONE_NEWCGROUP) will effectively pin
>>the calling task with no permission required.  Bolting on a fix later
>>will be a mess.
> I imagine the pinning would be like the userns.
> Ah but there is a potentially serious issue with the pinning.
> With pinning we can make it impossible for root to move us to a different cgroup.
> I am not certain how serious that is but it bears thinking about.
> If we don't implement pinning we should be able to implement everything with just filesystem mount options, and no new namespace required.
> Sigh.
> I am too tired tonight to see the end game in this.

Possible solution:

Ditch the pinning.  That is, if you're outside a cgroupns (or you have
a non-ns-confined cgroupfs mounted), then you can move a task in a
cgroupns outside of its root cgroup.  If you do this, then the task
thinks its cgroup is something like "../foo" or "../../foo".
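
To make the "../foo" idea concrete, here is a rough userspace sketch
(Python, not kernel code) of how a task's cgroup path could be rendered
relative to its cgroupns root once pinning is gone.  The function name
cgroup_path_in_ns and the exact output format are assumptions for
illustration only; nothing in the patchset defines them.

```python
import posixpath


def cgroup_path_in_ns(task_cgroup, ns_root):
    """Render task_cgroup as seen from a cgroupns rooted at ns_root.

    Hypothetical helper: paths at or below the root are shown as absolute
    paths inside the namespace; paths outside the root are shown with
    leading "../" components.
    """
    rel = posixpath.relpath(task_cgroup, ns_root)
    if rel == ".":
        return "/"                 # task sits exactly at the namespace root
    if rel == ".." or rel.startswith("../"):
        return rel                 # task was moved outside the root
    return "/" + rel               # task is somewhere under the root


# A task still under its root, then moved out one and two levels.
print(cgroup_path_in_ns("/lxc/c1/x1", "/lxc/c1"))
print(cgroup_path_in_ns("/lxc/foo", "/lxc/c1"))
print(cgroup_path_in_ns("/foo", "/lxc/c1"))
```

With that rendering, a task that was moved out of its root by someone
outside the namespace still gets a well-defined (if odd-looking) path.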

While we're at it, consider making setns for a cgroupns *not* change
the caller's cgroup.  Is there any reason it really needs to?
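
For comparison, the cgroup half of the setns rule from the patch
description (the caller's current cgroup must be a descendant of the
target cgroupns root) amounts to a path-prefix check.  A minimal sketch
in Python, again userspace only; the function name is hypothetical, the
CAP_SYS_ADMIN half of the rule is not modelled, and I'm assuming "at the
root" counts as a descendant:

```python
import posixpath


def setns_allowed_by_cgroup(task_cgroup, target_ns_root):
    """Return True if task_cgroup is at or below target_ns_root.

    Models only the descendant condition; capability checks against the
    user namespaces are deliberately left out of this sketch.
    """
    task = posixpath.normpath(task_cgroup)
    root = posixpath.normpath(target_ns_root)
    # The common ancestor equals the root exactly when task is at or
    # below the root.
    return posixpath.commonpath([task, root]) == root


session = "/lxc/c1/user.slice/user-1000.slice/session-c12.scope"

# Serge's example: the caller is in the session scope and the container's
# cgroupns is rooted one level below it, so the check fails.
print(setns_allowed_by_cgroup(session, session + "/x1"))   # False

# A task already inside the container's cgroup passes.
print(setns_allowed_by_cgroup(session + "/x1", session + "/x1"))   # True
```

If setns stops caring about the caller's cgroup, this check goes away
and the relative-path rendering covers the resulting cases.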

