linux-kernel - Re: [PATCH -next 2/2] cgroup: Disallow delegatee to write all interfaces outsize of cgroup ns

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <17f6eb65-e6c0-47be-b88a-e33819bbd66f@huawei.com>
Date: Tue, 13 Aug 2024 11:48:44 +0800
From: chenridong <chenridong@...wei.com>
To: Michal Koutný <mkoutny@...e.com>
CC: <tj@...nel.org>, <lizefan.x@...edance.com>, <hannes@...xchg.org>,
	<longman@...hat.com>, <cgroups@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -next 2/2] cgroup: Disallow delegatee to write all
 interfaces outsize of cgroup ns



On 2024/8/13 0:57, Michal Koutný wrote:
> Hello.
> 
> On Mon, Aug 12, 2024 at 07:37:46AM GMT, Chen Ridong <chenridong@...wei.com> wrote:
>> cd /sys/fs/cgroup
>> echo '+pids' > cgroup.subtree_control
>> mkdir dlgt_grp_ns
>> echo '+pids' > dlgt_grp_ns/cgroup.subtree_control
>> mkdir dlgt_grp_ns/dlgt_grp_ns1
>> echo $$ > dlgt_grp_ns/dlgt_grp_ns1/cgroup.procs
>> echo 200 > dlgt_grp_ns/dlgt_grp_ns1/pids.max
>> unshare -Cm /bin/bash
>> echo max > /dlgt_grp_ns1/pids.max // Permission denied
>> echo -pids > dlgt_grp_ns/cgroup.subtree_control // pids was unlimited now
> 
> You could also have increased the ancestral limit (if there was any)
> echo max > dlgt_grp_ns/pids.max // similarly allowed
> 
> If you're a root (or otherwise have sufficient permissions) and you can
> _see_ an ancestral cgroup, you can write to its attributes according to
> permissions. Thus the delegation works via cgroup ns (in)visibility but
> cgroup ns root is visible on both sides of the boundary hence the extra
> check.
> 
> I can imagine that a container runtime process could enter the target
> cgroup ns while keeping visibility to the original cgroup ns and do some
> tuning on it before it drops any pointers to the original cgroup ns and
> exec's delegatee's workload. (But it's only my imagination to illustrate
> that this may be a breaking change.)
> 
Yes, this is a change. However, the process within the container can 
turn on subsystems , they can also turn off subsystems , which may not 
only affect the host machine but may also affect other containers. I 
believe this poses a danger.

> OTOH, I can see why this would be consistent with the migration rules
> that exist between sides of cgroup ns, so this could work if it was
> hidden behind (another) mount option like 'nsdelegate2' :-p
> 

Hello, Michal, I am sorry I can not get your point.
Did you mean this issue should be fixed with another mount option 
'nsdelegate2'? Or is the current patch acceptable?

> 
>> @@ -4134,8 +4134,10 @@ static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
>>   	 * cgroup.procs, cgroup.threads and cgroup.subtree_control.
>>   	 */
>>   	if ((cgrp->root->flags & CGRP_ROOT_NS_DELEGATE) &&
>> -	    !(cft->flags & CFTYPE_NS_DELEGATABLE) &&
>> -	    ctx->ns != &init_cgroup_ns && ctx->ns->root_cset->dfl_cgrp == cgrp)
>> +		ctx->ns != &init_cgroup_ns &&
>> +		(!cgroup_is_descendant(cgrp, ctx->ns->root_cset->dfl_cgrp) ||
>> +			(!(cft->flags & CFTYPE_NS_DELEGATABLE) &&
>> +			ctx->ns->root_cset->dfl_cgrp == cgrp)))
>>   		return -EPERM;
> 
> Could you please also update the comment above, to describe the boundary
> vs subtree delegation?
> Thanks,
> Michal

Sure.

Thanks
Ridong