linux-kernel - RE: Could not mount sysfs when enable userns but disable netns


Message-ID: <5871495633F38949900D2BF2DC04883E5632BD@G08CNEXMBPEKD02.g08.fujitsu.local>
Date:	Mon, 14 Jul 2014 09:32:39 +0000
From:	"chenhanxiao@...fujitsu.com" <chenhanxiao@...fujitsu.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	"Serge E. Hallyn" <serge@...lyn.com>,
	"'Daniel P. Berrange (berrange@...hat.com)'" <berrange@...hat.com>
CC:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"containers@...ts.linux-foundation.org" 
	<containers@...ts.linux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: Could not mount sysfs when enable userns but disable netns



> -----Original Message-----
> From: Eric W. Biederman [mailto:ebiederm@...ssion.com]
> Sent: Saturday, July 12, 2014 12:29 AM
> To: Serge E. Hallyn
> Cc: Chen, Hanxiao/陈 晗霄; Serge Hallyn (serge.hallyn@...ntu.com); Greg
> Kroah-Hartman; containers@...ts.linux-foundation.org;
> linux-kernel@...r.kernel.org
> Subject: Re: Could not mount sysfs when enable userns but disable netns
> 
> "Serge E. Hallyn" <serge@...lyn.com> writes:
> 
> > Quoting chenhanxiao@...fujitsu.com (chenhanxiao@...fujitsu.com):
> >> Hello,
> >>
> >> How to reproduce:
> >> 1. Prepare a container, enable userns and disable netns
> >> 2. use libvirt-lxc to start a container
> >> 3. libvirt could not mount sysfs then failed to start.
> >>
> >> Then I found that
> >> commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e says:
> >> "Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
> >> over the net namespace."
> >>
> >> But why should we check sysfs mount permission against the net namespace?
> >> We have already checked CAP_SYS_ADMIN.
> 
> We already checked capable(CAP_SYS_ADMIN) and it failed.

But on my machine the first check
(!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type)) did not trigger;
the mount was rejected in kobj_ns_current_may_mount().

I added some printks to sysfs_mount():
        if (!(flags & MS_KERNMOUNT)) {
-               if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type))
+               if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type)) {
+                       printk(KERN_WARNING "Failed in capable\n");
                        return ERR_PTR(-EPERM);
+               }
 
-               if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
+               if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET)) {
+                       printk(KERN_WARNING "Failed in kobj_ns_current_may_mount\n");
                        return ERR_PTR(-EPERM);
+               }

The resulting log shows:
Jul 14 09:55:26 localhost systemd: Starting Container lxc-chx.
Jul 14 09:55:26 localhost systemd-machined: New machine lxc-chx.
Jul 14 09:55:26 localhost systemd: Started Container lxc-chx.
Jul 14 09:55:26 localhost kernel: [  784.044709] Failed in kobj_ns_current_may_mount
Jul 14 09:55:26 localhost systemd-machined: Machine lxc-chx terminated.

> 
> >> What is the relationship between sysfs and the net namespace?
> >> Or is this check a little redundant?
> 
> You want a bind mount not a new fresh mount.
> 

Yes, we need to modify libvirt's code to handle sysfs
when userns is enabled but netns is disabled.
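
A minimal sketch of that idea in C, not libvirt's actual code: pick a
fresh sysfs mount only when the container has its own network namespace,
and otherwise bind mount the existing /sys as Eric suggests. The names
sysfs_plan, plan_sysfs_mount() and apply_sysfs_plan() are hypothetical,
and the flag choices (MS_REC on the bind mount, nosuid/nodev/noexec on
the fresh mount) are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper type: describes how the container's /sys is set up. */
struct sysfs_plan {
    const char *source;   /* mount source passed to mount(2) */
    const char *fstype;   /* NULL for a bind mount */
    unsigned long flags;  /* mount(2) flags */
};

/*
 * Choose between a fresh sysfs instance and a bind mount.
 * A fresh mount is only attempted when the container owns its network
 * namespace; with a shared (host) netns the kernel check discussed in
 * this thread would reject it with EPERM.
 */
struct sysfs_plan plan_sysfs_mount(bool has_private_netns)
{
    struct sysfs_plan plan;

    if (has_private_netns) {
        plan.source = "sysfs";
        plan.fstype = "sysfs";
        plan.flags = MS_NOSUID | MS_NODEV | MS_NOEXEC; /* assumed flags */
    } else {
        plan.source = "/sys";            /* existing copy of sysfs */
        plan.fstype = NULL;              /* ignored for MS_BIND */
        plan.flags = MS_BIND | MS_REC;   /* MS_REC is an assumption */
    }
    return plan;
}

/* Carry out the plan on 'target'; returns 0 on success, -1 on failure. */
int apply_sysfs_plan(const struct sysfs_plan *plan, const char *target)
{
    assert(plan != NULL && target != NULL);

    if (mount(plan->source, target, plan->fstype, plan->flags, NULL) < 0) {
        fprintf(stderr, "mounting %s on %s failed: %s\n",
                plan->source, target, strerror(errno));
        return -1;
    }
    return 0;
}
```

A caller would build the plan once it knows whether netns is enabled
for the container, then call apply_sysfs_plan() with the container's
/sys path as the target.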

Thanks,
- Chen

> When looking at how evil actors could abuse things it turned out that in
> some circumstances the root user (before a user namespace is created)
> needs to control the policy on which filesystems may be mounted.  There
> are files in sysfs and in proc that you never want to see in a chroot
> jail, as they just create more surface area to attack.
> 
> The only reason for creating a new fresh mount of sysfs is to get access
> to /sys/class/net.  So to keep things simple we restrict creation of
> that mount to cases where the mounter has permissions over the network
> namespace, and cases where nothing interesting is mounted on top of
> sysfs.
> 
> If a new /sys/class/net is not needed it is possible to bind mount the
> existing copy of sysfs to the new location without loss of
> functionality.
> 
> > It is not redundant.  The whole point is that after clone(CLONE_NEWUSER)
> > you get a newly filled set of capabilities.  But you should not have
> > privileges over the host's network namespace.  After you unshare a new
> > network namespace, you *should* have privilege over it.  So the fact
> > that we've already checked CAP_SYS_ADMIN means nothing, because the
> > capabilities need to be targeted.
> 
> Exactly.  The tests are failing because the caller is not the global root
> and so the code is properly failing the permission checks.
> 
> Eric
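
The targeted-capability rule described above can be illustrated with a
small user-space model in C. This is a toy sketch, not kernel code: the
structs and the model_* functions are hypothetical, and the walk up the
user-namespace hierarchy is a simplified imitation of what ns_capable()
does. It assumes the sysfs check amounts to "CAP_SYS_ADMIN in the user
namespace that owns the caller's network namespace".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a user namespace: parent link plus the uid that created it. */
struct user_ns {
    const struct user_ns *parent;   /* NULL for the initial namespace */
    unsigned owner_uid;
};

/* Toy model of a task's credentials. */
struct task_cred {
    const struct user_ns *user_ns;  /* namespace the task lives in */
    unsigned euid;                  /* euid as seen in the parent namespace */
    bool cap_sys_admin;             /* CAP_SYS_ADMIN in its own namespace */
};

/* Toy model of a network namespace: only its owning user namespace. */
struct net_ns {
    const struct user_ns *owner;
};

/*
 * Does 'cred' hold CAP_SYS_ADMIN over 'target'?
 * Walk from the target towards the root: a capability held in a
 * namespace also covers its descendants, and the creator of a
 * namespace is privileged over it from the parent namespace.
 */
bool model_ns_capable(const struct task_cred *cred,
                      const struct user_ns *target)
{
    for (const struct user_ns *ns = target; ns != NULL; ns = ns->parent) {
        if (ns == cred->user_ns)
            return cred->cap_sys_admin;
        if (ns->parent == cred->user_ns && ns->owner_uid == cred->euid)
            return true;
    }
    return false;   /* target is not at or below the caller's namespace */
}

/* Model of the sysfs mount check against the caller's current netns. */
bool model_may_mount_sysfs(const struct task_cred *cred,
                           const struct net_ns *current_net)
{
    return model_ns_capable(cred, current_net->owner);
}
```

In this model, a container root that still shares the host's network
namespace is refused, while the same task is allowed once its network
namespace is owned by its own user namespace.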