lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 30 Apr 2014 03:10:55 +0300 From: Marian Marinov <mm@...com> To: Stéphane Graber <stgraber@...ntu.com>, Andy Lutomirski <luto@...capital.net> CC: "Eric W. Biederman" <ebiederm@...ssion.com>, Linux Containers <containers@...ts.linux-foundation.org>, Serge Hallyn <serge.hallyn@...ntu.com>, Ted Ts'o <tytso@....edu>, Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, lxc-devel <lxc-devel@...ts.sourceforge.net> Subject: Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace On 04/30/2014 03:01 AM, Stéphane Graber wrote: > On Tue, Apr 29, 2014 at 04:51:54PM -0700, Andy Lutomirski wrote: >> On Tue, Apr 29, 2014 at 4:47 PM, Stéphane Graber <stgraber@...ntu.com> wrote: >>> On Tue, Apr 29, 2014 at 04:22:55PM -0700, Andy Lutomirski wrote: >>>> On Tue, Apr 29, 2014 at 4:20 PM, Marian Marinov <mm@...com> wrote: >>>>> On 04/30/2014 01:45 AM, Andy Lutomirski wrote: >>>>>> >>>>>> On 04/29/2014 03:29 PM, Serge Hallyn wrote: >>>>>>> >>>>>>> Quoting Marian Marinov (mm-108MBtLGafw@...lic.gmane.org): >>>>>>>> >>>>>>>> On 04/30/2014 01:02 AM, Serge Hallyn wrote: >>>>>>>>> >>>>>>>>> Quoting Marian Marinov (mm-108MBtLGafw@...lic.gmane.org): >>>>>>>>>> >>>>>>>>>> On 04/29/2014 09:52 PM, Serge Hallyn wrote: >>>>>>>>>>> >>>>>>>>>>> Quoting Theodore Ts'o (tytso-3s7WtUTddSA@...lic.gmane.org): >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'm proposing a fix to this, by replacing the >>>>>>>>>>>>> capable(CAP_LINUX_IMMUTABLE) >>>>>>>>>>>>> check with ns_capable(current_cred()->user_ns, >>>>>>>>>>>>> CAP_LINUX_IMMUTABLE). >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Um, wouldn't it be better to simply fix the capable() function? >>>>>>>>>>>> >>>>>>>>>>>> /** >>>>>>>>>>>> * capable - Determine if the current task has a superior >>>>>>>>>>>> capability in effect >>>>>>>>>>>> * @cap: The capability to be tested for >>>>>>>>>>>> * >>>>>>>>>>>> * Return true if the current task has the given superior >>>>>>>>>>>> capability currently >>>>>>>>>>>> * available for use, false if not. >>>>>>>>>>>> * >>>>>>>>>>>> * This sets PF_SUPERPRIV on the task if the capability is >>>>>>>>>>>> available on the >>>>>>>>>>>> * assumption that it's about to be used. >>>>>>>>>>>> */ >>>>>>>>>>>> bool capable(int cap) >>>>>>>>>>>> { >>>>>>>>>>>> return ns_capable(&init_user_ns, cap); >>>>>>>>>>>> } >>>>>>>>>>>> EXPORT_SYMBOL(capable); >>>>>>>>>>>> >>>>>>>>>>>> The documentation states that it is for "the current task", and I >>>>>>>>>>>> can't imagine any use case, where user namespaces are in effect, >>>>>>>>>>>> where >>>>>>>>>>>> using init_user_ns would ever make sense. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> the init_user_ns represents the user_ns owning the object, not the >>>>>>>>>>> subject. >>>>>>>>>>> >>>>>>>>>>> The patch by Marian is wrong. Anyone can do 'clone(CLONE_NEWUSER)', >>>>>>>>>>> setuid(0), execve, and end up satisfying >>>>>>>>>>> 'ns_capable(current_cred()->userns, >>>>>>>>>>> CAP_SYS_IMMUTABLE)' by definition. >>>>>>>>>>> >>>>>>>>>>> So NACK to that particular patch. I'm not sure, but IIUC it should >>>>>>>>>>> be >>>>>>>>>>> safe to check against the userns owning the inode? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> So what you are proposing is to replace >>>>>>>>>> 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with >>>>>>>>>> 'inode_capable(inode, CAP_SYS_IMMUTABLE)' ? >>>>>>>>>> >>>>>>>>>> I agree that this is more sane. >>>>>>>>> >>>>>>>>> >>>>>>>>> Right, and I think the two operations you're looking at seem sane >>>>>>>>> to allow. >>>>>>>> >>>>>>>> >>>>>>>> If you are ok with this patch, I will fix all file systems and send >>>>>>>> patches. >>>>>>> >>>>>>> >>>>>>> Sounds good, thanks. >>>>>>> >>>>>>>> Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@...lic.gmane.org> >>>>>>> >>>>>>> >>>>>>> Acked-by: Serge E. Hallyn >>>>>>> <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@...lic.gmane.org> >>>>>> >>>>>> >>>>>> Wait, what? >>>>>> >>>>>> Inodes aren't owned by user namespaces; they're owned by users. And any >>>>>> user can arrange to have a user namespace in which they pass an >>>>>> inode_capable check on any inode that they own. >>>>>> >>>>>> Presumably there's a reason that CAP_SYS_IMMUTABLE is needed. If this >>>>>> gets merged, then it would be better to just drop CAP_SYS_IMMUTABLE >>>>>> entirely. >>>>> >>>>> >>>>> The problem I'm trying to solve is this: >>>>> >>>>> container with its own user namespace and CAP_SYS_IMMUTABLE should be able >>>>> to use chattr on all files witch this container has access to. >>>>> >>>>> Unfortunately with the capable(CAP_SYS_IMMUTABLE) check this is not working. >>>>> >>>>> With the proposed two fixes CAP_SYS_IMMUTABLE started working in the >>>>> container. >>>>> >>>>> The first solution got its user namespace from the currently running process >>>>> and the second gets its user namespace from the currently opened inode. >>>>> >>>>> So what would be the best solution in this case? >>>> >>>> I'd suggest adding a mount option like fs_owner_uid that names a uid >>>> that owns, in the sense of having unlimited access to, a filesystem. >>>> Then anyone with caps on a namespace owned by that uid could do >>>> whatever. >>>> >>>> Eric? >>>> >>>> --Andy >>> >>> The most obvious problem I can think of with "do whatever" is that this >>> will likely include mknod of char and block devices which you can then >>> chown/chmod as you wish and use to access any devices on the system from >>> an unprivileged container. >>> This can however be mitigated by using the devices cgroup controller. >> >> Or 'nodev'. setuid/setgid may have the same problem, too. >> >> Implementing something like this would also make CAP_DAC_READ_SEARCH >> and CAP_DAC_OVERRIDE work. >> >> Arguably it should be impossible to mount such a thing in the first >> place without global privilege. >> >>> >>> You also probably wouldn't want any unprivileged user from the host to >>> find a way to access that mounted filesytem but so long as you do the >>> mount in a separate mountns and don't share uids between the host and >>> the container, that should be fine too. >> >> This part should be a nonissue -- an unprivileged user who has the >> right uid owns the namespace anyway, so this is the least of your >> worries. >> >> --Andy > > It should be a nonissue so long as we make sure that a file owned by a > uid outside the scope of the container may not be changed even though > fs_owner_uid is set. Otherwise, it's just a matter of chmod +S on say > a shell and anyone who can see the fs from the host will be getting a > root shell (assuming said file is owned by the host's uid 0). > > So that's restricting slightly what "do whatever" would do in this case. > In my case I give an LVM volume to each container and limit the container to only this block device using the devices cgroup. So the inode_capable() fix worked like a charm for me. The container can not see any filesystem other then its own. And I have another patch for my kernel that prohibits setns from cgroup other then / which prevents programs from the container from getting out. clone() can be used to create new namespaces but can not be used to attach to already created namespaces. Marian -- Marian Marinov Founder & CEO of 1H Ltd. Jabber/GTalk: hackman@...ber.org ICQ: 7556201 Mobile: +359 886 660 270 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists