lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 03 Mar 2013 13:58:10 -0800
From: (Eric W. Biederman)
To:	Kees Cook <>
Cc:	"Serge E. Hallyn" <>,
	LKML <>,
	Serge Hallyn <>,
	Brad Spengler <>,
	Al Viro <>,
	PaX Team <>
Subject: Re: user ns: arbitrary module loading

Kees Cook <> writes:

> On Sat, Mar 2, 2013 at 8:12 PM, Eric W. Biederman <> wrote:
>> "Serge E. Hallyn" <> writes:
>>> Quoting Kees Cook (
>>>> The rearranging done for user ns has resulted in allowing arbitrary
>>>> kernel module loading[1] (i.e. re-introducing a form of CVE-2011-1019)
>>>> by what is assumed to be an unprivileged process.
>>>> At present, it does look to require at least CAP_SETUID along the way
>>>> to set up the uidmap (but things like the setuid helper newuidmap
>>>> might soon start providing such a thing by default).
>> CAP_SETUID is not needed.
> Do you have an example? I wasn't able to gain capabilities within the
> userns until I had a uid map set that allowed my uid to map to root.
> (I needed CAP_SYS_ADMIN in the mnt ns to get past may_mount() before I
> could touch get_fs_type().)

Likely because exec drops capbilities if your uid isn't 0.

Without CAP_SETUID you can still map uid to 0 to your current uid.

The following shell script is an easy basis for testing and playing
around with things.  It does assume a recent version of unshare from util-linux.

export IFIFO=/tmp/userns-test-$$-in
export OFIFO=/tmp/userns-test-$$-out
mkfifo $IFIFO
mkfifo $OFIFO
unshare --user -- /bin/bash -s <<'EOF' &
echo waiting-for-uid-and-gid-maps > $OFIFO
read LINE < $IFIFO
exec unshare --mount --net -- /bin/sh -s <<'EOF2'
mount --bind $HOME /root/
mount --bind /dev/null /dev/log
# Start a shell to keep the namespace reference alive
$SHELL -i < /dev/tty > /dev/tty 2> /dev/tty
read LINE < $OFIFO
uid=$(id --user)
gid=$(id --group)
echo "0 $uid 1" > /proc/$child/uid_map
echo "0 $gid 1" > /proc/$child/gid_map
echo uid-and-gid-maps > $IFIFO
wait $child

>> At a practical level I don't see adding a capalbe(CAP_SYS_ADMIN) check
>> as having much effect for the functionality currently present in user
>> namespaces today as the filesystems that an legal to mount in a user
>> namespace (ramfs, tmpfs, mqueuefs, sysfs, proc, devpts) are so common
>> most of them can not even be built as modules and even if they are
>> modules the modules will already be loaded.  So I will see about adding
>> a capable(CAP_SYS_ADMIN) check to shore things up for the short term.
> For the short-term, yes, this solves the "regression". But it
> obviously isn't what is desired for userns in the long term.

>> So I am wondering what I a good policy should be.  Should we trust
>> kernel modules to not be buggy (especially if they were signed as part
>> of the build process)?  Do we add some defense in depth and add
>> filesystem registration magic?  Thinking...
> If we can produce a mechanism that provides some defensive design, we
> should do it. There will always be bugs, so we should always try to
> make them harder to get to or harder to exploit. Reducing the
> available attack surface to "just filesystems" would be a win here,
> and it would be consistent with the _intent_ of existing kernel code.


>> We can limit the request_module in get_fs_type to just filesystems
>> fairly easily.
>> In include/linux/fs.h:
>>         #define MODULE_ALIAS_FS(type) MODULE_ALIAS("fs-" __stringify(type))
>> In fs/filesystems.c:
>>         if (request_moudle("fs-%.*s", len, name) == 0)
>> Then just add the appropriate MODULE_ALIAS_FS lines in all of the
>> filesystems.  This also allows user space to say set the module loading
>> policy for filesystems using the blacklist and the alias keywords
>> in /etc/modprobe.d/*.conf.
> This was the solution for netdev. The backward compat situation is this:
>         if (no_module && capable(CAP_NET_ADMIN))
>                 no_module = request_module("netdev-%s", name);
>         if (no_module && capable(CAP_SYS_MODULE)) {
>                 if (!request_module("%s", name))
>                         pr_warn("Loading kernel module for a network
> device with CAP_SYS_MODULE (deprecated).  Use CAP_NET_ADMIN and alias
> netdev-%s instead.\n",
>                                 name);
>         }

If I have read it right the backwards compat case is there to be
compatibile with existing peoples configuration who had created specific
aliases like eth0 for specific network devices in
/etc/modprobe.d/*.conf.  Where the new netdev-* would break userspace.

I don't believe anyone does anything like that for filesystems, and even
if they tried it wouldn't work because the registered filesystem type
would have a different name.  So I don't see needing to worry about that
kind of backwards compatibility for filesystems.

> Those aren't ns_capable, though, so right now userns can't trigger
> loading new network drivers via ifconfig, but this seems like a
> reasonable approach to take.

Yes.  I saw those and deliberately didn't make those ns_capable, to be
on the safe side when I made the rest of the networking ns_capable.
That and ns_capabable is not meaningful as a check for loading modules.
The only interesting question is can the global root user trigger them
or can everyone trigger module loading.

>> That seems a whole lot simpler, more powerful and more maintainable than
>> what little I saw in GRKERNSEC_MODHARDEN to prevent loading of
>> non-filesystem modules from get_fs_type.
>> Eric
>> p.s.  This is the patch I am looking at pushing to Linus in the near
>> future.
>> diff --git a/fs/filesystems.c b/fs/filesystems.c
>> index da165f6..5b0644d 100644
>> --- a/fs/filesystems.c
>> +++ b/fs/filesystems.c
>> @@ -273,7 +273,8 @@ struct file_system_type *get_fs_type(const char *name)
>>         int len = dot ? dot - name : strlen(name);
>>         fs = __get_fs_type(name, len);
>> -       if (!fs && (request_module("%.*s", len, name) == 0))
>> +       if (!fs && capable(CAP_SYS_ADMIN) &&
>> +           (request_module("%.*s", len, name) == 0))
>>                 fs = __get_fs_type(name, len);
>>         if (dot && fs && !(fs->fs_flags & FS_HAS_SUBTYPE)) {
> Will this break other users of get_fs_type()? I think it's okay; it
> looks like all the other users are expected to be privileged.

I read through them all, and I didn't see problems.  The most concerning
one is tomoyo's use of get_fs_type before capabilities are checked in
do_mount.  But that is neither here nor there.

> Should this look like netdev in the future? Something like this, after
> making may_mount() available to fs/filesystems.c:
> -       if (!fs && (request_module("%.*s", len, name) == 0))
> +       if (!fs && ((may_mount() && request_module("fs-%.*s", len,
> name) == 0) ||
> +                   (capable(CAP_SYS_ADMIN) && request_module("%.*s",
> len, name) == 0)))
> and adding all the filesystem module aliases.
> At some point maybe we can add a flag to kill the old unqualified
> paths available to CAP_SYS_ADMIN for netdev and mount...

For netdev at least I don't see it as being particularly interesting.
The case for tunnels is already as unprivileged as you can reasonably
get with "request_module("rtnl-link-%s", kind);" in
net/core/rtnetlink.c:rtnl_newlink().  For real physical devices there is
both greater chance of a buggy module and no realy need as udev will
load the module based on hardware auto-discovery.

This leads to the fundamental question: Should we require privilege to
request the load filesystem modules?

I have looked at GRKERNSEC_MODHARDEN to see if that could give me some
guidance.  Unfortunately GRKERNSEC_MODHARDEN takes the position that fs
kernel modules are the only kernel modules that should ever auto-load
and only in very specific situations.  So I can't see that being the
normal kernel policy, especially since there are the sysctls
/proc/sys/kernel/modprobe and /proc/sys/kernel/modules_disabled.

Overall the basic policy building blocks for controlling which modules
are loaded seem solid and in use.  So I don't see any particular reason
why the kernel's default policy should not be to allow any users actions
to request modules.

So I think I am going to scrap the change sitting in my development tree
to require capalbe(CAP_SYS_ADMIN) to load a filesystem module and just
go with my request_module("fs-%.*",...); change. That is simple and
seems to match the rest of the kernel.

Does anyone see a reason why we should need CAP_SYS_ADMIN or be in the
initial user namespace to trigger a request of filesystem modules?

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

Powered by blists - more mailing lists