linux-kernel - Re: [PATCH] capabilities: new kernel.ns_modules

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202208091551.6879CADA0@keescook>
Date:   Tue, 9 Aug 2022 15:56:05 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Vegard Nossum <vegard.nossum@...cle.com>
Cc:     linux-kernel@...r.kernel.org,
        Thadeu Lima de Souza Cascardo <cascardo@...onical.com>,
        Serge Hallyn <serge@...lyn.com>,
        Eric Biederman <ebiederm@...ssion.com>,
        linux-hardening@...r.kernel.org, John Haxby <john.haxby@...cle.com>
Subject: Re: [PATCH] capabilities: new kernel.ns_modules_allowed sysctl

On Tue, Aug 09, 2022 at 08:52:29PM +0200, Vegard Nossum wrote:
> Creating a new user namespace grants you the ability to reach a lot of code
> (including loading certain kernel modules) that would otherwise be out of
> reach of an attacker. We can reduce the attack surface and block exploits
> by ensuring that user namespaces cannot trigger module (auto-)loading.
> 
> A cursory search of exploits found online yields the following extremely
> non-exhaustive list of vulnerabilities, and shows that the technique is
> both old and still in use:
> 
> - CVE-2016-8655
> - CVE-2017-1000112
> - CVE-2021-32606
> - CVE-2022-2588
> - CVE-2022-27666
> - CVE-2022-34918
> 
> This patch adds a new sysctl, kernel.ns_modules_allowed, which when set to
> 0 will block requests to load modules when the request originates in a
> process running in a user namespace.
> 
> For backwards compatibility, the default value of the sysctl is set to
> CONFIG_NS_MODULES_ALLOWED_DEFAULT_ON, which in turn defaults to 1, meaning
> there should be absolutely no change in behaviour unless you opt in either
> at compile time or at runtime.
> 
> This mitigation obviously offers no protection if the vulnerable module is
> already loaded, but for many of these exploits the vast majority of users
> will never actually load or use these modules on purpose; in other words,
> for the vast majority of users, this would block exploits for the above
> list of vulnerabilities.

We've needed better module autoloading protections for a long time[1].
This patch is a big hammer ("all user namespaces"), so I worry it
wouldn't actually get used much.

Here's a pointer into a prior thread, where Linus chimed in[2].
I replied back then, but I'm not sure I agree with my 2017 self any
more. :P

It really does feel like the loading decisions need to be made by the
userspace helper, which currently doesn't have enough information to
make those choices.

-Kees

[1] https://github.com/KSPP/linux/issues/24
[2] https://lore.kernel.org/kernel-hardening/CA+55aFxiDKfe6VCM+aV2OgnkzMpP+iz+rn2k25_Qa_QLex=pPQ@mail.gmail.com/

> 
> Testing: Running the reproducer for CVE-2022-2588 fails and results in the
> following message in the kernel log:
> 
>     [  130.208030] request_module: pid 4107 (a.out) requested kernel module rtnl-link-dummy; denied due to kernel.ns_modules_allowed sysctl
> 
> Cc: Thadeu Lima de Souza Cascardo <cascardo@...onical.com>
> Cc: Serge Hallyn <serge@...lyn.com>
> Cc: Eric Biederman <ebiederm@...ssion.com>
> Cc: Kees Cook <keescook@...omium.org>
> Cc: linux-hardening@...r.kernel.org
> Cc: John Haxby <john.haxby@...cle.com>
> Signed-off-by: Vegard Nossum <vegard.nossum@...cle.com>
> ---
>  Documentation/admin-guide/sysctl/kernel.rst | 11 ++++++
>  init/Kconfig                                | 17 +++++++++
>  kernel/kmod.c                               | 39 +++++++++++++++++++++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index ddccd10774623..551de7bce836c 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -592,6 +592,17 @@ to the guest kernel command line (see
>  Documentation/admin-guide/kernel-parameters.rst).
>  
>  
> +ns_modules_allowed
> +==================
> +
> +Control whether processes may trigger module loading inside a user namespace.
> +
> += =================================
> +0 Deny module loading requests.
> +1 Accept module loading requests.
> += =================================
> +
> +
>  numa_balancing
>  ==============
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index c984afc489dea..6734373995936 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1226,6 +1226,23 @@ config USER_NS
>  
>  	  If unsure, say N.
>  
> +config NS_MODULES_ALLOWED_DEFAULT_ON
> +	bool "Allow user namespaces to auto-load kernel modules by default"
> +	depends on MODULES
> +	depends on USER_NS
> +	default y
> +	help
> +	  This option makes it so that processes running inside user
> +	  namespaces may auto-load kernel modules.
> +
> +	  Say N to mitigate some exploits that rely on being able to
> +	  auto-load kernel modules; however, this may also cause some
> +	  legitimate programs to fail unless kernel modules are loaded by
> +	  hand.
> +
> +	  You can write 0 or 1 to /proc/sys/kernel/ns_modules_allowed to
> +	  change behaviour at run-time.
> +
>  config PID_NS
>  	bool "PID Namespaces"
>  	default y
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index b717134ebe170..53e26009410ef 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -105,6 +105,12 @@ static int call_modprobe(char *module_name, int wait)
>  	return -ENOMEM;
>  }
>  
> +/*
> + * Allow processes running inside namespaces to trigger module loading?
> + */
> +static bool sysctl_ns_modules_allowed __read_mostly =
> +	IS_BUILTIN(CONFIG_NS_MODULES_ALLOWED_DEFAULT_ON);
> +
>  /**
>   * __request_module - try to load a kernel module
>   * @wait: wait (or not) for the operation to complete
> @@ -148,6 +154,21 @@ int __request_module(bool wait, const char *fmt, ...)
>  	if (ret)
>  		return ret;
>  
> +	/*
> +	 * Disallow if we're in a user namespace and we don't have
> +	 * CAP_SYS_MODULE in the init namespace.
> +	 */
> +	if (current_user_ns() != &init_user_ns && !capable(CAP_SYS_MODULE)) {
> +		if (sysctl_ns_modules_allowed) {
> +			pr_warn_ratelimited("request_module: pid %d (%s) in user namespace requested kernel module %s\n",
> +				task_pid_nr(current), current->comm, module_name);
> +		} else {
> +			pr_warn_ratelimited("request_module: pid %d (%s) in user namespace requested kernel module %s; denied due to kernel.ns_modules_allowed sysctl\n",
> +				task_pid_nr(current), current->comm, module_name);
> +			return -EPERM;
> +		}
> +	}
> +
>  	if (atomic_dec_if_positive(&kmod_concurrent_max) < 0) {
>  		pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) close to 0 (max_modprobes: %u), for module %s, throttling...",
>  				    atomic_read(&kmod_concurrent_max),
> @@ -175,3 +196,21 @@ int __request_module(bool wait, const char *fmt, ...)
>  	return ret;
>  }
>  EXPORT_SYMBOL(__request_module);
> +
> +static struct ctl_table kmod_sysctl_table[] = {
> +	{
> +		.procname       = "ns_modules_allowed",
> +		.data           = &sysctl_ns_modules_allowed,
> +		.maxlen         = sizeof(sysctl_ns_modules_allowed),
> +		.mode           = 0644,
> +		.proc_handler   = proc_dobool,
> +	},
> +	{ }
> +};
> +
> +static int __init kmod_sysctl_init(void)
> +{
> +	register_sysctl_init("kernel", kmod_sysctl_table);
> +	return 0;
> +}
> +late_initcall(kmod_sysctl_init);
> -- 
> 2.35.1.46.g38062e73e0
> 

-- 
Kees Cook