linux-kernel - Re: [PATCH] fs: don't allow non-init s_user_ns for filesystems without FS_USERNS

Message-ID: <b02d93c9cd1ccda04127031155ec9b4c29ee69d5.camel@kernel.org>
Date: Thu, 29 Jan 2026 09:36:54 -0500
From: Jeff Layton <jlayton@...nel.org>
To: "Seth Forshee (DigitalOcean)" <sforshee@...nel.org>, Alexander Viro	
 <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara
	 <jack@...e.cz>
Cc: Amir Goldstein <amir73il@...il.com>, Aleksa Sarai <cyphar@...har.com>, 
 Alexander Mikhalitsyn <alexander@...alicyn.com>,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-nfs@...r.kernel.org
Subject: Re: [PATCH] fs: don't allow non-init s_user_ns for filesystems
 without FS_USERNS_MOUNT

On Wed, 2024-07-24 at 09:53 -0500, Seth Forshee (DigitalOcean) wrote:
> Christian noticed that it is possible for a privileged user to mount
> most filesystems with a non-initial user namespace in sb->s_user_ns.
> When fsopen() is called in a non-init namespace the caller's namespace
> is recorded in fs_context->user_ns. If the returned file descriptor is
> then passed to a process privileged in init_user_ns, that process can
> call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock
> with sb->s_user_ns set to the namespace of the process which called
> fsopen().
> 
> This is problematic. We cannot assume that any filesystem which does not
> set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in
> mind, increasing the risk for bugs and security issues.
> 
> Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is
> not set for the filesystem and a non-initial user namespace will be
> used. sget() does not need to be updated as it always uses the user
> namespace of the current context, or the initial user namespace if
> SB_SUBMOUNT is set.
> 
> Fixes: cb50b348c71f ("convenience helpers: vfs_get_super() and sget_fc()")
> Reported-by: Christian Brauner <brauner@...nel.org>
> Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@...nel.org>
> ---
>  fs/super.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 095ba793e10c..d681fb7698d8 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -736,6 +736,17 @@ struct super_block *sget_fc(struct fs_context *fc,
>  	struct user_namespace *user_ns = fc->global ? &init_user_ns : fc->user_ns;
>  	int err;
>  
> +	/*
> +	 * Never allow s_user_ns != &init_user_ns when FS_USERNS_MOUNT is
> +	 * not set, as the filesystem is likely unprepared to handle it.
> +	 * This can happen when fsconfig() is called from init_user_ns with
> +	 * an fs_fd opened in another user namespace.
> +	 */
> +	if (user_ns != &init_user_ns && !(fc->fs_type->fs_flags & FS_USERNS_MOUNT)) {
> +		errorfc(fc, "mounting from non-initial user namespace is not allowed");
> +		return ERR_PTR(-EPERM);
> +	}
> +
>  retry:
>  	spin_lock(&sb_lock);
>  	if (test) {
> 
> ---
> base-commit: 256abd8e550ce977b728be79a74e1729438b4948
> change-id: 20240723-s_user_ns-fix-b00c31de1cb8
> 
> Best regards,

I sent an incorrect RFC patch for this yesterday, but this patch breaks
NFS mounting in containers for us, because the activity it prohibits is
exactly how we perform those mounts.

We basically have a task in the container do an fsopen() and then pass
the fd to a daemon in the init namespace via unix socket. The daemon
vets the NFS mount parameters (ensuring that the mount options are
sane, and that we trust the server), and then does the mount inside the
container.
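
For illustration, the fd handoff described above can be sketched in
userspace C as below. This is a minimal sketch, not our actual daemon
code: send_fd() and recv_fd() are hypothetical helper names, and the
fsopen()/fsconfig() steps appear only in comments because they need
privileges and a suitable kernel. The only mechanism shown is
SCM_RIGHTS descriptor passing over an AF_UNIX socket.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Container side (hypothetical helper): after something like
 *     fd = fsopen("nfs", FSOPEN_CLOEXEC);
 * hand the fs context fd to the daemon over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure.
 */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must accompany the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	assert(sock >= 0 && fd >= 0);

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Daemon side (hypothetical helper): receive the fd. The daemon would
 * then vet the mount parameters and, if satisfied, issue
 *     fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
 * which is the call that ends up in sget_fc().
 * Returns the new fd, or -1 on failure.
 */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd = -1;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor refers to the same fs_context that was opened
in the container, so fc->user_ns is still the container's user
namespace when the daemon issues FSCONFIG_CMD_CREATE.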

We don't want to set FS_USERNS_MOUNT on NFS, because that would give
the container carte blanche to mount anything it likes, even from a
malicious server. Do we need to split that flag into two? Maybe
FS_USERNS_SAFE and FS_USERNS_MOUNT?
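
To make the question concrete, here is one possible reading of such a
split, written as a userspace model rather than a kernel patch. The
assumptions are mine: FS_USERNS_SAFE does not exist today, the flag
values are illustrative only, and both function names are
hypothetical. The idea is that FS_USERNS_SAFE would mean "this
filesystem tolerates a non-initial s_user_ns", while FS_USERNS_MOUNT
would keep its current meaning of also letting a task that is
privileged only in its own user namespace create the superblock.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative values for this model only. */
#define FS_USERNS_MOUNT	0x1u	/* namespace-privileged tasks may mount */
#define FS_USERNS_SAFE	0x2u	/* hypothetical: non-init s_user_ns is tolerated */

/*
 * Model of the check the patch adds to sget_fc(), relaxed so that either
 * flag permits a non-initial s_user_ns. Returns 0 or -EPERM.
 */
int sget_fc_userns_check(bool user_ns_is_init, unsigned int fs_flags)
{
	if (!user_ns_is_init &&
	    !(fs_flags & (FS_USERNS_MOUNT | FS_USERNS_SAFE)))
		return -EPERM;
	return 0;
}

/*
 * Model of who may create the superblock. Without FS_USERNS_MOUNT the
 * caller must be privileged in the initial user namespace, so an
 * FS_USERNS_SAFE-only filesystem still needs the daemon to do the mount.
 */
bool may_create_sb(bool privileged_in_init_ns, bool privileged_in_fc_ns,
		   unsigned int fs_flags)
{
	if (fs_flags & FS_USERNS_MOUNT)
		return privileged_in_fc_ns;
	return privileged_in_init_ns;
}
```

Under this model NFS would set only FS_USERNS_SAFE: the container
could not create the superblock itself, but the daemon in the init
namespace could still do so on a context opened in the container.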
--
Jeff Layton <jlayton@...nel.org>