linux-kernel - Re: [PATCH] fs: don't allow non-init s_user_ns for filesystems without FS_USERNS


Message-ID: <20260129-zielgebiet-zutiefst-d9d9cb902f1b@brauner>
Date: Thu, 29 Jan 2026 16:49:12 +0100
From: Christian Brauner <brauner@...nel.org>
To: Jeff Layton <jlayton@...nel.org>
Cc: "Seth Forshee (DigitalOcean)" <sforshee@...nel.org>, 
	Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>, Amir Goldstein <amir73il@...il.com>, 
	Aleksa Sarai <cyphar@...har.com>, Alexander Mikhalitsyn <alexander@...alicyn.com>, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, linux-nfs@...r.kernel.org
Subject: Re: [PATCH] fs: don't allow non-init s_user_ns for filesystems
 without FS_USERNS_MOUNT

On Thu, Jan 29, 2026 at 09:36:54AM -0500, Jeff Layton wrote:
> On Wed, 2024-07-24 at 09:53 -0500, Seth Forshee (DigitalOcean) wrote:
> > Christian noticed that it is possible for a privileged user to mount
> > most filesystems with a non-initial user namespace in sb->s_user_ns.
> > When fsopen() is called in a non-init namespace the caller's namespace
> > is recorded in fs_context->user_ns. If the returned file descriptor is
> > then passed to a process privileged in init_user_ns, that process can
> > call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock
> > with sb->s_user_ns set to the namespace of the process which called
> > fsopen().
> > 
> > This is problematic. We cannot assume that any filesystem which does not
> > set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in
> > mind, increasing the risk for bugs and security issues.
> > 
> > Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is
> > not set for the filesystem and a non-initial user namespace will be
> > used. sget() does not need to be updated as it always uses the user
> > namespace of the current context, or the initial user namespace if
> > SB_SUBMOUNT is set.
> > 
> > Fixes: cb50b348c71f ("convenience helpers: vfs_get_super() and sget_fc()")
> > Reported-by: Christian Brauner <brauner@...nel.org>
> > Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@...nel.org>
> > ---
> >  fs/super.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/fs/super.c b/fs/super.c
> > index 095ba793e10c..d681fb7698d8 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -736,6 +736,17 @@ struct super_block *sget_fc(struct fs_context *fc,
> >  	struct user_namespace *user_ns = fc->global ? &init_user_ns : fc->user_ns;
> >  	int err;
> >  
> > +	/*
> > +	 * Never allow s_user_ns != &init_user_ns when FS_USERNS_MOUNT is
> > +	 * not set, as the filesystem is likely unprepared to handle it.
> > +	 * This can happen when fsconfig() is called from init_user_ns with
> > +	 * an fs_fd opened in another user namespace.
> > +	 */
> > +	if (user_ns != &init_user_ns && !(fc->fs_type->fs_flags & FS_USERNS_MOUNT)) {
> > +		errorfc(fc, "mounting from non-initial user namespace is not allowed");
> > +		return ERR_PTR(-EPERM);
> > +	}
> > +
> >  retry:
> >  	spin_lock(&sb_lock);
> >  	if (test) {
> > 
> > ---
> > base-commit: 256abd8e550ce977b728be79a74e1729438b4948
> > change-id: 20240723-s_user_ns-fix-b00c31de1cb8
> > 
> > Best regards,
> 
> I sent an incorrect RFC patch for this yesterday, but this patch breaks

Oh? I did not see it.

> NFS mounting in containers for us, as the prohibited activity is
> exactly the process we use to do them.
> 
> We basically have a task in the container do an fsopen() and then pass
> the fd to a daemon in the init namespace via unix socket. The daemon
> vets the NFS mount parameters (ensuring that the mount options are
> sane, and that we trust the server), and then does the mount inside the
> container.

The mountfsd model - kinda.
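
For reference, the fd handoff Jeff describes relies on SCM_RIGHTS over an
AF_UNIX socket. What follows is a minimal userspace sketch of just that
handoff, not code from either project: send_fd(), recv_fd() and
demo_roundtrip() are hypothetical helper names, and a pipe stands in for the
fsopen() descriptor so the demo needs no privileges.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected AF_UNIX socket. Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(). Returns the new fd, or -1 on error. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Pass the write end of a pipe across a socketpair and use the received copy. */
static int demo_roundtrip(void)
{
	int sv[2], p[2], got;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) || pipe(p))
		return -1;
	if (send_fd(sv[0], p[1]))
		return -1;
	got = recv_fd(sv[1]);
	if (got < 0 || write(got, "x", 1) != 1 || read(p[0], &c, 1) != 1)
		return -1;
	close(got); close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
	return c == 'x' ? 0 : -1;
}
```

In the workflow described above, the container task would send the fsopen()
fd this way and the daemon in the init namespace would receive it, vet the
parameters, and call fsconfig(fd, FSCONFIG_CMD_CREATE, ...) itself.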

> 
> We don't want to set FS_USERNS_MOUNT on NFS, because that would give
> the container carte blanche to mount anything it likes, even a
> malicious server. Do we need to split that flag into two? Maybe
> FS_USERNS_SAFE and FS_USERNS_MOUNT?

I think you can simply add FS_USERNS_DELEGATABLE and raise it for nfs.
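
A rough userspace model of how such a flag might split the two decisions, to
make the idea concrete. This is only one reading of the suggestion, not a
patch: FS_USERNS_DELEGATABLE does not exist in the tree being discussed, its
bit value here is invented, and the struct and function names are
hypothetical stand-ins for the checks in sget_fc() and mount_capable().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Bit values are illustrative for this model only. */
#define FS_USERNS_MOUNT		(1u << 3)
#define FS_USERNS_DELEGATABLE	(1u << 7)	/* hypothetical flag, invented value */

struct fs_type_model {
	unsigned int fs_flags;
};

/*
 * Models the sget_fc() check from the quoted patch, relaxed so that either
 * flag permits a superblock whose s_user_ns is not init_user_ns.
 */
static int s_user_ns_allowed(const struct fs_type_model *t, bool ns_is_init)
{
	if (ns_is_init)
		return 0;
	if (t->fs_flags & (FS_USERNS_MOUNT | FS_USERNS_DELEGATABLE))
		return 0;
	return -EPERM;
}

/*
 * Models who may create the superblock. Only FS_USERNS_MOUNT lets a task
 * that is privileged solely in the fs_context's namespace do so; the
 * delegatable case still needs a task privileged in the initial namespace.
 */
static bool mount_capable_model(const struct fs_type_model *t,
				bool admin_in_init_ns, bool admin_in_fc_ns)
{
	if (t->fs_flags & FS_USERNS_MOUNT)
		return admin_in_init_ns || admin_in_fc_ns;
	return admin_in_init_ns;
}

/* Walks through the three cases discussed in the thread. */
static void run_examples(void)
{
	struct fs_type_model nfs = { .fs_flags = FS_USERNS_DELEGATABLE };
	struct fs_type_model legacy = { .fs_flags = 0 };
	struct fs_type_model userns_fs = { .fs_flags = FS_USERNS_MOUNT };

	/* Delegated NFS: non-init s_user_ns is fine, but only the daemon may mount. */
	assert(s_user_ns_allowed(&nfs, false) == 0);
	assert(!mount_capable_model(&nfs, false, true));
	assert(mount_capable_model(&nfs, true, false));

	/* Unflagged filesystem: rejected outside init_user_ns, as in the patch. */
	assert(s_user_ns_allowed(&legacy, false) == -EPERM);
	assert(s_user_ns_allowed(&legacy, true) == 0);

	/* FS_USERNS_MOUNT filesystem: container root may mount directly. */
	assert(s_user_ns_allowed(&userns_fs, false) == 0);
	assert(mount_capable_model(&userns_fs, false, true));
}
```

Under this reading the container never gains the ability to mount NFS on its
own; it can only hand an fs_context to something privileged in the initial
namespace, which keeps the vetting step in the daemon.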