Message-ID: <20250915-laufpass-anraten-b250875c462a@brauner>
Date: Mon, 15 Sep 2025 15:55:05 +0200
From: Christian Brauner <brauner@...nel.org>
To: Jan Kara <jack@...e.cz>
Cc: Amir Goldstein <amir73il@...il.com>, linux-fsdevel@...r.kernel.org, 
	Josef Bacik <josef@...icpanda.com>, Jeff Layton <jlayton@...nel.org>, Mike Yuan <me@...dnzj.com>, 
	Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl>, Lennart Poettering <mzxreary@...inter.de>, 
	Daan De Meyer <daan.j.demeyer@...il.com>, Aleksa Sarai <cyphar@...har.com>, 
	Alexander Viro <viro@...iv.linux.org.uk>, Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>, 
	Johannes Weiner <hannes@...xchg.org>, Michal Koutný <mkoutny@...e.com>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Chuck Lever <chuck.lever@...cle.com>, linux-nfs@...r.kernel.org, linux-kselftest@...r.kernel.org, 
	linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, cgroups@...r.kernel.org, 
	netdev@...r.kernel.org
Subject: Re: [PATCH v2 28/33] nsfs: support file handles

On Mon, Sep 15, 2025 at 03:25:20PM +0200, Jan Kara wrote:
> On Fri 12-09-25 13:52:51, Christian Brauner wrote:
> > A while ago we added support for file handles to pidfs so pidfds can be
> > encoded and decoded as file handles. Userspace has adopted this quickly
> > and it's proven very useful. Implement file handles for namespaces as
> > well.
> > 
> > A process is not always able to open /proc/self/ns/. That requires
> > procfs to be mounted and for /proc/self/ or /proc/self/ns/ to not be
> > overmounted. However, userspace can always derive a namespace fd from
> > a pidfd. And that always works for a task's own namespace.
> > 
> > There's no need to introduce unnecessary behavioral differences between
> > /proc/self/ns/ fds, pidfd-derived namespace fds, and file-handle-derived
> > namespace fds. So namespace file handles are always decodable if the
> > caller is located in the namespace the file handle refers to.
> > 
> > This also allows a task to e.g., store a set of file handles to its
> > namespaces in a file on-disk so it can verify when it gets rexeced that
> > they're still valid and so on. This is akin to the pidfd use-case.
> > 
> > Or just plainly for namespace comparison reasons where a file handle to
> > the task's own namespace can be easily compared against others.
> > 
> > Reviewed-by: Amir Goldstein <amir73il@...il.com>
> > Signed-off-by: Christian Brauner <brauner@...nel.org>
> 
> ...
> 
> > +	switch (ns->ops->type) {
> > +#ifdef CONFIG_CGROUPS
> > +	case CLONE_NEWCGROUP:
> > +		if (!current_in_namespace(to_cg_ns(ns)))
> > +			owning_ns = to_cg_ns(ns)->user_ns;
> > +		break;
> > +#endif
> > +#ifdef CONFIG_IPC_NS
> > +	case CLONE_NEWIPC:
> > +		if (!current_in_namespace(to_ipc_ns(ns)))
> > +			owning_ns = to_ipc_ns(ns)->user_ns;
> > +		break;
> > +#endif
> > +	case CLONE_NEWNS:
> > +		if (!current_in_namespace(to_mnt_ns(ns)))
> > +			owning_ns = to_mnt_ns(ns)->user_ns;
> > +		break;
> > +#ifdef CONFIG_NET_NS
> > +	case CLONE_NEWNET:
> > +		if (!current_in_namespace(to_net_ns(ns)))
> > +			owning_ns = to_net_ns(ns)->user_ns;
> > +		break;
> > +#endif
> > +#ifdef CONFIG_PID_NS
> > +	case CLONE_NEWPID:
> > +		if (!current_in_namespace(to_pid_ns(ns))) {
> > +			owning_ns = to_pid_ns(ns)->user_ns;
> > +		} else if (!READ_ONCE(to_pid_ns(ns)->child_reaper)) {
> > +			ns->ops->put(ns);
> > +			return ERR_PTR(-EPERM);
> > +		}
> > +		break;
> > +#endif
> > +#ifdef CONFIG_TIME_NS
> > +	case CLONE_NEWTIME:
> > +		if (!current_in_namespace(to_time_ns(ns)))
> > +			owning_ns = to_time_ns(ns)->user_ns;
> > +		break;
> > +#endif
> > +#ifdef CONFIG_USER_NS
> > +	case CLONE_NEWUSER:
> > +		if (!current_in_namespace(to_user_ns(ns)))
> > +			owning_ns = to_user_ns(ns);
> > +		break;
> > +#endif
> > +#ifdef CONFIG_UTS_NS
> > +	case CLONE_NEWUTS:
> > +		if (!current_in_namespace(to_uts_ns(ns)))
> > +			owning_ns = to_uts_ns(ns)->user_ns;
> > +		break;
> > +#endif
> 
> Frankly, switches like these are asking for more Generic usage ;) But ok
> for now.
> 
> > +	default:
> > +		return ERR_PTR(-EOPNOTSUPP);
> > +	}
> > +
> > +	if (owning_ns && !ns_capable(owning_ns, CAP_SYS_ADMIN)) {
> > +		ns->ops->put(ns);
> > +		return ERR_PTR(-EPERM);
> > +	}
> > +
> > +	/* path_from_stashed() unconditionally consumes the reference. */
> > +	ret = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path);
> > +	if (ret)
> > +		return ERR_PTR(ret);
> > +
> > +	return no_free_ptr(path.dentry);
> 
> Ugh, so IMO this is very subtle because we declare
> 
> 	struct path path __free(path_put)
> 
> but then do no_free_ptr(path.dentry). I really had to look up the
> implementation of no_free_ptr() to check whether we are leaking the mnt
> reference here or not (we are not). But that seems like an implementation
> detail we had better not rely on. Wouldn't
> 
> 	return dget(path.dentry);
> 
> be much clearer (and slightly less efficient, I know, but who cares)?

Fine by me as well!
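
For anyone following the no_free_ptr() vs. dget() point above, here is a
rough userspace model of the two variants. It is only a sketch, not the
kernel code: struct obj, struct pair, pair_fill() and take_ptr() are
hypothetical stand-ins for the refcounted objects, struct path,
path_from_stashed() and no_free_ptr(), and it assumes GNU C extensions
(the cleanup attribute and statement expressions) as supported by GCC and
Clang. It assumes the scope-exit handler still runs on the whole struct
after one member has been set to NULL, which is the detail the review
comment describes having to check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object (dentry or vfsmount). */
struct obj {
	int refs;
};

static void obj_get(struct obj *o)
{
	if (o)
		o->refs++;
}

/* As assumed for dput()/mntput() in this model: NULL is a no-op. */
static void obj_put(struct obj *o)
{
	if (o)
		o->refs--;
}

/* Hypothetical stand-in for struct path. */
struct pair {
	struct obj *mnt;
	struct obj *dentry;
};

/* Scope-exit handler, modelling what __free(path_put) arranges. */
static void pair_put(struct pair *p)
{
	obj_put(p->dentry);
	obj_put(p->mnt);
}

/* Model of no_free_ptr(): hand out the pointer, NULL the variable. */
#define take_ptr(p) ({ __typeof__(p) tmp_ = (p); (p) = NULL; tmp_; })

/* Model of path_from_stashed(): one reference on each member. */
static void pair_fill(struct pair *p, struct obj *mnt, struct obj *dentry)
{
	obj_get(mnt);
	obj_get(dentry);
	p->mnt = mnt;
	p->dentry = dentry;
}

/* Variant 1: steal the dentry reference; cleanup only drops mnt. */
static struct obj *lookup_steal(struct obj *mnt, struct obj *dentry)
{
	struct pair path __attribute__((cleanup(pair_put))) = { 0 };

	pair_fill(&path, mnt, dentry);
	return take_ptr(path.dentry);
}

/* Variant 2: take an extra reference; cleanup drops both members. */
static struct obj *lookup_dget(struct obj *mnt, struct obj *dentry)
{
	struct pair path __attribute__((cleanup(pair_put))) = { 0 };

	pair_fill(&path, mnt, dentry);
	obj_get(path.dentry);
	return path.dentry;
}
```

In this model both variants leave the caller with exactly one extra
reference on the dentry and no extra reference on the mount. The second
variant costs one additional get/put pair but does not depend on how the
pointer-stealing helper interacts with the struct-wide cleanup.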