[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251028-fernweh-prasseln-b066fb441ee6@brauner>
Date: Tue, 28 Oct 2025 16:32:39 +0100
From: Christian Brauner <brauner@...nel.org>
To: Simon Horman <horms@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, Josef Bacik <josef@...icpanda.com>, 
	Jeff Layton <jlayton@...nel.org>, Jann Horn <jannh@...gle.com>, Mike Yuan <me@...dnzj.com>, 
	Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl>, Lennart Poettering <mzxreary@...inter.de>, 
	Daan De Meyer <daan.j.demeyer@...il.com>, Aleksa Sarai <cyphar@...har.com>, 
	Amir Goldstein <amir73il@...il.com>, Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Alexander Viro <viro@...iv.linux.org.uk>, 
	Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org, cgroups@...r.kernel.org, 
	bpf@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org, Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH v3 11/70] ns: add active reference count
On Tue, Oct 28, 2025 at 10:30:06AM +0000, Simon Horman wrote:
> On Fri, Oct 24, 2025 at 12:52:40PM +0200, Christian Brauner wrote:
> 
> ...
> 
> > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> 
> ...
> 
> > +void get_cred_namespaces(struct task_struct *tsk)
> > +{
> > +	ns_ref_active_get(tsk->real_cred->user_ns);
> 
> Hi Christian,
> 
> real_cred is protected by RCU, but this code doesn't seem to take
> that into account. Or, at least Sparse doesn't think so:
> 
> .../nsproxy.c:264:9: error: no generic selection for 'struct user_namespace *const [noderef] __rcu user_ns'
> .../nsproxy.c:264:9: warning: dereference of noderef expression
> 
> > +}
> > +
> > +void exit_cred_namespaces(struct task_struct *tsk)
> > +{
> > +	ns_ref_active_put(tsk->real_cred->user_ns);
> 
> Likewise here.
get_cred_namespaces() is called during copy_creds() which is called
during process creation aka from copy_process(). So copy_creds() always
takes the creds of current (the parent process in this case) which can't
change in any way.
Simplifying a bit:
Either we created a thread via CLONE_THREAD in which case we can't
specify CLONE_NEWUSER (little know fact, I guess) and so we just bump
the reference count on the existing user namespace from the parent's
creds, or we're creating a new set of credentials that no one has ever
seen before possibly even a new user namespace if CLONE_NEWUSER has been
specified.
In both case the credentials are completely stable. The call to
exit_cred_namespaces() has similar reasoning when called from the
cleanup/failure path of copy_process().
The other callsite is release_task() which is called - simplifying -
after the task has been reaped. That thing is deader than dead and
nothing can mess with its creds anymore.
In other words, the get/put patterns for namespace management generally
happens at edges where the relevant structures are stable and can't be
changed by anyone other than the calling thread. And at no point are we
putting references on creds themselves.
Let me know if I missed something obvious.
Powered by blists - more mailing lists
 
