Message-ID: <20251109225528.9063-1-hdanton@sina.com>
Date: Mon, 10 Nov 2025 06:55:26 +0800
From: Hillf Danton <hdanton@...a.com>
To: Christian Brauner <brauner@...nel.org>
Cc: linux-fsdevel@...r.kernel.org,
Jann Horn <jannh@...gle.com>,
Jan Kara <jack@...e.cz>,
linux-kernel@...r.kernel.org,
syzbot+1957b26299cf3ff7890c@...kaller.appspotmail.com
Subject: Re: [PATCH 0/8] ns: fixes for namespace iteration and active reference counting
On Sun, 09 Nov 2025 22:11:21 +0100 Christian Brauner wrote:
> * Make sure to initialize the active reference count for the initial
> network namespace and prevent __ns_common_init() from returning too
> early.
>
> * Make sure that passive reference counts are dropped outside of rcu
> read locks as some namespaces such as the mount namespace do in fact
> sleep when putting the last reference.
>
> * The setns() system call supports:
>
> (1) namespace file descriptors (nsfd)
> (2) process file descriptors (pidfd)
>
> When using nsfds the namespaces will remain active because they are
> pinned by the vfs. However, when pidfds are used things are more
> complicated.
>
> When the target task exits and passes through exit_nsproxy_namespaces(),
> or is reaped and thus also passes through exit_cred_namespaces(), after
> the setns()'ing task has called prepare_nsset() but before it has called
> commit_nsset(), the active reference count of the set of namespaces it
> wants to setns() to might have been dropped already:
>
>   P1                                        P2
>
>   pid_p1 = clone(CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWNS)
>                                             pidfd = pidfd_open(pid_p1)
>                                             setns(pidfd, CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWNS)
>                                             prepare_nsset()
>
>   exit(0)
>   // ns->__ns_active_ref        == 1
>   // parent_ns->__ns_active_ref == 1
>   -> exit_nsproxy_namespaces()
>   -> exit_cred_namespaces()
>
>   // ns_active_ref_put() will also put
>   // the reference on the owner of the
>   // namespace. If the only reason the
>   // owning namespace was alive was
>   // because it was a parent of @ns
>   // its active reference count now goes
>   // to zero... -------------------------
>   //                                    |
>   // ns->__ns_active_ref        == 0    |
>   // parent_ns->__ns_active_ref == 0    |
>                                         |   commit_nsset()
>                                         --> // If setns()
>                                             // now manages to install the namespaces
>                                             // it will call ns_active_ref_get()
>                                             // on them thus bumping the active reference
>                                             // count from zero again but without also
>                                             // taking the required reference on the owner.
>                                             // Thus we get:
>                                             //
>                                             // ns->__ns_active_ref        == 1
>                                             // parent_ns->__ns_active_ref == 0
>
> When later someone does ns_active_ref_put() on @ns it will underflow
> parent_ns->__ns_active_ref leading to a splat from our asserts
> thinking there are still active references when in fact the counter
> just underflowed.
>
> So resurrect the ownership chain if necessary as well. If the caller
> succeeded in grabbing passive references to the set of namespaces, the
> setns() should simply succeed even if the target task exits or gets
> reaped in the meantime.
>
> The race is rare and can only be triggered when using pidfds to setns()
> to namespaces. Also note that active references on initial namespaces
> are no-ops.
>
> Since we now always handle parent references directly we can drop
> ns_ref_active_get_owner() when adding a namespace to a namespace tree.
> This is now all handled uniformly in the places where the new namespaces
> actually become active.
>
> Signed-off-by: Christian Brauner <brauner@...nel.org>
> ---
>
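For readers following the accounting in the quoted changelog, here is a
minimal userspace sketch of the underflow and of the "resurrect the
ownership chain" idea. It is a toy model under stated assumptions, not the
kernel code: struct ns, ns_put(), ns_get_naive(), ns_get_resurrect() and
run_scenario() are hypothetical names made up for illustration, and the
plain int counter only stands in for __ns_active_ref.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a namespace; NOT the kernel's data structure. */
struct ns {
	int active;        /* stands in for __ns_active_ref */
	struct ns *owner;  /* owning (parent) namespace, NULL at the top */
};

/* Drop an active reference; on reaching zero, also drop the owner's. */
static void ns_put(struct ns *ns)
{
	for (; ns; ns = ns->owner) {
		ns->active--;
		if (ns->active != 0)
			break;
	}
}

/* Racy variant: bumps the count but never re-takes the owner reference. */
static void ns_get_naive(struct ns *ns)
{
	ns->active++;
}

/* On a 0 -> 1 transition, walk up and re-take the owner reference too. */
static void ns_get_resurrect(struct ns *ns)
{
	for (; ns; ns = ns->owner) {
		assert(ns->active >= 0);
		ns->active++;
		if (ns->active != 1)
			break;
	}
}

/*
 * Replay the P1/P2 interleaving from the changelog with the given "get"
 * and return the parent's final active count.
 */
static int run_scenario(void (*get)(struct ns *))
{
	struct ns parent = { .active = 1, .owner = NULL };
	struct ns child = { .active = 1, .owner = &parent };

	ns_put(&child);  /* P1 exits: child and parent both drop to 0 */
	get(&child);     /* P2 installs the namespace: child back to 1 */
	ns_put(&child);  /* P2 later drops its active reference */

	return parent.active;
}
```

In this model run_scenario(ns_get_naive) leaves the parent count at -1,
which corresponds to the underflow described above, while
run_scenario(ns_get_resurrect) leaves it at 0.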
FYI namespace-6.19.fixes failed to survive the syzbot test [1].
[1] Subject: Re: [syzbot] [lsm?] WARNING in put_cred_rcu
https://lore.kernel.org/lkml/690eedba.a70a0220.22f260.0075.GAE@google.com/