Message-ID: <20251110-elastisch-endeffekt-747abc5a614a@brauner>
Date: Mon, 10 Nov 2025 09:41:56 +0100
From: Christian Brauner <brauner@...nel.org>
To: Hillf Danton <hdanton@...a.com>
Cc: linux-fsdevel@...r.kernel.org, Jann Horn <jannh@...gle.com>,
Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
syzbot+1957b26299cf3ff7890c@...kaller.appspotmail.com
Subject: Re: [PATCH 0/8] ns: fixes for namespace iteration and active
 reference counting

On Mon, Nov 10, 2025 at 06:55:26AM +0800, Hillf Danton wrote:
> On Sun, 09 Nov 2025 22:11:21 +0100 Christian Brauner wrote:
> > * Make sure to initialize the active reference count for the initial
> > network namespace and prevent __ns_common_init() from returning too
> > early.
> >
> > * Make sure that passive reference counts are dropped outside of rcu
> > read locks as some namespaces such as the mount namespace do in fact
> > sleep when putting the last reference.
> >
> > * The setns() system call supports:
> >
> > (1) namespace file descriptors (nsfd)
> > (2) process file descriptors (pidfd)
> >
> > When using nsfds the namespaces will remain active because they are
> > pinned by the vfs. However, when pidfds are used things are more
> > complicated.
> >
> > When the target task exits and passes through exit_nsproxy_namespaces()
> > or is reaped and thus also passes through exit_cred_namespaces() after
> > the setns()'ing task has called prepare_nsset() but before it has
> > called commit_nsset(), the active reference count of the set of
> > namespaces it wants to setns() to might have been dropped already:
> >
> >   P1                                          P2
> >
> >   pid_p1 = clone(CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWNS)
> >                                               pidfd = pidfd_open(pid_p1)
> >                                               setns(pidfd, CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWNS)
> >                                               prepare_nsset()
> >
> >   exit(0)
> >   // ns->__ns_active_ref        == 1
> >   // parent_ns->__ns_active_ref == 1
> >   -> exit_nsproxy_namespaces()
> >   -> exit_cred_namespaces()
> >
> >   // ns_active_ref_put() will also put
> >   // the reference on the owner of the
> >   // namespace. If the only reason the
> >   // owning namespace was alive was
> >   // because it was a parent of @ns
> >   // its active reference count now goes
> >   // to zero... ---------------------------
> >   //                                      |
> >   // ns->__ns_active_ref        == 0      |
> >   // parent_ns->__ns_active_ref == 0      |
> >                                           |   commit_nsset()
> >                                           --> // If setns()
> >                                               // now manages to install the namespaces
> >                                               // it will call ns_active_ref_get()
> >                                               // on them thus bumping the active reference
> >                                               // count from zero again but without also
> >                                               // taking the required reference on the owner.
> >                                               // Thus we get:
> >                                               //
> >                                               // ns->__ns_active_ref        == 1
> >                                               // parent_ns->__ns_active_ref == 0
> >
> > When later someone does ns_active_ref_put() on @ns it will underflow
> > parent_ns->__ns_active_ref leading to a splat from our asserts
> > thinking there are still active references when in fact the counter
> > just underflowed.
> >
> > So resurrect the ownership chain if necessary as well. If the caller
> > succeeded in grabbing passive references to the set of namespaces the
> > setns() should simply succeed even if the target task exits or gets
> > reaped in the meantime.
> >
> > The race is rare and can only be triggered when using pidfds to setns()
> > to namespaces. Also note that active references on initial namespaces
> > are nops.
> >
> > Since we now always handle parent references directly we can drop
> > ns_ref_active_get_owner() when adding a namespace to a namespace tree.
> > This is now all handled uniformly in the places where the new namespaces
> > actually become active.
> >
> > Signed-off-by: Christian Brauner <brauner@...nel.org>
> > ---
> >
> FYI namespace-6.19.fixes failed to survive the syzbot test [1].
>
> [1] Subject: Re: [syzbot] [lsm?] WARNING in put_cred_rcu
> https://lore.kernel.org/lkml/690eedba.a70a0220.22f260.0075.GAE@google.com/

This used a stale branch that existed for testing:

Tested on:
commit: 00f5a3b5 DO NOT MERGE - This is purely for testing a b..
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
git tree: https://github.com/brauner/linux.git namespace-6.19.fixes
console output: https://syzkaller.appspot.com/x/log.txt?x=17a46a58580000
kernel config: https://syzkaller.appspot.com/x/.config?x=e31f5f45f87b6763
dashboard link: https://syzkaller.appspot.com/bug?extid=553c4078ab14e3cf3358
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
Note: no patches were applied.
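
For readers following along, the race from the quoted cover letter can be
modeled in userspace roughly as below. This is only a toy sketch, not the
kernel code: struct toy_ns, toy_active_put(), toy_active_get_buggy(),
toy_active_get_resurrect() and run_scenario() are hypothetical names made up
for illustration, the counters are plain ints with no locking, and initial
namespaces (where active references are nops) are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a namespace (hypothetical, not a kernel structure). */
struct toy_ns {
	int active;            /* loosely models __ns_active_ref */
	struct toy_ns *owner;  /* owning (parent) namespace, NULL at the top */
};

/*
 * Drop one active reference. When a count reaches zero, also drop the
 * reference that namespace held on its owner. Returns -1 if a count
 * that is already zero would be decremented (the underflow case).
 */
static int toy_active_put(struct toy_ns *ns)
{
	for (; ns; ns = ns->owner) {
		if (ns->active == 0)
			return -1;
		if (--ns->active > 0)
			break;
	}
	return 0;
}

/* Buggy get: bumps only @ns, even when it comes back from zero. */
static void toy_active_get_buggy(struct toy_ns *ns)
{
	ns->active++;
}

/* Fixed get: a 0 -> 1 transition re-takes the reference on the owner chain. */
static void toy_active_get_resurrect(struct toy_ns *ns)
{
	for (; ns; ns = ns->owner) {
		if (ns->active++ > 0)
			break; /* already active, so it still holds its owner ref */
	}
}

/* Replays the P1/P2 interleaving; returns the result of the final put. */
int run_scenario(int use_fix)
{
	struct toy_ns parent = { .active = 1, .owner = NULL };
	struct toy_ns ns = { .active = 1, .owner = &parent };
	int ret;

	/* P1 exits: ns drops to zero and takes parent down with it. */
	ret = toy_active_put(&ns);
	assert(ret == 0);
	(void)ret;

	/* P2 reaches the commit step and re-activates ns. */
	if (use_fix)
		toy_active_get_resurrect(&ns);
	else
		toy_active_get_buggy(&ns);

	/* Someone later puts ns again. */
	return toy_active_put(&ns);
}
```

In this model run_scenario(0) returns -1 because the final put finds the
parent count already at zero, while run_scenario(1) returns 0 because the
owner chain was re-acquired on the 0 -> 1 transition.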