Message-ID: <20251128-kernel-namespaces-v619-28629f3fc911@brauner>
Date: Fri, 28 Nov 2025 17:48:16 +0100
From: Christian Brauner <brauner@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Christian Brauner <brauner@...nel.org>,
linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [GIT PULL 05/17 for v6.19] namespaces
Hey Linus,
/* Summary */
This contains substantial namespace infrastructure changes including a new
system call, active reference counting, and extensive header cleanups.
The branch depends on the shared kbuild branch for -fms-extensions support.
Features
- listns() System Call
Add a new listns() system call that allows userspace to iterate through
namespaces in the system. This provides a programmatic interface to
discover and inspect namespaces, addressing longstanding limitations:
Currently, there is no direct way for userspace to enumerate namespaces.
Applications must resort to scanning /proc/<pid>/ns/ across all processes,
which is:
1. Inefficient - requires iterating over all processes
2. Incomplete - misses namespaces not attached to any running process but
kept alive by file descriptors, bind mounts, or parent references
3. Permission-heavy - requires access to /proc for many processes
4. Uninformative - provides no ordering or ownership information
5. Unfiltered - offers no filtering per namespace type
The listns() system call solves these problems:
ssize_t listns(const struct ns_id_req *req, u64 *ns_ids,
               size_t nr_ns_ids, unsigned int flags);
struct ns_id_req {
        __u32 size;
        __u32 spare;
        __u64 ns_id;
        struct /* listns */ {
                __u32 ns_type;
                __u32 spare2;
                __u64 user_ns_id;
        };
};
Features include:
- Pagination support for large namespace sets
- Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)
- Filtering by owning user namespace
- Permission checks respecting namespace isolation
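For illustration only, a userspace caller might page through the results roughly as sketched below. This is not taken from the series: the struct is a flattened local mirror of the layout quoted above, and the names ns_id_req_sketch, listns_sketch() and count_namespaces() are hypothetical. The sketch assumes that ns_id acts as a "continue after this ID" cursor, that a zero ns_type or user_ns_id means "no filter", and that __NR_listns is only available when the installed kernel headers define it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Flattened local mirror of the request layout quoted above (hypothetical name). */
struct ns_id_req_sketch {
	uint32_t size;       /* sizeof() of this struct */
	uint32_t spare;
	uint64_t ns_id;      /* assumed: resume after this ID, 0 starts at the beginning */
	uint32_t ns_type;    /* assumed: 0 means every namespace type */
	uint32_t spare2;
	uint64_t user_ns_id; /* assumed: 0 means no owner filter */
};

/* Thin wrapper; fails with ENOSYS when the headers do not know the syscall. */
ssize_t listns_sketch(const struct ns_id_req_sketch *req, uint64_t *ns_ids,
		      size_t nr_ns_ids, unsigned int flags)
{
#ifdef __NR_listns
	return syscall(__NR_listns, req, ns_ids, nr_ns_ids, flags);
#else
	(void)req; (void)ns_ids; (void)nr_ns_ids; (void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/* Count visible namespaces of one type by fetching fixed-size batches. */
ssize_t count_namespaces(uint32_t ns_type)
{
	struct ns_id_req_sketch req;
	uint64_t ids[64];
	ssize_t total = 0, n;

	memset(&req, 0, sizeof(req));
	req.size = sizeof(req);
	req.ns_type = ns_type;

	while ((n = listns_sketch(&req, ids, 64, 0)) > 0) {
		total += n;
		req.ns_id = ids[n - 1]; /* assumed cursor semantics */
	}
	return n < 0 ? -1 : total;
}
```

A caller would treat a return of -1 with errno set to ENOSYS as "kernel or headers too old" and fall back to scanning /proc.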
- Active Reference Counting
Introduce an active reference count that tracks namespace visibility to
userspace. A namespace is visible in the following cases:
1. The namespace is in use by a task
2. The namespace is persisted through a VFS object (namespace file
descriptor or bind-mount)
3. The namespace is a hierarchical type and is the parent of child
namespaces
The active reference count does not regulate lifetime (that's still done
by the normal reference count) - it only regulates visibility to namespace
file handles and listns().
This prevents resurrection of namespaces that are pinned only for internal
kernel reasons (e.g., user namespaces held by file->f_cred, lazy TLB
references on idle CPUs, etc.) which should not be accessible via (1)-(3).
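The split between the two counters can be pictured with a small userspace toy model. It is only a sketch of the idea described above, not the kernel implementation: struct toy_ns and its helpers are hypothetical names, and the rule that the first active reference on a namespace also pins its owner is a simplification of case (3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy namespace with separate lifetime and visibility counters. */
struct toy_ns {
	struct toy_ns *owner;  /* parent/owning namespace, NULL at the root */
	unsigned int passive;  /* lifetime: object stays allocated while > 0 */
	unsigned int active;   /* visibility: reachable from userspace while > 0 */
};

/* A namespace is visible only while it holds an active reference. */
bool toy_ns_visible(const struct toy_ns *ns)
{
	return ns->active > 0;
}

/* Take an active reference; a 0 -> 1 transition also pins the owner chain. */
void toy_ns_get_active(struct toy_ns *ns)
{
	for (; ns; ns = ns->owner) {
		if (ns->active++ > 0)
			break; /* owners above are already pinned */
	}
}

/* Drop an active reference; a 1 -> 0 transition releases the owner chain. */
void toy_ns_put_active(struct toy_ns *ns)
{
	for (; ns; ns = ns->owner) {
		assert(ns->active > 0); /* catch underflow */
		if (--ns->active > 0)
			break;
	}
}
```

In this model a namespace with passive > 0 and active == 0 is still allocated but no longer visible, which corresponds to the "pinned only for internal kernel reasons" case.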
- Unified Namespace Tree
Introduce a unified tree structure for all namespaces with:
- Fixed IDs assigned to initial namespaces
- Lookup based solely on inode number
- Maintained list of owned namespaces per user namespace
- Simplified rbtree comparison helpers
Cleanups
- Header Reorganization
- Move namespace types into separate header (ns_common_types.h)
- Decouple nstree from ns_common header
- Move nstree types into separate header
- Switch to new ns_tree_{node,root} structures with helper functions
- Use guards for ns_tree_lock
- Initial Namespace Reference Count Optimization
- Make all reference counts on initial namespaces a nop to avoid
pointless cacheline ping-pong for namespaces that can never go away
- Drop custom reference count initialization for initial namespaces
- Add NS_COMMON_INIT() macro and use it for all namespaces
- pid: rely on common reference count behavior
- Miscellaneous Cleanups
- Rename exit_task_namespaces() to exit_nsproxy_namespaces()
- Rename is_initial_namespace() and make argument const
- Use boolean to indicate anonymous mount namespace
- Simplify owner list iteration in nstree
- nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly
- nsfs: use inode_just_drop()
- pidfs: raise DCACHE_DONTCACHE explicitly
- pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
- libfs: allow to specify s_d_flags
- cgroup: add cgroup namespace to tree after owner is set
- nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
Fixes
- setns(pidfd, ...) Race Condition
Fix a subtle race when using pidfds with setns(). When the target task
exits after prepare_nsset() but before commit_nsset(), the namespace's
active reference count might have been dropped. If setns() then installs
the namespaces, it would bump the active reference count from zero without
taking the required reference on the owner namespace, leading to underflow
when later decremented.
The fix resurrects the ownership chain if necessary - if the caller
succeeded in grabbing passive references, the setns() should succeed even
if the target task exits or gets reaped.
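The underflow can be reproduced in a userspace toy model. This is a sketch under simplifying assumptions, not the actual kernel code: toy_ns, install_buggy(), install_fixed() and drop_active() are hypothetical names, and signed counters are used only so that the underflow is observable.

```c
#include <assert.h>
#include <stddef.h>

/* Toy namespace: only the active count and the owner link matter here. */
struct toy_ns {
	struct toy_ns *owner;
	int active; /* signed so an underflow shows up as a negative value */
};

/* Buggy install: bumps the target from zero without re-pinning its owner. */
void install_buggy(struct toy_ns *ns)
{
	ns->active++;
}

/* Fixed install: a 0 -> 1 transition resurrects the ownership chain. */
void install_fixed(struct toy_ns *ns)
{
	for (; ns; ns = ns->owner) {
		if (ns->active++ > 0)
			break;
	}
}

/* Drop one active reference; returns the lowest count seen on the chain. */
int drop_active(struct toy_ns *ns)
{
	int lowest = 0;

	for (; ns; ns = ns->owner) {
		--ns->active;
		if (ns->active < lowest)
			lowest = ns->active;
		if (ns->active > 0)
			break;
	}
	return lowest; /* negative means some count underflowed */
}
```

Starting from an owner and a child that both have an active count of zero (the target task has exited), install_buggy() followed by drop_active() drives the owner's count negative, while install_fixed() followed by drop_active() returns both counts to zero.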
- Return EFAULT on put_user() error instead of success
- Make sure references are dropped outside of RCU lock (some namespaces
like mount namespace sleep when putting the last reference)
- Don't skip active reference count initialization for network namespace
- Add asserts for active refcount underflow
- Add asserts for initial namespace reference counts (both passive and
active)
- ipc: enable is_ns_init_id() assertions
- Fix kernel-doc comments for internal nstree functions
- Selftests
- 15 active reference count tests
- 9 listns() functionality tests
- 7 listns() permission tests
- 12 inactive namespace resurrection tests
- 3 threaded active reference count tests
- commit_creds() active reference tests
- Pagination and stress tests
- EFAULT handling test
- nsid tests fixes
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
diff --cc fs/namespace.c
index a7fd9682bcf9,25289b869be1..000000000000
--- a/fs/namespace.c
+++ b/fs/namespace.c
Merge conflicts with other trees
================================
[1] https://lore.kernel.org/linux-next/20251118110822.72e36c15@canb.auug.org.au
The following changes since commit dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa:
Linux 6.18-rc3 (2025-10-26 15:59:49 -0700)
are available in the Git repository at:
git@...olite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/namespace-6.19-rc1
for you to fetch changes up to a71e4f103aed69e7a11ea913312726bb194c76ee:
pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls (2025-11-17 16:23:13 +0100)
Please consider pulling these changes from the signed namespace-6.19-rc1 tag.
Thanks!
Christian
----------------------------------------------------------------
namespace-6.19-rc1
----------------------------------------------------------------
Christian Brauner (107):
libfs: allow to specify s_d_flags
nsfs: use inode_just_drop()
nsfs: raise DCACHE_DONTCACHE explicitly
pidfs: raise DCACHE_DONTCACHE explicitly
nsfs: raise SB_I_NODEV and SB_I_NOEXEC
cgroup: add cgroup namespace to tree after owner is set
nstree: simplify return
ns: add missing authorship
ns: add NS_COMMON_INIT()
ns: use NS_COMMON_INIT() for all namespaces
ns: initialize ns_list_node for initial namespaces
ns: add __ns_ref_read()
ns: rename to exit_nsproxy_namespaces()
ns: add active reference count
ns: use anonymous struct to group list member
nstree: introduce a unified tree
nstree: allow lookup solely based on inode
nstree: assign fixed ids to the initial namespaces
nstree: maintain list of owned namespaces
nstree: simplify rbtree comparison helpers
nstree: add unified namespace list
nstree: add listns()
arch: hookup listns() system call
nsfs: update tools header
selftests/filesystems: remove CLONE_NEWPIDNS from setup_userns() helper
selftests/namespaces: first active reference count tests
selftests/namespaces: second active reference count tests
selftests/namespaces: third active reference count tests
selftests/namespaces: fourth active reference count tests
selftests/namespaces: fifth active reference count tests
selftests/namespaces: sixth active reference count tests
selftests/namespaces: seventh active reference count tests
selftests/namespaces: eigth active reference count tests
selftests/namespaces: ninth active reference count tests
selftests/namespaces: tenth active reference count tests
selftests/namespaces: eleventh active reference count tests
selftests/namespaces: twelth active reference count tests
selftests/namespaces: thirteenth active reference count tests
selftests/namespaces: fourteenth active reference count tests
selftests/namespaces: fifteenth active reference count tests
selftests/namespaces: add listns() wrapper
selftests/namespaces: first listns() test
selftests/namespaces: second listns() test
selftests/namespaces: third listns() test
selftests/namespaces: fourth listns() test
selftests/namespaces: fifth listns() test
selftests/namespaces: sixth listns() test
selftests/namespaces: seventh listns() test
selftests/namespaces: eigth listns() test
selftests/namespaces: ninth listns() test
selftests/namespaces: first listns() permission test
selftests/namespaces: second listns() permission test
selftests/namespaces: third listns() permission test
selftests/namespaces: fourth listns() permission test
selftests/namespaces: fifth listns() permission test
selftests/namespaces: sixth listns() permission test
selftests/namespaces: seventh listns() permission test
selftests/namespaces: first inactive namespace resurrection test
selftests/namespaces: second inactive namespace resurrection test
selftests/namespaces: third inactive namespace resurrection test
selftests/namespaces: fourth inactive namespace resurrection test
selftests/namespaces: fifth inactive namespace resurrection test
selftests/namespaces: sixth inactive namespace resurrection test
selftests/namespaces: seventh inactive namespace resurrection test
selftests/namespaces: eigth inactive namespace resurrection test
selftests/namespaces: ninth inactive namespace resurrection test
selftests/namespaces: tenth inactive namespace resurrection test
selftests/namespaces: eleventh inactive namespace resurrection test
selftests/namespaces: twelth inactive namespace resurrection test
selftests/namespace: first threaded active reference count test
selftests/namespace: second threaded active reference count test
selftests/namespace: third threaded active reference count test
selftests/namespace: commit_creds() active reference tests
selftests/namespace: add stress test
selftests/namespace: test listns() pagination
Merge patch series "nstree: listns()"
ns: don't skip active reference count initialization
ns: don't increment or decrement initial namespaces
ns: make sure reference are dropped outside of rcu lock
ns: return EFAULT on put_user() error
ns: handle setns(pidfd, ...) cleanly
ns: add asserts for active refcount underflow
selftests/namespaces: add active reference count regression test
Merge patch "kbuild: Add '-fms-extensions' to areas with dedicated CFLAGS"
selftests/namespaces: test for efault
Merge patch series "ns: fixes for namespace iteration and active reference counting"
Merge branch 'kbuild-6.19.fms.extension'
ns: move namespace types into separate header
nstree: decouple from ns_common header
nstree: move nstree types into separate header
nstree: add helper to operate on struct ns_tree_{node,root}
nstree: switch to new structures
nstree: simplify owner list iteration
nstree: use guards for ns_tree_lock
ns: make is_initial_namespace() argument const
ns: rename is_initial_namespace()
fs: use boolean to indicate anonymous mount namespace
ipc: enable is_ns_init_id() assertions
ns: make all reference counts on initial namespace a nop
ns: add asserts for initial namespace reference counts
ns: add asserts for initial namespace active reference counts
pid: rely on common reference count behavior
ns: drop custom reference count initialization for initial namespaces
selftests/namespaces: fix nsid tests
Merge patch series "ns: header cleanups and initial namespace reference count improvements"
nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
Kriish Sharma (1):
nstree: fix kernel-doc comments for internal functions
Nathan Chancellor (2):
jfs: Rename _inline to avoid conflict with clang's '-fms-extensions'
kbuild: Add '-fms-extensions' to areas with dedicated CFLAGS
Rasmus Villemoes (1):
Kbuild: enable -fms-extensions
Makefile | 3 +
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/kernel/vdso32/Makefile | 3 +-
arch/arm64/tools/syscall_32.tbl | 1 +
arch/loongarch/vdso/Makefile | 2 +-
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/boot/compressed/Makefile | 2 +-
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/boot/Makefile | 3 +-
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/Makefile | 3 +-
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/s390/purgatory/Makefile | 3 +-
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/Makefile | 4 +-
arch/x86/boot/compressed/Makefile | 7 +-
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
drivers/firmware/efi/libstub/Makefile | 4 +-
fs/jfs/jfs_incore.h | 6 +-
fs/libfs.c | 1 +
fs/mount.h | 3 +-
fs/namespace.c | 12 +-
fs/nsfs.c | 101 +-
fs/pidfs.c | 76 +-
include/linux/ns/ns_common_types.h | 196 ++
include/linux/ns/nstree_types.h | 55 +
include/linux/ns_common.h | 233 +-
include/linux/nsfs.h | 3 +
include/linux/nsproxy.h | 9 +-
include/linux/nstree.h | 52 +-
include/linux/pid_namespace.h | 3 +-
include/linux/pseudo_fs.h | 1 +
include/linux/syscalls.h | 4 +
include/linux/user_namespace.h | 4 +-
include/uapi/asm-generic/unistd.h | 4 +-
include/uapi/linux/nsfs.h | 58 +
init/version-timestamp.c | 7 +-
ipc/msgutil.c | 7 +-
ipc/namespace.c | 3 +-
kernel/cgroup/cgroup.c | 11 +-
kernel/cgroup/namespace.c | 2 +-
kernel/cred.c | 6 +
kernel/exit.c | 3 +-
kernel/fork.c | 3 +-
kernel/nscommon.c | 246 +-
kernel/nsproxy.c | 57 +-
kernel/nstree.c | 782 +++++-
kernel/pid.c | 12 +-
kernel/pid_namespace.c | 2 +-
kernel/time/namespace.c | 5 +-
kernel/user.c | 7 +-
net/core/net_namespace.c | 2 +-
scripts/Makefile.extrawarn | 4 +-
scripts/syscall.tbl | 1 +
tools/include/uapi/linux/nsfs.h | 70 +
tools/testing/selftests/filesystems/utils.c | 2 +-
tools/testing/selftests/namespaces/.gitignore | 9 +
tools/testing/selftests/namespaces/Makefile | 24 +-
.../selftests/namespaces/cred_change_test.c | 814 ++++++
.../selftests/namespaces/listns_efault_test.c | 530 ++++
.../selftests/namespaces/listns_pagination_bug.c | 138 +
.../selftests/namespaces/listns_permissions_test.c | 759 ++++++
tools/testing/selftests/namespaces/listns_test.c | 679 +++++
.../selftests/namespaces/ns_active_ref_test.c | 2672 ++++++++++++++++++++
tools/testing/selftests/namespaces/nsid_test.c | 107 +-
.../namespaces/regression_pidfd_setns_test.c | 113 +
.../testing/selftests/namespaces/siocgskns_test.c | 1824 +++++++++++++
tools/testing/selftests/namespaces/stress_test.c | 626 +++++
tools/testing/selftests/namespaces/wrappers.h | 35 +
77 files changed, 9997 insertions(+), 436 deletions(-)
create mode 100644 include/linux/ns/ns_common_types.h
create mode 100644 include/linux/ns/nstree_types.h
create mode 100644 tools/testing/selftests/namespaces/cred_change_test.c
create mode 100644 tools/testing/selftests/namespaces/listns_efault_test.c
create mode 100644 tools/testing/selftests/namespaces/listns_pagination_bug.c
create mode 100644 tools/testing/selftests/namespaces/listns_permissions_test.c
create mode 100644 tools/testing/selftests/namespaces/listns_test.c
create mode 100644 tools/testing/selftests/namespaces/ns_active_ref_test.c
create mode 100644 tools/testing/selftests/namespaces/regression_pidfd_setns_test.c
create mode 100644 tools/testing/selftests/namespaces/siocgskns_test.c
create mode 100644 tools/testing/selftests/namespaces/stress_test.c
create mode 100644 tools/testing/selftests/namespaces/wrappers.h