Message-ID: <20251128-kernel-namespaces-v619-28629f3fc911@brauner>
Date: Fri, 28 Nov 2025 17:48:16 +0100
From: Christian Brauner <brauner@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Christian Brauner <brauner@...nel.org>,
	linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: [GIT PULL 05/17 for v6.19] namespaces

Hey Linus,

/* Summary */
This contains substantial namespace infrastructure changes including a new
system call, active reference counting, and extensive header cleanups.
The branch depends on the shared kbuild branch for -fms-extensions support.

Features

- listns() System Call

  Add a new listns() system call that allows userspace to iterate through
  namespaces in the system. This provides a programmatic interface to
  discover and inspect namespaces, addressing longstanding limitations:

  Currently, there is no direct way for userspace to enumerate namespaces.
  Applications must resort to scanning /proc/<pid>/ns/ across all processes,
  which is:

  1. Inefficient - requires iterating over all processes
  2. Incomplete - misses namespaces not attached to any running process but
     kept alive by file descriptors, bind mounts, or parent references
  3. Permission-heavy - requires access to /proc for many processes
  4. Unordered - provides no ordering or ownership information
  5. Unfiltered - cannot filter by namespace type
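
  For illustration, the /proc scan described above could look roughly like
  the following. This is a minimal sketch, not code from the series:
  count_ns_via_proc() and MAX_SEEN are hypothetical names, and it assumes
  the usual procfs layout where /proc/<pid>/ns/<kind> is a symlink whose
  target identifies the namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_SEEN 1024 /* hypothetical cap on distinct namespaces tracked */

/*
 * Count distinct namespaces of one kind (e.g. "net", "mnt") by reading the
 * <proc_root>/<pid>/ns/<kind> symlink of every process we may inspect.
 * Returns the number of distinct link targets, or -1 if proc_root cannot
 * be opened.
 */
static int count_ns_via_proc(const char *proc_root, const char *kind)
{
	static char seen[MAX_SEEN][64];
	int nr_seen = 0;
	struct dirent *de;
	DIR *dir;

	assert(proc_root && kind);

	dir = opendir(proc_root);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		char path[PATH_MAX], target[64];
		ssize_t len;
		int i;

		/* Only numeric entries are process directories. */
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;

		snprintf(path, sizeof(path), "%s/%s/ns/%s",
			 proc_root, de->d_name, kind);
		len = readlink(path, target, sizeof(target) - 1);
		if (len < 0)
			continue; /* process exited or access was denied */
		target[len] = '\0';

		/* Targets look like "<kind>:[<inode>]"; deduplicate on them. */
		for (i = 0; i < nr_seen; i++)
			if (strcmp(seen[i], target) == 0)
				break;
		if (i == nr_seen && nr_seen < MAX_SEEN)
			strcpy(seen[nr_seen++], target);
	}
	closedir(dir);
	return nr_seen;
}
```

  Every process costs a readlink(), unreadable processes are silently
  skipped, and a namespace kept alive only by a file descriptor or bind
  mount never shows up, which matches points 1-3 above.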

  The listns() system call solves these problems:

  ssize_t listns(const struct ns_id_req *req, u64 *ns_ids,
                 size_t nr_ns_ids, unsigned int flags);

  struct ns_id_req {
          __u32 size;
          __u32 spare;
          __u64 ns_id;
          struct /* listns */ {
                  __u32 ns_type;
                  __u32 spare2;
                  __u64 user_ns_id;
          };
  };

  Features include:

  - Pagination support for large namespace sets

  - Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)

  - Filtering by owning user namespace

  - Permission checks respecting namespace isolation
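
  A userspace caller might wrap the system call roughly as follows. This is
  a hedged sketch: struct ns_id_req_sketch is a local, flattened copy of the
  layout quoted above, and listns_sketch() and list_ns_ids() are
  hypothetical helper names. It only issues the system call when the
  installed headers define __NR_listns and reports ENOSYS otherwise. It
  also assumes that a zeroed request with only size set is valid, and that
  an ns_type of 0 requests all namespace types.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Local mirror of the request layout quoted above. The anonymous listns
 * sub-struct is flattened, which yields the same field offsets. The name
 * differs from the UAPI one to avoid clashing with <linux/nsfs.h>.
 */
struct ns_id_req_sketch {
	uint32_t size;
	uint32_t spare;
	uint64_t ns_id;
	uint32_t ns_type;
	uint32_t spare2;
	uint64_t user_ns_id;
};

/* Thin wrapper; fails with ENOSYS if the headers lack __NR_listns. */
static ssize_t listns_sketch(const struct ns_id_req_sketch *req,
			     uint64_t *ns_ids, size_t nr_ns_ids,
			     unsigned int flags)
{
#ifdef __NR_listns
	return syscall(__NR_listns, req, ns_ids, nr_ns_ids, flags);
#else
	(void)req;
	(void)ns_ids;
	(void)nr_ns_ids;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Fill ns_ids with up to nr_ns_ids namespace IDs of the given type.
 * Returns the number of IDs written, or -1 with errno set.
 */
static ssize_t list_ns_ids(uint32_t ns_type, uint64_t *ns_ids,
			   size_t nr_ns_ids)
{
	struct ns_id_req_sketch req;

	assert(ns_ids != NULL || nr_ns_ids == 0);

	memset(&req, 0, sizeof(req));
	req.size = sizeof(req);   /* lets the kernel version the struct */
	req.ns_type = ns_type;    /* assumption: 0 means "all types" */

	return listns_sketch(&req, ns_ids, nr_ns_ids, 0);
}
```

  A caller would pass an array of u64 slots and treat a negative return as
  an error, checking errno for ENOSYS on kernels without listns(). How the
  ns_id field is used to continue a paginated listing is not spelled out
  here, so the sketch leaves it zeroed.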

- Active Reference Counting

  Introduce an active reference count that tracks namespace visibility to
  userspace. A namespace is visible in the following cases:

  1. The namespace is in use by a task
  2. The namespace is persisted through a VFS object (namespace file
     descriptor or bind-mount)
  3. The namespace is a hierarchical type and is the parent of child
     namespaces

  The active reference count does not regulate lifetime (that's still done
  by the normal reference count) - it only regulates visibility to namespace
  file handles and listns().

  This prevents resurrection of namespaces that are pinned only for internal
  kernel reasons (e.g., user namespaces held by file->f_cred, lazy TLB
  references on idle CPUs, etc.) which should not be accessible via (1)-(3).
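
  The split between the two counts can be pictured with a toy userspace
  model. This is only an illustration under simplifying assumptions, not
  the kernel implementation: struct toy_ns and the toy_*() helpers are
  hypothetical, there is no locking, and nothing is actually freed.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_ns {
	int passive; /* lifetime: the object may be freed at zero */
	int active;  /* visibility: tasks, ns fds, bind mounts, children */
};

/* Internal pin (e.g. a cred or lazy TLB style reference): lifetime only. */
static void toy_pin_internal(struct toy_ns *ns)
{
	ns->passive++;
}

/*
 * Userspace-facing lookup (file handle or listns() style): succeeds only
 * while something from cases (1)-(3) still keeps the namespace active.
 */
static bool toy_open_visible(struct toy_ns *ns)
{
	if (ns->active == 0)
		return false; /* pinned internally at most: do not resurrect */
	ns->passive++;
	ns->active++;
	return true;
}

/* A task exiting or an fd closing drops both kinds of reference. */
static void toy_release_visible(struct toy_ns *ns)
{
	assert(ns->active > 0 && ns->passive > 0);
	ns->active--;
	ns->passive--;
}
```

  With one task reference and one internal pin, releasing the task leaves
  passive at 1 and active at 0: the object is still alive, but
  toy_open_visible() refuses to hand it back to userspace.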

- Unified Namespace Tree

  Introduce a unified tree structure for all namespaces with:

  - Fixed IDs assigned to initial namespaces

  - Lookup based solely on inode number

  - Maintained list of owned namespaces per user namespace

  - Simplified rbtree comparison helpers

Cleanups

- Header Reorganization

  - Move namespace types into separate header (ns_common_types.h)

  - Decouple nstree from ns_common header

  - Move nstree types into separate header

  - Switch to new ns_tree_{node,root} structures with helper functions

  - Use guards for ns_tree_lock

- Initial Namespace Reference Count Optimization

  - Make all reference counts on initial namespaces a nop to avoid
    pointless cacheline ping-pong for namespaces that can never go away

  - Drop custom reference count initialization for initial namespaces

  - Add NS_COMMON_INIT() macro and use it for all namespaces

  - pid: rely on common reference count behavior

- Miscellaneous Cleanups

  - Rename exit_task_namespaces() to exit_nsproxy_namespaces()

  - Rename is_initial_namespace() and make argument const

  - Use boolean to indicate anonymous mount namespace

  - Simplify owner list iteration in nstree

  - nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly

  - nsfs: use inode_just_drop()

  - pidfs: raise DCACHE_DONTCACHE explicitly

  - pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls

  - libfs: allow to specify s_d_flags

  - cgroup: add cgroup namespace to tree after owner is set

  - nsproxy: fix free_nsproxy() and simplify create_new_namespaces()

Fixes

- setns(pidfd, ...) Race Condition

  Fix a subtle race when using pidfds with setns(). When the target task
  exits after prepare_nsset() but before commit_nsset(), the namespace's
  active reference count might have been dropped. If setns() then installs
  the namespaces, it would bump the active reference count from zero without
  taking the required reference on the owner namespace, leading to underflow
  when later decremented.

  The fix resurrects the ownership chain if necessary - if the caller
  succeeded in grabbing passive references, the setns() should succeed even
  if the target task exits or gets reaped.
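
  The invariant behind the fix can be sketched with a toy model. This is a
  simplified illustration, not the kernel code: struct toy_ns,
  toy_get_active() and toy_put_active() are hypothetical names, and locking
  and passive references are left out. The first active reference on a
  namespace takes one on its owner, and the last one drops it again.

```c
#include <assert.h>
#include <stddef.h>

struct toy_ns {
	struct toy_ns *owner; /* owning namespace, NULL at the top */
	int active;
};

/*
 * Take an active reference. On a 0 -> 1 transition, walk up and take a
 * reference on the owner as well, resurrecting the chain if necessary.
 */
static void toy_get_active(struct toy_ns *ns)
{
	for (; ns; ns = ns->owner) {
		if (ns->active++ > 0)
			break; /* already active: owner reference is held */
	}
}

/*
 * Drop an active reference. On a 1 -> 0 transition, drop the reference
 * this namespace held on its owner.
 */
static void toy_put_active(struct toy_ns *ns)
{
	for (; ns; ns = ns->owner) {
		assert(ns->active > 0); /* catches the underflow described above */
		if (--ns->active > 0)
			break;
	}
}
```

  A variant of toy_get_active() that only incremented ns->active would
  reproduce the bug: after the target task exits, both counts are zero, the
  install bumps only the child, and the later toy_put_active() trips the
  assertion on the owner.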

- Return EFAULT on put_user() error instead of success

- Make sure references are dropped outside of RCU lock (some namespaces
  like mount namespace sleep when putting the last reference)

- Don't skip active reference count initialization for network namespace

- Add asserts for active refcount underflow

- Add asserts for initial namespace reference counts (both passive and
  active)

- ipc: enable is_ns_init_id() assertions

- Fix kernel-doc comments for internal nstree functions

- Selftests

  - 15 active reference count tests

  - 9 listns() functionality tests

  - 7 listns() permission tests

  - 12 inactive namespace resurrection tests

  - 3 threaded active reference count tests

  - commit_creds() active reference tests

  - Pagination and stress tests

  - EFAULT handling test

  - nsid tests fixes

/* Testing */

gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)

No build failures or warnings were observed.

/* Conflicts */

Merge conflicts with mainline
=============================

diff --cc fs/namespace.c
index a7fd9682bcf9,25289b869be1..000000000000
--- a/fs/namespace.c
+++ b/fs/namespace.c

Merge conflicts with other trees
================================

[1] https://lore.kernel.org/linux-next/20251118110822.72e36c15@canb.auug.org.au

The following changes since commit dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa:

  Linux 6.18-rc3 (2025-10-26 15:59:49 -0700)

are available in the Git repository at:

  git@...olite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/namespace-6.19-rc1

for you to fetch changes up to a71e4f103aed69e7a11ea913312726bb194c76ee:

  pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls (2025-11-17 16:23:13 +0100)

Please consider pulling these changes from the signed namespace-6.19-rc1 tag.

Thanks!
Christian

----------------------------------------------------------------
namespace-6.19-rc1

----------------------------------------------------------------
Christian Brauner (107):
      libfs: allow to specify s_d_flags
      nsfs: use inode_just_drop()
      nsfs: raise DCACHE_DONTCACHE explicitly
      pidfs: raise DCACHE_DONTCACHE explicitly
      nsfs: raise SB_I_NODEV and SB_I_NOEXEC
      cgroup: add cgroup namespace to tree after owner is set
      nstree: simplify return
      ns: add missing authorship
      ns: add NS_COMMON_INIT()
      ns: use NS_COMMON_INIT() for all namespaces
      ns: initialize ns_list_node for initial namespaces
      ns: add __ns_ref_read()
      ns: rename to exit_nsproxy_namespaces()
      ns: add active reference count
      ns: use anonymous struct to group list member
      nstree: introduce a unified tree
      nstree: allow lookup solely based on inode
      nstree: assign fixed ids to the initial namespaces
      nstree: maintain list of owned namespaces
      nstree: simplify rbtree comparison helpers
      nstree: add unified namespace list
      nstree: add listns()
      arch: hookup listns() system call
      nsfs: update tools header
      selftests/filesystems: remove CLONE_NEWPIDNS from setup_userns() helper
      selftests/namespaces: first active reference count tests
      selftests/namespaces: second active reference count tests
      selftests/namespaces: third active reference count tests
      selftests/namespaces: fourth active reference count tests
      selftests/namespaces: fifth active reference count tests
      selftests/namespaces: sixth active reference count tests
      selftests/namespaces: seventh active reference count tests
      selftests/namespaces: eigth active reference count tests
      selftests/namespaces: ninth active reference count tests
      selftests/namespaces: tenth active reference count tests
      selftests/namespaces: eleventh active reference count tests
      selftests/namespaces: twelth active reference count tests
      selftests/namespaces: thirteenth active reference count tests
      selftests/namespaces: fourteenth active reference count tests
      selftests/namespaces: fifteenth active reference count tests
      selftests/namespaces: add listns() wrapper
      selftests/namespaces: first listns() test
      selftests/namespaces: second listns() test
      selftests/namespaces: third listns() test
      selftests/namespaces: fourth listns() test
      selftests/namespaces: fifth listns() test
      selftests/namespaces: sixth listns() test
      selftests/namespaces: seventh listns() test
      selftests/namespaces: eigth listns() test
      selftests/namespaces: ninth listns() test
      selftests/namespaces: first listns() permission test
      selftests/namespaces: second listns() permission test
      selftests/namespaces: third listns() permission test
      selftests/namespaces: fourth listns() permission test
      selftests/namespaces: fifth listns() permission test
      selftests/namespaces: sixth listns() permission test
      selftests/namespaces: seventh listns() permission test
      selftests/namespaces: first inactive namespace resurrection test
      selftests/namespaces: second inactive namespace resurrection test
      selftests/namespaces: third inactive namespace resurrection test
      selftests/namespaces: fourth inactive namespace resurrection test
      selftests/namespaces: fifth inactive namespace resurrection test
      selftests/namespaces: sixth inactive namespace resurrection test
      selftests/namespaces: seventh inactive namespace resurrection test
      selftests/namespaces: eigth inactive namespace resurrection test
      selftests/namespaces: ninth inactive namespace resurrection test
      selftests/namespaces: tenth inactive namespace resurrection test
      selftests/namespaces: eleventh inactive namespace resurrection test
      selftests/namespaces: twelth inactive namespace resurrection test
      selftests/namespace: first threaded active reference count test
      selftests/namespace: second threaded active reference count test
      selftests/namespace: third threaded active reference count test
      selftests/namespace: commit_creds() active reference tests
      selftests/namespace: add stress test
      selftests/namespace: test listns() pagination
      Merge patch series "nstree: listns()"
      ns: don't skip active reference count initialization
      ns: don't increment or decrement initial namespaces
      ns: make sure reference are dropped outside of rcu lock
      ns: return EFAULT on put_user() error
      ns: handle setns(pidfd, ...) cleanly
      ns: add asserts for active refcount underflow
      selftests/namespaces: add active reference count regression test
      Merge patch "kbuild: Add '-fms-extensions' to areas with dedicated CFLAGS"
      selftests/namespaces: test for efault
      Merge patch series "ns: fixes for namespace iteration and active reference counting"
      Merge branch 'kbuild-6.19.fms.extension'
      ns: move namespace types into separate header
      nstree: decouple from ns_common header
      nstree: move nstree types into separate header
      nstree: add helper to operate on struct ns_tree_{node,root}
      nstree: switch to new structures
      nstree: simplify owner list iteration
      nstree: use guards for ns_tree_lock
      ns: make is_initial_namespace() argument const
      ns: rename is_initial_namespace()
      fs: use boolean to indicate anonymous mount namespace
      ipc: enable is_ns_init_id() assertions
      ns: make all reference counts on initial namespace a nop
      ns: add asserts for initial namespace reference counts
      ns: add asserts for initial namespace active reference counts
      pid: rely on common reference count behavior
      ns: drop custom reference count initialization for initial namespaces
      selftests/namespaces: fix nsid tests
      Merge patch series "ns: header cleanups and initial namespace reference count improvements"
      nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
      pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls

Kriish Sharma (1):
      nstree: fix kernel-doc comments for internal functions

Nathan Chancellor (2):
      jfs: Rename _inline to avoid conflict with clang's '-fms-extensions'
      kbuild: Add '-fms-extensions' to areas with dedicated CFLAGS

Rasmus Villemoes (1):
      Kbuild: enable -fms-extensions

 Makefile                                           |    3 +
 arch/alpha/kernel/syscalls/syscall.tbl             |    1 +
 arch/arm/tools/syscall.tbl                         |    1 +
 arch/arm64/kernel/vdso32/Makefile                  |    3 +-
 arch/arm64/tools/syscall_32.tbl                    |    1 +
 arch/loongarch/vdso/Makefile                       |    2 +-
 arch/m68k/kernel/syscalls/syscall.tbl              |    1 +
 arch/microblaze/kernel/syscalls/syscall.tbl        |    1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl          |    1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl          |    1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl          |    1 +
 arch/parisc/boot/compressed/Makefile               |    2 +-
 arch/parisc/kernel/syscalls/syscall.tbl            |    1 +
 arch/powerpc/boot/Makefile                         |    3 +-
 arch/powerpc/kernel/syscalls/syscall.tbl           |    1 +
 arch/s390/Makefile                                 |    3 +-
 arch/s390/kernel/syscalls/syscall.tbl              |    1 +
 arch/s390/purgatory/Makefile                       |    3 +-
 arch/sh/kernel/syscalls/syscall.tbl                |    1 +
 arch/sparc/kernel/syscalls/syscall.tbl             |    1 +
 arch/x86/Makefile                                  |    4 +-
 arch/x86/boot/compressed/Makefile                  |    7 +-
 arch/x86/entry/syscalls/syscall_32.tbl             |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl             |    1 +
 arch/xtensa/kernel/syscalls/syscall.tbl            |    1 +
 drivers/firmware/efi/libstub/Makefile              |    4 +-
 fs/jfs/jfs_incore.h                                |    6 +-
 fs/libfs.c                                         |    1 +
 fs/mount.h                                         |    3 +-
 fs/namespace.c                                     |   12 +-
 fs/nsfs.c                                          |  101 +-
 fs/pidfs.c                                         |   76 +-
 include/linux/ns/ns_common_types.h                 |  196 ++
 include/linux/ns/nstree_types.h                    |   55 +
 include/linux/ns_common.h                          |  233 +-
 include/linux/nsfs.h                               |    3 +
 include/linux/nsproxy.h                            |    9 +-
 include/linux/nstree.h                             |   52 +-
 include/linux/pid_namespace.h                      |    3 +-
 include/linux/pseudo_fs.h                          |    1 +
 include/linux/syscalls.h                           |    4 +
 include/linux/user_namespace.h                     |    4 +-
 include/uapi/asm-generic/unistd.h                  |    4 +-
 include/uapi/linux/nsfs.h                          |   58 +
 init/version-timestamp.c                           |    7 +-
 ipc/msgutil.c                                      |    7 +-
 ipc/namespace.c                                    |    3 +-
 kernel/cgroup/cgroup.c                             |   11 +-
 kernel/cgroup/namespace.c                          |    2 +-
 kernel/cred.c                                      |    6 +
 kernel/exit.c                                      |    3 +-
 kernel/fork.c                                      |    3 +-
 kernel/nscommon.c                                  |  246 +-
 kernel/nsproxy.c                                   |   57 +-
 kernel/nstree.c                                    |  782 +++++-
 kernel/pid.c                                       |   12 +-
 kernel/pid_namespace.c                             |    2 +-
 kernel/time/namespace.c                            |    5 +-
 kernel/user.c                                      |    7 +-
 net/core/net_namespace.c                           |    2 +-
 scripts/Makefile.extrawarn                         |    4 +-
 scripts/syscall.tbl                                |    1 +
 tools/include/uapi/linux/nsfs.h                    |   70 +
 tools/testing/selftests/filesystems/utils.c        |    2 +-
 tools/testing/selftests/namespaces/.gitignore      |    9 +
 tools/testing/selftests/namespaces/Makefile        |   24 +-
 .../selftests/namespaces/cred_change_test.c        |  814 ++++++
 .../selftests/namespaces/listns_efault_test.c      |  530 ++++
 .../selftests/namespaces/listns_pagination_bug.c   |  138 +
 .../selftests/namespaces/listns_permissions_test.c |  759 ++++++
 tools/testing/selftests/namespaces/listns_test.c   |  679 +++++
 .../selftests/namespaces/ns_active_ref_test.c      | 2672 ++++++++++++++++++++
 tools/testing/selftests/namespaces/nsid_test.c     |  107 +-
 .../namespaces/regression_pidfd_setns_test.c       |  113 +
 .../testing/selftests/namespaces/siocgskns_test.c  | 1824 +++++++++++++
 tools/testing/selftests/namespaces/stress_test.c   |  626 +++++
 tools/testing/selftests/namespaces/wrappers.h      |   35 +
 77 files changed, 9997 insertions(+), 436 deletions(-)
 create mode 100644 include/linux/ns/ns_common_types.h
 create mode 100644 include/linux/ns/nstree_types.h
 create mode 100644 tools/testing/selftests/namespaces/cred_change_test.c
 create mode 100644 tools/testing/selftests/namespaces/listns_efault_test.c
 create mode 100644 tools/testing/selftests/namespaces/listns_pagination_bug.c
 create mode 100644 tools/testing/selftests/namespaces/listns_permissions_test.c
 create mode 100644 tools/testing/selftests/namespaces/listns_test.c
 create mode 100644 tools/testing/selftests/namespaces/ns_active_ref_test.c
 create mode 100644 tools/testing/selftests/namespaces/regression_pidfd_setns_test.c
 create mode 100644 tools/testing/selftests/namespaces/siocgskns_test.c
 create mode 100644 tools/testing/selftests/namespaces/stress_test.c
 create mode 100644 tools/testing/selftests/namespaces/wrappers.h
