Message-ID: <20250929-trivial-zoodirektor-9e2bc1148d03@brauner>
Date: Mon, 29 Sep 2025 11:47:23 +0200
From: Christian Brauner <brauner@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL 01/12 for v6.18] misc

On Fri, Sep 26, 2025 at 04:18:55PM +0200, Christian Brauner wrote:
> Hey Linus,
> 
> /* Summary */
> This contains the usual selections of misc updates for this cycle.
> 
> Features:
> 
> - Add "initramfs_options" parameter to set initramfs mount options. This
>   allows adding specific mount options to the rootfs, e.g., to limit its
>   memory size.
> 
> - Add RWF_NOSIGNAL flag for pwritev2()
> 
>   This flag prevents the SIGPIPE signal from being raised when
>   writing to disconnected pipes or
>   sockets. The flag is handled directly by the pipe filesystem and
>   converted to the existing MSG_NOSIGNAL flag for sockets.
> 
> - Allow to pass pid namespace as procfs mount option
> 
>   Ever since the introduction of pid namespaces, procfs has had very
>   implicit behaviour surrounding them (the pidns used by a procfs mount
>   is auto-selected based on the mounting process's active pidns, and the
>   pidns itself is basically hidden once the mount has been constructed).
> 
>   This implicit behaviour has historically meant that userspace was
>   required to do some special dances in order to configure the pidns of
>   a procfs mount as desired. Examples include:
> 
>   * In order to bypass the mnt_too_revealing() check, Kubernetes creates
>     a procfs mount from an empty pidns so that user namespaced
>     containers can be nested (without this, the nested containers would
>     fail to mount procfs). But this requires forking off a helper
>     process because you cannot just one-shot this using mount(2).
> 
>   * Container runtimes in general need to fork into a container before
>     configuring its mounts, which can lead to security issues in the
>     case of shared-pidns containers (a privileged process in the pidns
>     can interact with your container runtime process).
>     While SUID_DUMP_DISABLE and user namespaces make this less of an
>     issue, the strict need for this due to a minor uAPI wart is kind of
>     unfortunate.
> 
>     Things would be much easier if there was a way for userspace to just
>     specify the pidns they want. So this pull request contains changes
>     to implement a new "pidns" argument which can be set using
>     fsconfig(2):
> 
>         fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
>         fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
> 
>     or classic mount(2) / mount(8):
> 
>         // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
>         mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
> 
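
A minimal sketch of how a caller might wrap the classic mount(2) form quoted
above. The helper names build_pidns_option() and mount_proc_with_pidns() are
hypothetical, and passing 0 for the mount flags is an assumption, since the
flags are elided in the example. The mount itself can only succeed on a
kernel that carries the "pidns" option and with sufficient privileges.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: format "pidns=<ns_path>" into buf.
 * Returns 0 on success, -1 if the result does not fit.
 */
static int build_pidns_option(char *buf, size_t len, const char *ns_path)
{
	int n = snprintf(buf, len, "pidns=%s", ns_path);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: mount procfs at target for the pid namespace
 * referenced by ns_path (e.g. "/proc/self/ns/pid").
 * Returns 0 on success, -1 with errno set on failure.
 */
static int mount_proc_with_pidns(const char *target, const char *ns_path)
{
	char opts[4096];

	if (build_pidns_option(opts, sizeof(opts), ns_path) < 0) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* Flags are 0 here as an assumption; pick MS_* flags as needed. */
	return mount("proc", target, "proc", 0, opts);
}
```

A caller would use mount_proc_with_pidns("/tmp/proc", "/proc/self/ns/pid")
and check errno on failure. Older kernels are expected to reject the unknown
option.
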
> Cleanups:
> 
> - Remove the last references to EXPORT_OP_ASYNC_LOCK.
> 
> - Make file_remove_privs_flags() static.
> 
> - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used.
> 
> - Use try_cmpxchg() in start_dir_add().
> 
> - Use try_cmpxchg() in sb_init_done_wq().
> 
> - Replace offsetof() with struct_size() in ioctl_file_dedupe_range().
> 
> - Remove vfs_ioctl() export.
> 
> - Replace the rwlock with a spinlock in the epoll code, as the rwlock
>   causes priority inversion on PREEMPT_RT kernels.
> 
> - Make ns_entries in fs/proc/namespaces const.
> 
> - Use a switch() statement in init_special_inode() just like we do in
>   may_open().
> 
> - Use struct_size() in dir_add() in the initramfs code.
> 
> - Use str_plural() in rd_load_image().
> 
> - Replace strcpy() with strscpy() in find_link().
> 
> - Rename generic_delete_inode() to inode_just_drop() and
>   generic_drop_inode() to inode_generic_drop().
> 
> - Remove unused arguments from fcntl_{g,s}et_rw_hint().
> 
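
The try_cmpxchg() conversions listed above follow a pattern that can be
sketched in userspace with C11 atomics. Like try_cmpxchg(),
atomic_compare_exchange_weak() returns a boolean and writes the current value
back into the expected-value variable on failure, so the loop does not need
to reload it. The function inc_below_limit() is a made-up example and is not
taken from the patches.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical example: atomically increment *v unless it has already
 * reached limit. Returns true if the increment was performed.
 */
static bool inc_below_limit(atomic_uint *v, unsigned int limit)
{
	unsigned int old = atomic_load(v);

	do {
		if (old >= limit)
			return false;
		/*
		 * On failure the compare-exchange stores the value it
		 * found in `old`, so the next iteration re-checks the
		 * limit against fresh data without an explicit reload.
		 */
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));

	return true;
}
```

Compared with an open-coded cmpxchg() loop that compares the return value
against the old value, this form drops one comparison and one explicit load
from the retry path.
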
> Fixes:
> 
> - Document @name parameter for name_contains_dotdot() helper.
> 
> - Fix spelling mistake.
> 
> - Always return zero from replace_fd() instead of the file descriptor number.
> 
> - Limit the size for copy_file_range() in compat mode to prevent a signed
>   overflow.
> 
> - Fix debugfs mount options not being applied.
> 
> - Verify the inode mode when loading it from disk in minixfs.
> 
> - Verify the inode mode when loading it from disk in cramfs.
> 
> - Don't trigger automounts with RESOLVE_NO_XDEV
> 
>   If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
>   through automounts, but could still trigger them.
> 
> - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in tracepoints.
> 
> - Fix unused variable warning in rd_load_image() on s390.
> 
> - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD.
> 
> - Use ns_capable_noaudit() when determining net sysctl permissions.
> 
> - Don't call path_put() under namespace semaphore in listmount() and statmount().
> 
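
To illustrate the copy_file_range() compat item above: a 32-bit caller gets
the byte count back as a signed 32-bit value, so a length above INT_MAX
cannot be reported as a positive result. The sketch below clamps a length to
INT_MAX rounded down to a page boundary, which mirrors the kernel's
MAX_RW_COUNT limit. Whether the actual patch uses exactly this limit is an
assumption, and the SKETCH_* names, the 4096-byte page size and
clamp_compat_len() are made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Assumed page size for the sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE	4096u

/* INT_MAX rounded down to a page boundary (0x7ffff000 for 4 KiB pages). */
#define SKETCH_MAX_RW_COUNT \
	((uint64_t)INT_MAX & ~(uint64_t)(SKETCH_PAGE_SIZE - 1))

/*
 * Hypothetical helper: limit a requested length so that the number of
 * bytes copied always fits in a positive signed 32-bit return value.
 */
static uint64_t clamp_compat_len(uint64_t len)
{
	return len > SKETCH_MAX_RW_COUNT ? SKETCH_MAX_RW_COUNT : len;
}
```

A caller that asks for more than the limit simply gets a short copy and is
expected to loop, as with any other short read or write.
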
> /* Testing */
> 
> gcc (Debian 14.2.0-19) 14.2.0
> Debian clang version 19.1.7 (3+b1)
> 
> No build failures or warnings were observed.
> 
> /* Conflicts */

There is one issue that was reported after I had generated the pull
request. The mnt_ns_release() function can be passed a NULL pointer and
that case needs to be handled.

I'm appending a patch that I would ask you to please just apply on top
of it. If you would rather I resend the pull request, please just tell
me!

View attachment "0001-mount-handle-NULL-values-in-mnt_ns_release.patch" of type "text/x-diff" (935 bytes)