Message-ID: <20250929-trivial-zoodirektor-9e2bc1148d03@brauner>
Date: Mon, 29 Sep 2025 11:47:23 +0200
From: Christian Brauner <brauner@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL 01/12 for v6.18] misc
On Fri, Sep 26, 2025 at 04:18:55PM +0200, Christian Brauner wrote:
> Hey Linus,
>
> /* Summary */
> This contains the usual selection of misc updates for this cycle.
>
> Features:
>
> - Add "initramfs_options" parameter to set initramfs mount options. This
> allows adding specific mount options to the rootfs, e.g., to limit the
> memory size.
>
> - Add RWF_NOSIGNAL flag for pwritev2()
>
> This flag prevents the SIGPIPE signal from being raised when writing
> to disconnected pipes or sockets. The flag is handled directly by the
> pipe filesystem and converted to the existing MSG_NOSIGNAL flag for
> sockets.
>
> - Allow passing a pid namespace as a procfs mount option
>
> Ever since the introduction of pid namespaces, procfs has had very
> implicit behaviour surrounding them (the pidns used by a procfs mount
> is auto-selected based on the mounting process's active pidns, and the
> pidns itself is basically hidden once the mount has been constructed).
>
> This implicit behaviour has historically meant that userspace was
> required to do some special dances in order to configure the pidns of
> a procfs mount as desired. Examples include:
>
> * In order to bypass the mnt_too_revealing() check, Kubernetes creates
> a procfs mount from an empty pidns so that user namespaced
> containers can be nested (without this, the nested containers would
> fail to mount procfs). But this requires forking off a helper
> process because you cannot just one-shot this using mount(2).
>
> * Container runtimes in general need to fork into a container before
> configuring its mounts, which can lead to security issues in the
> case of shared-pidns containers (a privileged process in the pidns
> can interact with your container runtime process).
> While SUID_DUMP_DISABLE and user namespaces make this less of an
> issue, the strict need for this due to a minor uAPI wart is kind of
> unfortunate.
>
> Things would be much easier if there were a way for userspace to just
> specify the pidns it wants. So this pull request contains changes
> to implement a new "pidns" argument which can be set using
> fsconfig(2):
>
> fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
> fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
>
> or classic mount(2) / mount(8):
>
> // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
> mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
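Spelled out with the new mount API, the FSCONFIG_SET_STRING variant above
might look roughly like the sketch below. It assumes kernel headers that
provide <linux/mount.h> and the SYS_fsopen family of syscall numbers, a
kernel that understands the "pidns" option, and sufficient privileges. The
helper names build_pidns_option() and open_proc_for_pidns() are made up for
illustration and are not part of the series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/mount.h>   /* FSOPEN_CLOEXEC, FSCONFIG_*, FSMOUNT_CLOEXEC */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: format the mount(2) data string "pidns=<path>".
 * Returns 0 on success, -1 if the result does not fit into buf.
 */
int build_pidns_option(char *buf, size_t size, const char *ns_path)
{
	int n = snprintf(buf, size, "pidns=%s", ns_path);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Hypothetical helper: create a detached procfs mount for the pid
 * namespace referred to by ns_path. Returns a mount fd, or -1 with
 * errno set (e.g. on kernels without the "pidns" option).
 */
int open_proc_for_pidns(const char *ns_path)
{
	int fsfd, mntfd, saved;

	fsfd = (int)syscall(SYS_fsopen, "proc", FSOPEN_CLOEXEC);
	if (fsfd < 0)
		return -1;

	if (syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_STRING, "pidns", ns_path, 0) < 0 ||
	    syscall(SYS_fsconfig, fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
		mntfd = -1;
	else
		mntfd = (int)syscall(SYS_fsmount, fsfd, FSMOUNT_CLOEXEC, 0);

	/* Preserve errno across close() so callers see the real failure. */
	saved = errno;
	close(fsfd);
	errno = saved;
	return mntfd;
}
```

The returned mount fd could then be attached with move_mount(2); that step
is left out to keep the sketch short.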
>
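To illustrate the RWF_NOSIGNAL item above: the sketch below first shows the
existing socket-side behaviour that the new flag is converted to
(MSG_NOSIGNAL), then a guarded pwritev2() call. It assumes a Linux system.
RWF_NOSIGNAL is only used if the installed headers define it, and both
function names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send one byte to a socket whose peer has been closed, using
 * MSG_NOSIGNAL. Returns the errno of the failed send (0 if it
 * unexpectedly succeeded).
 */
int send_to_closed_peer(void)
{
	int sv[2], err = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return errno;

	close(sv[1]);                          /* disconnect the peer */
	if (send(sv[0], "x", 1, MSG_NOSIGNAL) < 0)
		err = errno;                   /* EPIPE, and no SIGPIPE */
	close(sv[0]);
	return err;
}

/*
 * Write without raising SIGPIPE via pwritev2(). Falls back to
 * EOPNOTSUPP when the headers do not know RWF_NOSIGNAL yet.
 */
ssize_t write_nosignal(int fd, const void *buf, size_t len)
{
#ifdef RWF_NOSIGNAL
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	/* Offset -1 means "use the current file position". */
	return pwritev2(fd, &iov, 1, -1, RWF_NOSIGNAL);
#else
	(void)fd;
	(void)buf;
	(void)len;
	errno = EOPNOTSUPP;
	return -1;
#endif
}
```

With the flag, a caller gets the same "error instead of signal" behaviour
for pipes without having to block or ignore SIGPIPE process-wide.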
> Cleanups:
>
> - Remove the last references to EXPORT_OP_ASYNC_LOCK.
>
> - Make file_remove_privs_flags() static.
>
> - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used.
>
> - Use try_cmpxchg() in start_dir_add().
>
> - Use try_cmpxchg() in sb_init_done_wq().
>
> - Replace offsetof() with struct_size() in ioctl_file_dedupe_range().
>
> - Remove vfs_ioctl() export.
>
> - Replace the rwlock with a spinlock in the epoll code, as the rwlock
> causes priority inversion on PREEMPT_RT kernels.
>
> - Make ns_entries in fs/proc/namespaces const.
>
> - Use a switch() statement in init_special_inode() just like we do in
> may_open().
>
> - Use struct_size() in dir_add() in the initramfs code.
>
> - Use str_plural() in rd_load_image().
>
> - Replace strcpy() with strscpy() in find_link().
>
> - Rename generic_delete_inode() to inode_just_drop() and
> generic_drop_inode() to inode_generic_drop().
>
> - Remove unused arguments from fcntl_{g,s}et_rw_hint().
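The try_cmpxchg() conversions above follow a common pattern: unlike
cmpxchg(), try_cmpxchg() returns a boolean and writes the current value back
into the caller's "old" variable on failure, so a retry loop needs no
explicit reload. A userspace analogue using C11 atomics, which have the same
update-on-failure semantics, might look like this (not the kernel code
itself; add_unless_limit() is a hypothetical name):

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Atomically add delta to *counter unless it currently equals limit.
 * Returns true if the addition happened.
 */
bool add_unless_limit(atomic_int *counter, int delta, int limit)
{
	int old = atomic_load(counter);

	do {
		if (old == limit)
			return false;
		/*
		 * On failure the compare-exchange refreshes 'old' with the
		 * current value, just as try_cmpxchg() does, so the loop
		 * simply re-checks and retries.
		 */
	} while (!atomic_compare_exchange_weak(counter, &old, old + delta));

	return true;
}
```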
>
> Fixes:
>
> - Document @name parameter for name_contains_dotdot() helper.
>
> - Fix spelling mistake.
>
> - Always return zero from replace_fd() instead of the file descriptor number.
>
> - Limit the size for copy_file_range() in compat mode to prevent a signed
> overflow.
>
> - Fix debugfs mount options not being applied.
>
> - Verify the inode mode when loading it from disk in minixfs.
>
> - Verify the inode mode when loading it from disk in cramfs.
>
> - Don't trigger automounts with RESOLVE_NO_XDEV
>
> If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
> through automounts, but could still trigger them.
>
> - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in tracepoints.
>
> - Fix unused variable warning in rd_load_image() on s390.
>
> - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD.
>
> - Use ns_capable_noaudit() when determining net sysctl permissions.
>
> - Don't call path_put() under namespace semaphore in listmount() and statmount().
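For context on the RESOLVE_NO_XDEV fix above, this is roughly how userspace
requests that behaviour through openat2(2). The sketch assumes kernel
headers that ship <linux/openat2.h> and define SYS_openat2; the wrapper name
open_no_xdev() is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>   /* struct open_how, RESOLVE_NO_XDEV */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open path relative to dirfd, refusing to cross mount points during
 * resolution. Returns an O_PATH fd, or -1 with errno set (EXDEV when
 * a mount crossing was required).
 */
int open_no_xdev(int dirfd, const char *path)
{
	struct open_how how;

	memset(&how, 0, sizeof(how));
	how.flags = O_PATH | O_CLOEXEC;
	how.resolve = RESOLVE_NO_XDEV;

	return (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
}
```

With the fix, a lookup like this should neither traverse nor trigger an
automount.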
>
> /* Testing */
>
> gcc (Debian 14.2.0-19) 14.2.0
> Debian clang version 19.1.7 (3+b1)
>
> No build failures or warnings were observed.
>
> /* Conflicts */
There is one issue that was reported after I had generated the pull
request. The mnt_ns_release() function can be passed a NULL pointer and
that case needs to be handled.
I'm appending a patch that I would ask you to please just apply on top
of it. If you would rather have me resend the pull request, please just
tell me!
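The usual shape of such a fix is a NULL check at the top of the release
helper. The following is a userspace sketch only and not the attached patch:
struct ns_obj, its passive counter, and ns_obj_release() are hypothetical
stand-ins.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted namespace object. */
struct ns_obj {
	atomic_int passive;
};

/*
 * Drop one passive reference. Passing NULL is a no-op, mirroring
 * free(NULL). Returns true if the object was freed.
 */
bool ns_obj_release(struct ns_obj *ns)
{
	if (!ns)
		return false;

	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&ns->passive, 1) != 1)
		return false;

	free(ns);
	return true;
}
```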
View attachment "0001-mount-handle-NULL-values-in-mnt_ns_release.patch" of type "text/x-diff" (935 bytes)