Message-Id: <20250721-procfs-pidns-api-v1-3-5cd9007e512d@cyphar.com>
Date: Mon, 21 Jul 2025 18:44:13 +1000
From: Aleksa Sarai <cyphar@...har.com>
To: Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-api@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kselftest@...r.kernel.org, Aleksa Sarai <cyphar@...har.com>
Subject: [PATCH RFC 3/4] procfs: add PROCFS_GET_PID_NAMESPACE ioctl

/proc has historically had very opaque semantics about PID namespaces,
which is a little unfortunate for container runtimes and other programs
that deal with switching namespaces very often. One common issue is
converting between PIDs in the process's own pid namespace and PIDs in
the pid namespace associated with a given /proc instance.

In principle, it is possible to do this today by opening a pidfd with
pidfd_open(2) and then reading /proc/self/fdinfo/$n, whose Pid: field
contains the PID translated into the pid namespace associated with that
procfs superblock.
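For reference, the existing workaround might look roughly like the following userspace C sketch. It assumes a kernel with pidfd_open(2) and a pidfd fdinfo that includes a "Pid:" line; the helper name translate_pid_via_fdinfo() is hypothetical, and the fallback syscall number 434 is the value used on most architectures (alpha differs).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumption: generic syscall number, not alpha */
#endif

/*
 * Hypothetical helper: translate @pid (a PID in the caller's pid namespace)
 * into the pid namespace of /proc by opening a pidfd and parsing the "Pid:"
 * field of /proc/self/fdinfo/<pidfd>. On success, stores the raw value in
 * *out and returns 0 (non-positive values mean the kernel had no usable PID
 * to report). Returns -1 with errno set on failure.
 */
static int translate_pid_via_fdinfo(pid_t pid, pid_t *out)
{
	char path[64], line[256];
	int pidfd, found = 0, saved_errno;
	FILE *f;

	/* Older libcs lack a pidfd_open() wrapper, so use syscall(2). */
	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	/* fdinfo is rendered relative to this procfs instance's pid namespace. */
	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);
	f = fopen(path, "r");
	if (!f) {
		saved_errno = errno;
		close(pidfd);
		errno = saved_errno;
		return -1;
	}

	while (fgets(line, sizeof(line), f)) {
		int value;

		/* Matches "Pid:\t<n>"; "NSpid:" and other fields do not match. */
		if (sscanf(line, "Pid: %d", &value) == 1) {
			*out = (pid_t)value;
			found = 1;
			break;
		}
	}

	fclose(f);
	close(pidfd);
	if (!found) {
		errno = ENOENT; /* kernel did not report a "Pid:" field */
		return -1;
	}
	return 0;
}
```

Note that every translation allocates and then closes a fresh pidfd, which is exactly the per-PID cost that makes this approach awkward for programs scanning procfs.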
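
However, allocating a new file for each PID to be converted is less than
ideal for programs that may need to scan procfs, and it is generally
useful for userspace to be able to get this information directly from
procfs. This patch adds a PROCFS_GET_PID_NAMESPACE ioctl which, when
issued on the root directory of a procfs instance, returns a pid
namespace file descriptor (the same kind of handle as /proc/$pid/ns/pid)
for the pid namespace associated with that instance. The caller must be
in an ancestor pid namespace, have CAP_SYS_ADMIN over the namespace's
owning user namespace, or be able to ptrace-read the namespace's pid 1
(empty namespaces are accessible to anyone). This acts as a sister
feature to the pidns= mount option, finally giving userspace full
control over the pid namespaces associated with /proc instances.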
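A minimal userspace sketch of the new interface, assuming a kernel with this patch applied: open the root of a procfs instance, issue PROCFS_GET_PID_NAMESPACE, and compare the returned namespace with the caller's own /proc/self/ns/pid by (st_dev, st_ino), as namespaces(7) describes. The helper names procfs_get_pidns() and is_own_pidns() are hypothetical, the fallback definition only mirrors the proposed uapi value, and unpatched kernels are expected to reject the ioctl with ENOTTY.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef PROCFS_GET_PID_NAMESPACE
/* Assumption: mirrors the value proposed for include/uapi/linux/fs.h. */
#define PROCFS_GET_PID_NAMESPACE _IO('f', 1)
#endif

/*
 * Hypothetical helper: return a pid namespace fd for the procfs instance
 * rooted at @procfs_root, or -1 with errno set (ENOTTY if the kernel has
 * no such ioctl, EPERM if the permission checks fail).
 */
static int procfs_get_pidns(const char *procfs_root)
{
	int rootfd, nsfd, saved_errno;

	rootfd = open(procfs_root, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (rootfd < 0)
		return -1;

	/* The ioctl is only handled on the root directory of the instance. */
	nsfd = ioctl(rootfd, PROCFS_GET_PID_NAMESPACE);
	saved_errno = errno;
	close(rootfd);
	errno = saved_errno;
	return nsfd;
}

/*
 * Hypothetical helper: return 1 if @nsfd refers to the caller's active pid
 * namespace, 0 if not, -1 on error. Two nsfs handles refer to the same
 * namespace when their device and inode numbers match.
 */
static int is_own_pidns(int nsfd)
{
	struct stat ns_st, self_st;

	if (fstat(nsfd, &ns_st) < 0 || stat("/proc/self/ns/pid", &self_st) < 0)
		return -1;
	return ns_st.st_dev == self_st.st_dev && ns_st.st_ino == self_st.st_ino;
}
```

The returned descriptor is an ordinary namespace handle, so it can also be passed to setns(2) with CLONE_NEWPID, subject to the usual setns permission checks.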
Signed-off-by: Aleksa Sarai <cyphar@...har.com>
---
Documentation/filesystems/proc.rst | 4 +++
fs/proc/root.c | 52 ++++++++++++++++++++++++++++++++++++--
include/uapi/linux/fs.h | 3 +++
3 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index c520b9f8a3fd..506383273c9d 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -2398,6 +2398,10 @@ pidns= specifies a pid namespace (either as a string path to something like
will be used by the procfs instance when translating pids. By default, procfs
will use the calling process's active pid namespace.
+Processes can get a handle to the pid namespace used by a procfs instance
+with the ``PROCFS_GET_PID_NAMESPACE`` ioctl() on the root directory of that
+instance, which returns a pid namespace file descriptor.
+
Chapter 5: Filesystem behavior
==============================
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 10ca94be0eef..ee90749ccd8e 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -23,8 +23,10 @@
#include <linux/cred.h>
#include <linux/magic.h>
#include <linux/slab.h>
+#include <linux/ptrace.h>
#include "internal.h"
+#include "../internal.h"
struct proc_fs_context {
struct pid_namespace *pid_ns;
@@ -408,15 +410,61 @@ static int proc_root_readdir(struct file *file, struct dir_context *ctx)
return proc_pid_readdir(file, ctx);
}
+static long int proc_root_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ switch (cmd) {
+ case PROCFS_GET_PID_NAMESPACE: {
+ struct pid_namespace *active = task_active_pid_ns(current);
+ struct pid_namespace *ns = proc_pid_ns(file_inode(filp)->i_sb);
+ bool can_access_pidns = false;
+
+ /*
+ * If we are in an ancestor of the pidns, or have join
+ * privileges (CAP_SYS_ADMIN), then it makes sense that we
+ * would be able to grab a handle to the pidns.
+ *
+ * Otherwise, if there is a root process, then being able to
+ * access /proc/$pid/ns/pid is equivalent to this ioctl and so
+ * we should probably match the permission model. For empty
+ * namespaces it seems unlikely for there to be a downside to
+ * allowing unprivileged users to open a handle to it (setns
+ * will fail for unprivileged users anyway).
+ */
+ can_access_pidns = pidns_is_ancestor(ns, active) ||
+ ns_capable(ns->user_ns, CAP_SYS_ADMIN);
+ if (!can_access_pidns) {
+ bool cannot_ptrace_pid1 = false;
+
+ read_lock(&tasklist_lock);
+ if (ns->child_reaper)
+ cannot_ptrace_pid1 = !ptrace_may_access(ns->child_reaper,
+ PTRACE_MODE_READ_FSCREDS);
+ read_unlock(&tasklist_lock);
+ can_access_pidns = !cannot_ptrace_pid1;
+ }
+ if (!can_access_pidns)
+ return -EPERM;
+
+ /* open_namespace() unconditionally consumes the reference. */
+ get_pid_ns(ns);
+ return open_namespace(to_ns_common(ns));
+ }
+ default:
+ return -ENOIOCTLCMD;
+ }
+}
+
/*
* The root /proc directory is special, as it has the
* <pid> directories. Thus we don't use the generic
* directory handling functions for that..
*/
static const struct file_operations proc_root_operations = {
- .read = generic_read_dir,
- .iterate_shared = proc_root_readdir,
+ .read = generic_read_dir,
+ .iterate_shared = proc_root_readdir,
.llseek = generic_file_llseek,
+ .unlocked_ioctl = proc_root_ioctl,
+ .compat_ioctl = compat_ptr_ioctl,
};
/*
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 0bd678a4a10e..aa642cb48feb 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -437,6 +437,9 @@ typedef int __bitwise __kernel_rwf_t;
#define PROCFS_IOCTL_MAGIC 'f'
+/* procfs root ioctls */
+#define PROCFS_GET_PID_NAMESPACE _IO(PROCFS_IOCTL_MAGIC, 1)
+
/* Pagemap ioctl */
#define PAGEMAP_SCAN _IOWR(PROCFS_IOCTL_MAGIC, 16, struct pm_scan_arg)
--
2.50.0