Message-ID: <CAKbZUD2ZB+U3GKJftfRH_2ejNja26v38OLVE2Lbfn_1KSOKhNQ@mail.gmail.com>
Date: Fri, 25 Oct 2024 13:50:12 +0100
From: Pedro Falcato <pedro.falcato@...il.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Christian Brauner <christian@...uner.io>, Shuah Khan <shuah@...nel.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>, Suren Baghdasaryan <surenb@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>, linux-kselftest@...r.kernel.org, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
linux-kernel@...r.kernel.org, Oliver Sang <oliver.sang@...el.com>,
John Hubbard <jhubbard@...dia.com>
Subject: Re: [PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
On Fri, Oct 25, 2024 at 10:41 AM Lorenzo Stoakes
<lorenzo.stoakes@...cle.com> wrote:
>
> It is useful to be able to utilise the pidfd mechanism to reference the
> current thread or process (from a userland point of view - thread group
> leader from the kernel's point of view).
>
> Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and
> PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
>
> For convenience and to avoid confusion from userland's perspective we alias
> these:
>
> * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what
> the user will want to use, as they would find it surprising if for
> instance fd's were unshared()'d and they wanted to invoke pidfd_getfd()
> and that failed.
>
> * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users
> have no concept of thread groups or what a thread group leader is, and
> from userland's perspective and nomenclature this is what userland
> considers to be a process.
>
> Due to the refactoring of the central __pidfd_get_pid() function we can
> implement this functionality centrally, providing the use of this sentinel
> in most functionality which utilises pidfd's.
>
> We need to explicitly adjust kernel_waitid_prepare() to permit this (though
> it wouldn't really make sense to use this there, we provide the ability for
> consistency).
>
> We explicitly disallow use of this in setns(), which would otherwise have
> required explicit custom handling, as it doesn't make sense to set the
> current calling thread to join the namespace of itself.
>
> As the callers of pidfd_get_pid() expect an increased reference count on
> the pid we do so in the self case, reducing churn and avoiding any breakage
> from existing logic which decrements this reference count.
>
> This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFD,
> ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and
> pidfd_getfd() system calls.
>
> Things such as polling a pidfs and general fd operations are not supported,
> this strictly provides the sentinel for APIs which explicitly accept a
> pidfd.
>
> Reviewed-by: Shakeel Butt <shakeel.butt@...ux.dev>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> ---
> include/linux/pid.h | 8 ++++--
> include/uapi/linux/pidfd.h | 15 +++++++++++
> kernel/exit.c | 3 ++-
> kernel/nsproxy.c | 1 +
> kernel/pid.c | 51 ++++++++++++++++++++++++--------------
> 5 files changed, 57 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index d466890e1b35..3b2ac7567a88 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -78,11 +78,15 @@ struct file;
> * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
> *
> * @pidfd: The pidfd whose pid we want, or the fd of a /proc/<pid> file if
> - * @alloc_proc is also set.
> + * @alloc_proc is also set, or PIDFD_SELF_* to refer to the current
> + * thread or thread group leader.
> * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead
> * of a pidfd, and this will be used to determine the pid.
> +
> * @flags: Output variable, if non-NULL, then the file->f_flags of the
> - * pidfd will be set here.
> + * pidfd will be set here or If PIDFD_SELF_THREAD is set, this is
> + * set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP then
> + * this is set to zero.
> *
> * Returns: If successful, the pid associated with the pidfd, otherwise an
> * error.
> diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
> index 565fc0629fff..0ca2ebf906fd 100644
> --- a/include/uapi/linux/pidfd.h
> +++ b/include/uapi/linux/pidfd.h
> @@ -29,4 +29,19 @@
> #define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9)
> #define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10)
>
> +/*
> + * Special sentinel values which can be used to refer to the current thread or
> + * thread group leader (which from a userland perspective is the process).
> + */
> +#define PIDFD_SELF PIDFD_SELF_THREAD
> +#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP
> +
> +#define PIDFD_SELF_THREAD -100 /* Current thread. */
This conflicts with AT_FDCWD (which is also -100); might be worth changing?
> +#define PIDFD_SELF_THREAD_GROUP -200 /* Current thread group leader. */
We might want to pick some range outside of the negative errno space
(-4095 to -1, i.e. MAX_ERRNO, IIRC), since we have plenty of values to
pick from (2^31 at least).
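To make the two points above concrete, here is a minimal userspace sketch, not
the patch's code. The EXAMPLE_* names and values and both helper functions are
hypothetical, picked only to show a range that avoids AT_FDCWD and the errno
range; EXAMPLE_MAX_ERRNO is assumed to mirror the kernel's MAX_ERRNO (4095).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* so <fcntl.h> exposes AT_FDCWD */
#endif
#include <assert.h> /* static_assert (C11) */
#include <fcntl.h>  /* AT_FDCWD */

/* Assumed to mirror the kernel's MAX_ERRNO; redefined here for userspace. */
#define EXAMPLE_MAX_ERRNO 4095

/* Hypothetical sentinel values, chosen only for illustration. */
#define EXAMPLE_PIDFD_SELF_THREAD       (-10000)
#define EXAMPLE_PIDFD_SELF_THREAD_GROUP (-10001)

/* Returns 1 if v lies in the negative errno range [-MAX_ERRNO, -1]. */
static int in_errno_range(int v)
{
	return v < 0 && v >= -EXAMPLE_MAX_ERRNO;
}

/*
 * Returns 1 if v could serve as a sentinel: negative (so it can never be a
 * real fd), distinct from AT_FDCWD, and outside the errno range.
 */
int sentinel_value_ok(int v)
{
	return v < 0 && v != AT_FDCWD && !in_errno_range(v);
}

/* Compile-time guard against the AT_FDCWD collision discussed above. */
static_assert(EXAMPLE_PIDFD_SELF_THREAD != AT_FDCWD,
	      "sentinel must not alias AT_FDCWD");
static_assert(EXAMPLE_PIDFD_SELF_THREAD_GROUP != AT_FDCWD,
	      "sentinel must not alias AT_FDCWD");
```

With these assumptions, both -100 and -200 from the patch would be rejected by
sentinel_value_ok(), while the example values pass.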
> +static inline int pidfd_is_self_sentinel(pid_t pid)
> +{
> + return pid == PIDFD_SELF_THREAD || pid == PIDFD_SELF_THREAD_GROUP;
> +}
Do we want this in the uapi header? Even if it is useful, it comes with
drawbacks: it can break scripts that parse kernel headers (a quick git grep
suggests we do have static inlines in headers, but only in rather obscure
ones), and it breaks C89:
<source>:8:8: error: unknown type name 'inline'
8 | static inline int pidfd_is_self_sentinel(pid_t pid)
:)
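For illustration only, one C89-compatible way to express the same check is a
function-like macro, which also keeps function bodies out of the header. This
is a sketch rather than a concrete proposal; the EXAMPLE_*/example_* names and
values are hypothetical.

```c
/* Hypothetical sentinel values; the real ones are still under discussion. */
#define EXAMPLE_PIDFD_SELF_THREAD       (-10000)
#define EXAMPLE_PIDFD_SELF_THREAD_GROUP (-10001)

/*
 * Macro form of the sentinel check. It uses no 'inline' keyword, so it
 * compiles as C89. Caveat: the argument is evaluated twice, so callers
 * should not pass expressions with side effects.
 */
#define example_pidfd_is_self_sentinel(fd)          \
	((fd) == EXAMPLE_PIDFD_SELF_THREAD ||       \
	 (fd) == EXAMPLE_PIDFD_SELF_THREAD_GROUP)
```

The trade-off is losing the type checking that a real function gives.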
> +
> #endif /* _UAPI_LINUX_PIDFD_H */
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 619f0014c33b..3eb20f8252ee 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -71,6 +71,7 @@
> #include <linux/user_events.h>
> #include <linux/uaccess.h>
>
> +#include <uapi/linux/pidfd.h>
> #include <uapi/linux/wait.h>
>
> #include <asm/unistd.h>
> @@ -1739,7 +1740,7 @@ int kernel_waitid_prepare(struct wait_opts *wo, int which, pid_t upid,
> break;
> case P_PIDFD:
> type = PIDTYPE_PID;
> - if (upid < 0)
> + if (upid < 0 && !pidfd_is_self_sentinel(upid))
> return -EINVAL;
>
> pid = pidfd_get_pid(upid, &f_flags);
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index dc952c3b05af..d239f7eeaa1f 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -550,6 +550,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
> struct nsset nsset = {};
> int err = 0;
>
> + /* If fd is PIDFD_SELF_*, implicitly fail here, as invalid. */
> if (!fd_file(f))
> return -EBADF;
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 94c97559e5c5..8742157b36f8 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -535,33 +535,48 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
> }
> EXPORT_SYMBOL_GPL(find_ge_pid);
>
> +static struct pid *pidfd_get_pid_self(unsigned int pidfd, unsigned int *flags)
> +{
> + bool is_thread = pidfd == PIDFD_SELF_THREAD;
> + enum pid_type type = is_thread ? PIDTYPE_PID : PIDTYPE_TGID;
> + struct pid *pid = *task_pid_ptr(current, type);
> +
> + /* The caller expects an elevated reference count. */
> + get_pid(pid);
It would be really nice to avoid the get_pid() here, but I imagine that
would take some refactoring around the callers' put_pid() calls?
> + return pid;
> +}
> +
> struct pid *__pidfd_get_pid(unsigned int pidfd, bool allow_proc,
> unsigned int *flags)
> {
> - struct pid *pid;
> - struct fd f = fdget(pidfd);
> - struct file *file = fd_file(f);
> + if (pidfd_is_self_sentinel(pidfd)) {
> + return pidfd_get_pid_self(pidfd, flags);
> + } else {
Skipping the else here might make the rest of the code more legible
(since the sentinel branch returns anyway...).
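Roughly what I mean, as a standalone mock rather than kernel code. Every name
here (struct example_pid, the lookup helpers, the EXAMPLE_* values) is made up
for the sketch and only stands in for the real pid lookup paths.

```c
#include <stddef.h> /* NULL */

/* Hypothetical sentinel values, for illustration only. */
#define EXAMPLE_SELF_THREAD       (-10000)
#define EXAMPLE_SELF_THREAD_GROUP (-10001)

/* Stand-in for struct pid. */
struct example_pid {
	int nr;
};

static struct example_pid example_thread_pid = { 1 };
static struct example_pid example_group_pid = { 2 };
static struct example_pid example_fd_pid = { 3 };

static int example_is_self_sentinel(int fd)
{
	return fd == EXAMPLE_SELF_THREAD || fd == EXAMPLE_SELF_THREAD_GROUP;
}

/* Stand-in for the "self" lookup. */
static struct example_pid *example_lookup_self(int fd)
{
	return fd == EXAMPLE_SELF_THREAD ? &example_thread_pid
					 : &example_group_pid;
}

/* Stand-in for the regular fd-based lookup; NULL models a bad fd. */
static struct example_pid *example_lookup_fd(int fd)
{
	return fd >= 0 ? &example_fd_pid : NULL;
}

struct example_pid *example_get_pid(int fd)
{
	/* The sentinel branch returns, so no else block is needed. */
	if (example_is_self_sentinel(fd))
		return example_lookup_self(fd);

	/* The regular path stays at the function's top indentation level. */
	return example_lookup_fd(fd);
}
```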
--
Pedro