[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241028-gesoffen-drehmoment-5314faba9731@brauner>
Date: Mon, 28 Oct 2024 16:34:33 +0100
From: Christian Brauner <brauner@...nel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Oleg Nesterov <oleg@...hat.com>
Cc: Christian Brauner <christian@...uner.io>,
Shuah Khan <shuah@...nel.org>, "Liam R . Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>, Vlastimil Babka <vbabka@...e.cz>, pedro.falcato@...il.com,
linux-kselftest@...r.kernel.org, linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
linux-api@...r.kernel.org, linux-kernel@...r.kernel.org,
Oliver Sang <oliver.sang@...el.com>, John Hubbard <jhubbard@...dia.com>
Subject: Re: [PATCH v6 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own
thread/process
On Sat, Oct 26, 2024 at 08:24:58AM +0100, Lorenzo Stoakes wrote:
> It is useful to be able to utilise the pidfd mechanism to reference the
> current thread or process (from a userland point of view - thread group
> leader from the kernel's point of view).
>
> Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and
> PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
>
> For convenience and to avoid confusion from userland's perspective we alias
> these:
>
> * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what
> the user will want to use, as they would find it surprising if for
> instance fd's were unshared()'d and they wanted to invoke pidfd_getfd()
> and that failed.
>
> * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users
> have no concept of thread groups or what a thread group leader is, and
> from userland's perspective and nomenclature this is what userland
> considers to be a process.
>
> Due to the refactoring of the central __pidfd_get_pid() function we can
> implement this functionality centrally, providing the use of this sentinel
> in most functionality which utilises pidfd's.
>
> We need to explicitly adjust kernel_waitid_prepare() to permit this (though
> it wouldn't really make sense to use this there, we provide the ability for
> consistency).
>
> We explicitly disallow use of this in setns(), which would otherwise have
> required explicit custom handling, as it doesn't make sense to set the
> current calling thread to join the namespace of itself.
>
> As the callers of pidfd_get_pid() expect an increased reference count on
> the pid we do so in the self case, reducing churn and avoiding any breakage
> from existing logic which decrements this reference count.
>
> This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFS,
> ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and
> pidfd_getfd() system calls.
>
> Things such as polling a pidfs and general fd operations are not supported,
> this strictly provides the sentinel for APIs which explicitly accept a
> pidfd.
>
> Suggested-by: Pedro Falcato <pedro.falcato@...il.com>
> Reviewed-by: Shakeel Butt <shakeel.butt@...ux.dev>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> ---
Currently, a pidfd based system call like pidfd_send_signal() would
simply do:
fdget(pidfd);
// use struct pid
fdput(pidfd);
Where the lifetime of @pid is guaranteed by @file. And in the regular
case where there's only a single thread the file code will avoid taking
a reference. Thus, there's no reference count bump on fdget(), nor a
drop on fdput(), nor a get_pid() or put_pid().
With your patch series you will always cause reference counts on @pid to
be taken for everyone. And I wouldn't be surprised if we get performance
regressions for this.
In one of my earlier mails I had mused about a fdput() like primitive.
What I roughly, hastily, and under the influence of the flu, sketched in
the _completey untested_ patch I appended illustrates roughly what I had
been thinking about.
> include/linux/pid.h | 8 ++++--
> include/uapi/linux/pidfd.h | 10 ++++++++
> kernel/exit.c | 4 ++-
> kernel/nsproxy.c | 1 +
> kernel/pid.c | 51 ++++++++++++++++++++++++--------------
> 5 files changed, 53 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index d466890e1b35..3b2ac7567a88 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -78,11 +78,15 @@ struct file;
> * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
> *
> * @pidfd: The pidfd whose pid we want, or the fd of a /proc/<pid> file if
> - * @alloc_proc is also set.
> + * @alloc_proc is also set, or PIDFD_SELF_* to refer to the current
> + * thread or thread group leader.
> * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead
> * of a pidfd, and this will be used to determine the pid.
> +
> * @flags: Output variable, if non-NULL, then the file->f_flags of the
> - * pidfd will be set here.
> + * pidfd will be set here or If PIDFD_SELF_THREAD is set, this is
> + * set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP then
> + * this is set to zero.
> *
> * Returns: If successful, the pid associated with the pidfd, otherwise an
> * error.
> diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
> index 565fc0629fff..6fe1d63b2086 100644
> --- a/include/uapi/linux/pidfd.h
> +++ b/include/uapi/linux/pidfd.h
> @@ -29,4 +29,14 @@
> #define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9)
> #define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10)
>
> +/*
> + * Special sentinel values which can be used to refer to the current thread or
> + * thread group leader (which from a userland perspective is the process).
> + */
> +#define PIDFD_SELF PIDFD_SELF_THREAD
> +#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP
> +
> +#define PIDFD_SELF_THREAD -10000 /* Current thread. */
> +#define PIDFD_SELF_THREAD_GROUP -20000 /* Current thread group leader. */
> +
> #endif /* _UAPI_LINUX_PIDFD_H */
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 619f0014c33b..e4f85ec4ba78 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -71,6 +71,7 @@
> #include <linux/user_events.h>
> #include <linux/uaccess.h>
>
> +#include <uapi/linux/pidfd.h>
> #include <uapi/linux/wait.h>
>
> #include <asm/unistd.h>
> @@ -1739,7 +1740,8 @@ int kernel_waitid_prepare(struct wait_opts *wo, int which, pid_t upid,
> break;
> case P_PIDFD:
> type = PIDTYPE_PID;
> - if (upid < 0)
> + if (upid < 0 && upid != PIDFD_SELF_THREAD &&
> + upid != PIDFD_SELF_THREAD_GROUP)
> return -EINVAL;
>
> pid = pidfd_get_pid(upid, &f_flags);
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index dc952c3b05af..d239f7eeaa1f 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -550,6 +550,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
> struct nsset nsset = {};
> int err = 0;
>
> + /* If fd is PIDFD_SELF_*, implicitly fail here, as invalid. */
> if (!fd_file(f))
> return -EBADF;
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 94c97559e5c5..0a1861b4422c 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -535,33 +535,48 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
> }
> EXPORT_SYMBOL_GPL(find_ge_pid);
>
> +static struct pid *pidfd_get_pid_self(unsigned int pidfd, unsigned int *flags)
The @flags argument is unused afaict.
> +{
> + bool is_thread = pidfd == PIDFD_SELF_THREAD;
> + enum pid_type type = is_thread ? PIDTYPE_PID : PIDTYPE_TGID;
> + struct pid *pid = *task_pid_ptr(current, type);
> +
> + /* The caller expects an elevated reference count. */
> + get_pid(pid);
> + return pid;
> +}
Fwiw, what you've done here is essentially reimplement the already
existing get_task_pid() helper that you could simply use.
> +
> struct pid *__pidfd_get_pid(unsigned int pidfd, bool allow_proc,
> unsigned int *flags)
> {
> - struct pid *pid;
> - struct fd f = fdget(pidfd);
> - struct file *file = fd_file(f);
> + if (pidfd == PIDFD_SELF_THREAD || pidfd == PIDFD_SELF_THREAD_GROUP) {
> + return pidfd_get_pid_self(pidfd, flags);
> + } else {
I think the else can just go and we can save an indentation level.
> + struct pid *pid;
> + struct fd f = fdget(pidfd);
> + struct file *file = fd_file(f);
>
> - if (!file)
> - return ERR_PTR(-EBADF);
> + if (!file)
> + return ERR_PTR(-EBADF);
>
> - pid = pidfd_pid(file);
> - /* If we allow opening a pidfd via /proc/<pid>, do so. */
> - if (IS_ERR(pid) && allow_proc)
> - pid = tgid_pidfd_to_pid(file);
> + pid = pidfd_pid(file);
> + /* If we allow opening a pidfd via /proc/<pid>, do so. */
> + if (IS_ERR(pid) && allow_proc)
> + pid = tgid_pidfd_to_pid(file);
>
> - if (IS_ERR(pid)) {
> + if (IS_ERR(pid)) {
> + fdput(f);
> + return pid;
> + }
> +
> + /* Pin pid before we release fd. */
> + get_pid(pid);
> + if (flags)
> + *flags = file->f_flags;
> fdput(f);
> +
> return pid;
> }
> -
> - /* Pin pid before we release fd. */
> - get_pid(pid);
> - if (flags)
> - *flags = file->f_flags;
> - fdput(f);
> -
> - return pid;
> }
>
> /**
> --
> 2.47.0
View attachment "0001-UNTESTED-SKETCH-DRAFT.patch" of type "text/x-diff" (5459 bytes)
Powered by blists - more mailing lists