[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c96df57a-fa1b-4301-9556-94a6b8c93a31@lucifer.local>
Date: Mon, 28 Oct 2024 16:06:07 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Christian Brauner <brauner@...nel.org>
Cc: Oleg Nesterov <oleg@...hat.com>, Christian Brauner <christian@...uner.io>,
Shuah Khan <shuah@...nel.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>, pedro.falcato@...il.com,
linux-kselftest@...r.kernel.org, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
linux-kernel@...r.kernel.org, Oliver Sang <oliver.sang@...el.com>,
John Hubbard <jhubbard@...dia.com>
Subject: Re: [PATCH v6 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own
thread/process
On Mon, Oct 28, 2024 at 04:34:33PM +0100, Christian Brauner wrote:
> On Sat, Oct 26, 2024 at 08:24:58AM +0100, Lorenzo Stoakes wrote:
> > It is useful to be able to utilise the pidfd mechanism to reference the
> > current thread or process (from a userland point of view - thread group
> > leader from the kernel's point of view).
> >
> > Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and
> > PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
> >
> > For convenience and to avoid confusion from userland's perspective we alias
> > these:
> >
> > * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what
> > the user will want to use, as they would find it surprising if for
> > instance fd's were unshared()'d and they wanted to invoke pidfd_getfd()
> > and that failed.
> >
> > * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users
> > have no concept of thread groups or what a thread group leader is, and
> > from userland's perspective and nomenclature this is what userland
> > considers to be a process.
> >
> > Due to the refactoring of the central __pidfd_get_pid() function we can
> > implement this functionality centrally, providing the use of this sentinel
> > in most functionality which utilises pidfd's.
> >
> > We need to explicitly adjust kernel_waitid_prepare() to permit this (though
> > it wouldn't really make sense to use this there, we provide the ability for
> > consistency).
> >
> > We explicitly disallow use of this in setns(), which would otherwise have
> > required explicit custom handling, as it doesn't make sense to set the
> > current calling thread to join the namespace of itself.
> >
> > As the callers of pidfd_get_pid() expect an increased reference count on
> > the pid we do so in the self case, reducing churn and avoiding any breakage
> > from existing logic which decrements this reference count.
> >
> > This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFS,
> > ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and
> > pidfd_getfd() system calls.
> >
> > Things such as polling a pidfs and general fd operations are not supported,
> > this strictly provides the sentinel for APIs which explicitly accept a
> > pidfd.
> >
> > Suggested-by: Pedro Falcato <pedro.falcato@...il.com>
> > Reviewed-by: Shakeel Butt <shakeel.butt@...ux.dev>
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > ---
>
> Currently, a pidfd based system call like pidfd_send_signal() would
> simply do:
>
> fdget(pidfd);
> // use struct pid
> fdput(pidfd);
>
> Where the lifetime of @pid is guaranteed by @file. And in the regular
> case where there's only a single thread the file code will avoid taking
> a reference. Thus, there's no reference count bump on fdget(), nor a
> drop on fdput(), nor a get_pid() or put_pid().
Right I missed that fdget() wouldn't take a reference count I assumed it
would be equivalent, my mistake.
>
> With your patch series you will always cause reference counts on @pid to
> be taken for everyone. And I wouldn't be surprised if we get performance
> regressions for this.
This was in response to you review saying I can't pass around a pointer to
the fd, originally I didn't do this.
This was the only way I could find to de-jank and make my shared function
not end up problematic in the light of wanting to keep the fd within a
single scope, I didn't realise that passing that by value would be ok.
But obviously hadn't realised that fdget()/fdput() sometimes doesn't change
a reference count, mea culpa on that not an fs person...
>
> In one of my earlier mails I had mused about a fdput() like primitive.
> What I roughly, hastily, and under the influence of the flu, sketched in
> the _completey untested_ patch I appended illustrates roughly what I had
> been thinking about.
OK, I was really uncertain as to what you meant regarding the scope of this
value so had assumed we couldn't do something like assigning the value like
that.
I guess I'll try to adapt that and respin a v7 when I get a chance.
>
> > include/linux/pid.h | 8 ++++--
> > include/uapi/linux/pidfd.h | 10 ++++++++
> > kernel/exit.c | 4 ++-
> > kernel/nsproxy.c | 1 +
> > kernel/pid.c | 51 ++++++++++++++++++++++++--------------
> > 5 files changed, 53 insertions(+), 21 deletions(-)
> >
> > diff --git a/include/linux/pid.h b/include/linux/pid.h
> > index d466890e1b35..3b2ac7567a88 100644
> > --- a/include/linux/pid.h
> > +++ b/include/linux/pid.h
> > @@ -78,11 +78,15 @@ struct file;
> > * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
> > *
> > * @pidfd: The pidfd whose pid we want, or the fd of a /proc/<pid> file if
> > - * @alloc_proc is also set.
> > + * @alloc_proc is also set, or PIDFD_SELF_* to refer to the current
> > + * thread or thread group leader.
> > * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead
> > * of a pidfd, and this will be used to determine the pid.
> > +
> > * @flags: Output variable, if non-NULL, then the file->f_flags of the
> > - * pidfd will be set here.
> > + * pidfd will be set here or If PIDFD_SELF_THREAD is set, this is
> > + * set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP then
> > + * this is set to zero.
> > *
> > * Returns: If successful, the pid associated with the pidfd, otherwise an
> > * error.
> > diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
> > index 565fc0629fff..6fe1d63b2086 100644
> > --- a/include/uapi/linux/pidfd.h
> > +++ b/include/uapi/linux/pidfd.h
> > @@ -29,4 +29,14 @@
> > #define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9)
> > #define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10)
> >
> > +/*
> > + * Special sentinel values which can be used to refer to the current thread or
> > + * thread group leader (which from a userland perspective is the process).
> > + */
> > +#define PIDFD_SELF PIDFD_SELF_THREAD
> > +#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP
> > +
> > +#define PIDFD_SELF_THREAD -10000 /* Current thread. */
> > +#define PIDFD_SELF_THREAD_GROUP -20000 /* Current thread group leader. */
> > +
> > #endif /* _UAPI_LINUX_PIDFD_H */
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index 619f0014c33b..e4f85ec4ba78 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -71,6 +71,7 @@
> > #include <linux/user_events.h>
> > #include <linux/uaccess.h>
> >
> > +#include <uapi/linux/pidfd.h>
> > #include <uapi/linux/wait.h>
> >
> > #include <asm/unistd.h>
> > @@ -1739,7 +1740,8 @@ int kernel_waitid_prepare(struct wait_opts *wo, int which, pid_t upid,
> > break;
> > case P_PIDFD:
> > type = PIDTYPE_PID;
> > - if (upid < 0)
> > + if (upid < 0 && upid != PIDFD_SELF_THREAD &&
> > + upid != PIDFD_SELF_THREAD_GROUP)
> > return -EINVAL;
> >
> > pid = pidfd_get_pid(upid, &f_flags);
> > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> > index dc952c3b05af..d239f7eeaa1f 100644
> > --- a/kernel/nsproxy.c
> > +++ b/kernel/nsproxy.c
> > @@ -550,6 +550,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
> > struct nsset nsset = {};
> > int err = 0;
> >
> > + /* If fd is PIDFD_SELF_*, implicitly fail here, as invalid. */
> > if (!fd_file(f))
> > return -EBADF;
> >
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index 94c97559e5c5..0a1861b4422c 100644
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -535,33 +535,48 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
> > }
> > EXPORT_SYMBOL_GPL(find_ge_pid);
> >
> > +static struct pid *pidfd_get_pid_self(unsigned int pidfd, unsigned int *flags)
>
> The @flags argument is unused afaict.
Oops will rework on v7.
>
> > +{
> > + bool is_thread = pidfd == PIDFD_SELF_THREAD;
> > + enum pid_type type = is_thread ? PIDTYPE_PID : PIDTYPE_TGID;
> > + struct pid *pid = *task_pid_ptr(current, type);
> > +
> > + /* The caller expects an elevated reference count. */
> > + get_pid(pid);
> > + return pid;
> > +}
>
> Fwiw, what you've done here is essentially reimplement the already
> existing get_task_pid() helper that you could simply use.
We're looking up PIDFD_SELF_* values here. So presumably you mean the:
struct pid *pid = *task_pid_ptr(current, type);
/* The caller expects an elevated reference count. */
get_pid(pid);
Bit is duplicated vs. get_task_pid()?
I did that because it wasn't clear doing that under the RCU lock was
necessary or useful?
It seems useful still to have the PIDFD_SELF stuff qseparate, I can replace
those two lines with a call to get_task_pid() if you prefer? Unless you
meant something else?
>
> > +
> > struct pid *__pidfd_get_pid(unsigned int pidfd, bool allow_proc,
> > unsigned int *flags)
> > {
> > - struct pid *pid;
> > - struct fd f = fdget(pidfd);
> > - struct file *file = fd_file(f);
> > + if (pidfd == PIDFD_SELF_THREAD || pidfd == PIDFD_SELF_THREAD_GROUP) {
> > + return pidfd_get_pid_self(pidfd, flags);
> > + } else {
>
> I think the else can just go and we can save an indentation level.
This has been raised a couple times before by other reviewers, this is just
so we can declare variables, especially the fd variable, which you were
very clear _must_ retain scope only where it used.
Otherwise I have to do something like;
struct fd f = {};
if (...) { return ...; }
f = fdget(...);
This way we don't need to do that.
I mean probably the compiler would do the right thing but it just seems
ugly to assign/reassign a stack value like that.
Ah, I see struct fd is just a wrapper around an unsigned long, so probably
not a big deal to just leave it unassigned then.
This was the only reason I did this, I usually much prefer the guard
pattern.
OK if you're fine with this value being assigned like that then no problem
will change!
>
> > + struct pid *pid;
> > + struct fd f = fdget(pidfd);
> > + struct file *file = fd_file(f);
> >
> > - if (!file)
> > - return ERR_PTR(-EBADF);
> > + if (!file)
> > + return ERR_PTR(-EBADF);
> >
> > - pid = pidfd_pid(file);
> > - /* If we allow opening a pidfd via /proc/<pid>, do so. */
> > - if (IS_ERR(pid) && allow_proc)
> > - pid = tgid_pidfd_to_pid(file);
> > + pid = pidfd_pid(file);
> > + /* If we allow opening a pidfd via /proc/<pid>, do so. */
> > + if (IS_ERR(pid) && allow_proc)
> > + pid = tgid_pidfd_to_pid(file);
> >
> > - if (IS_ERR(pid)) {
> > + if (IS_ERR(pid)) {
> > + fdput(f);
> > + return pid;
> > + }
> > +
> > + /* Pin pid before we release fd. */
> > + get_pid(pid);
> > + if (flags)
> > + *flags = file->f_flags;
> > fdput(f);
> > +
> > return pid;
> > }
> > -
> > - /* Pin pid before we release fd. */
> > - get_pid(pid);
> > - if (flags)
> > - *flags = file->f_flags;
> > - fdput(f);
> > -
> > - return pid;
> > }
> >
> > /**
> > --
> > 2.47.0
Powered by blists - more mailing lists