linux-kernel - Re: [PATCH] pid: allow pidfds for reaped tasks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230807-porzellan-rehkitz-9fde1b94dd6b@brauner>
Date:   Mon, 7 Aug 2023 12:07:17 +0200
From:   Christian Brauner <brauner@...nel.org>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     David Rheinsberg <david@...dahead.eu>,
        linux-kernel@...r.kernel.org, Jan Kara <jack@...e.cz>,
        Kees Cook <keescook@...omium.org>,
        Alexander Mikhalitsyn <alexander@...alicyn.com>,
        Luca Boccassi <bluca@...ian.org>
Subject: Re: [PATCH] pid: allow pidfds for reaped tasks

Hey Oleg,

A question for you below.

On Mon, Aug 07, 2023 at 10:52:03AM +0200, David Rheinsberg wrote:
> A pidfd can currently only be created for tasks that are thread-group
> leaders and not reaped. This patch changes the pidfd-core to allow for
> pidfds on reapead thread-group leaders as well.
> 
> A pidfd can outlive the task it refers to, and thus user-space must
> already be prepared that the task underlying a pidfd is gone at the time
> they get their hands on the pidfd. For instance, resolving the pidfd to
> a PID via the fdinfo must be prepared to read `-1`.
> 
> Despite user-space knowing that a pidfd might be stale, several kernel
> APIs currently add another layer that checks for this. In particular,
> SO_PEERPIDFD returns `EINVAL` if the peer-task was already reaped,
> but returns a stale pidfd if the task is reaped immediately after the
> respective alive-check.
> 
> This has the unfortunate effect that user-space now has two ways to
> check for the exact same scenario: A syscall might return
> EINVAL/ESRCH/... *or* the pidfd might be stale, even though there is no
> particular reason to distinguish both cases. This also propagates
> through user-space APIs, which pass on pidfds. They must be prepared to
> pass on `-1` *or* the pidfd, because there is no guaranteed way to get a
> stale pidfd from the kernel.
> 
> This patch changes the core pidfd helpers to allow creation of pidfds
> even if the PID is no longer linked to any task. This only affects one
> of the three pidfd users that currently exist:
> 
>  1) fanotify already tests for a linked TGID-task manually before
>     creating the PIDFD, thus it is not directly affected by this change.
>     However, note that the current fanotify code fails with an error if
>     the target process is reaped exactly between the TGID-check in
>     fanotify and the test in pidfd_prepare(). With this patch, this
>     will no longer be the case.
> 
>  2) pidfd_open(2) calls find_get_pid() before creating the pidfd, thus
>     it is also not directly affected by this change.
>     Again, similar to fanotify, there is a race between the
>     find_get_pid() call and pidfd_prepare(), which currently causes
>     pidfd_open(2) to return EINVAL rather than ESRCH if the process is
>     reaped just between those two checks. With this patch, this will no
>     longer be the case.
> 
>  3) SO_PEERPIDFD will be affected by this change and from now on return
>     stale pidfds rather than EINVAL if the respective peer task is
>     reaped already.
> 
> Given that users of SO_PEERPIDFD must already deal with stale pidfds,
> this change hopefully simplifies the API of SO_PEERPIDFD, and all
> dependent user-space APIs (e.g., GetConnectionCredentials() on D-Bus
> driver APIs). Also note that SO_PEERPIDFD is still pending to be
> released with linux-6.5.
> 
> Signed-off-by: David Rheinsberg <david@...dahead.eu>
> ---
>  kernel/fork.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d2e12b6d2b18..4dde19a8c264 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2161,7 +2161,7 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
>   * Allocate a new file that stashes @pid and reserve a new pidfd number in the
>   * caller's file descriptor table. The pidfd is reserved but not installed yet.
>   *
> - * The helper verifies that @pid is used as a thread group leader.
> + * The helper verifies that @pid is/was used as a thread group leader.
>   *
>   * If this function returns successfully the caller is responsible to either
>   * call fd_install() passing the returned pidfd and pidfd file as arguments in
> @@ -2180,7 +2180,14 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
>   */
>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
>  {
> -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> +	if (!pid)
> +		return -EINVAL;
> +
> +	/*
> +	 * Non thread-group leaders cannot have pidfds, but we allow them for
> +	 * reaped thread-group leaders.
> +	 */
> +	if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
>  		return -EINVAL;

TL;DR userspace wants to be able to get a pidfd to an already reaped
thread-group leader. I don't see any issues with this.

But I'm not entirely clear how to make it safe so that we can
distinguish between @pid not being used as a thread-group leader and
PIDTYPE_TGID having already been detached from @pid. IOW, we need a
snapshot of PIDTYPE_PID and PIDTYPE_TGID so that we can compare the
returned tasks (Or another way to achieve a similar result.).

Any thoughts?