linux-kernel - Re: [PATCH] pid: allow pidfds for reaped tasks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJqdLroQE1sU2uOG018Ezyome1sQb89c_mdSdvOCpM7=XNt+xg@mail.gmail.com>
Date:   Mon, 7 Aug 2023 11:01:20 +0200
From:   Alexander Mikhalitsyn <alexander@...alicyn.com>
To:     David Rheinsberg <david@...dahead.eu>
Cc:     linux-kernel@...r.kernel.org,
        Christian Brauner <brauner@...nel.org>,
        Jan Kara <jack@...e.cz>, Kees Cook <keescook@...omium.org>,
        Luca Boccassi <bluca@...ian.org>
Subject: Re: [PATCH] pid: allow pidfds for reaped tasks

On Mon, Aug 7, 2023 at 10:52 AM David Rheinsberg <david@...dahead.eu> wrote:
>
> A pidfd can currently only be created for tasks that are thread-group
> leaders and not reaped. This patch changes the pidfd-core to allow for
> pidfds on reapead thread-group leaders as well.
>
> A pidfd can outlive the task it refers to, and thus user-space must
> already be prepared that the task underlying a pidfd is gone at the time
> they get their hands on the pidfd. For instance, resolving the pidfd to
> a PID via the fdinfo must be prepared to read `-1`.
>
> Despite user-space knowing that a pidfd might be stale, several kernel
> APIs currently add another layer that checks for this. In particular,
> SO_PEERPIDFD returns `EINVAL` if the peer-task was already reaped,
> but returns a stale pidfd if the task is reaped immediately after the
> respective alive-check.
>
> This has the unfortunate effect that user-space now has two ways to
> check for the exact same scenario: A syscall might return
> EINVAL/ESRCH/... *or* the pidfd might be stale, even though there is no
> particular reason to distinguish both cases. This also propagates
> through user-space APIs, which pass on pidfds. They must be prepared to
> pass on `-1` *or* the pidfd, because there is no guaranteed way to get a
> stale pidfd from the kernel.
>
> This patch changes the core pidfd helpers to allow creation of pidfds
> even if the PID is no longer linked to any task. This only affects one
> of the three pidfd users that currently exist:
>
>  1) fanotify already tests for a linked TGID-task manually before
>     creating the PIDFD, thus it is not directly affected by this change.
>     However, note that the current fanotify code fails with an error if
>     the target process is reaped exactly between the TGID-check in
>     fanotify and the test in pidfd_prepare(). With this patch, this
>     will no longer be the case.
>
>  2) pidfd_open(2) calls find_get_pid() before creating the pidfd, thus
>     it is also not directly affected by this change.
>     Again, similar to fanotify, there is a race between the
>     find_get_pid() call and pidfd_prepare(), which currently causes
>     pidfd_open(2) to return EINVAL rather than ESRCH if the process is
>     reaped just between those two checks. With this patch, this will no
>     longer be the case.
>
>  3) SO_PEERPIDFD will be affected by this change and from now on return
>     stale pidfds rather than EINVAL if the respective peer task is
>     reaped already.
>
> Given that users of SO_PEERPIDFD must already deal with stale pidfds,
> this change hopefully simplifies the API of SO_PEERPIDFD, and all
> dependent user-space APIs (e.g., GetConnectionCredentials() on D-Bus
> driver APIs). Also note that SO_PEERPIDFD is still pending to be
> released with linux-6.5.
>
> Signed-off-by: David Rheinsberg <david@...dahead.eu>
> ---
>  kernel/fork.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d2e12b6d2b18..4dde19a8c264 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2161,7 +2161,7 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
>   * Allocate a new file that stashes @pid and reserve a new pidfd number in the
>   * caller's file descriptor table. The pidfd is reserved but not installed yet.
>   *
> - * The helper verifies that @pid is used as a thread group leader.
> + * The helper verifies that @pid is/was used as a thread group leader.
>   *
>   * If this function returns successfully the caller is responsible to either
>   * call fd_install() passing the returned pidfd and pidfd file as arguments in
> @@ -2180,7 +2180,14 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
>   */
>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
>  {
> -       if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> +       if (!pid)
> +               return -EINVAL;
> +
> +       /*
> +        * Non thread-group leaders cannot have pidfds, but we allow them for
> +        * reaped thread-group leaders.
> +        */
> +       if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
>                 return -EINVAL;

Hi David!

As far as I understand, __unhash_process is always called with a
tasklist_lock held for writing.
Don't we need to take tasklist_lock for reading here to guarantee
consistency between
pid_has_task(pid, PIDTYPE_PID) and pid_has_task(pid, PIDTYPE_TGID)
return values?

Kind regards,
Alex

>
>         return __pidfd_prepare(pid, flags, ret);
> --
> 2.41.0
>