[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <06e7e269-6c87-f5f2-9a15-435b0a376105@linux.ibm.com>
Date: Tue, 11 Dec 2018 09:04:50 +0100
From: Stefan Liebler <stli@...ux.ibm.com>
To: Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>
Cc: Heiko Carstens <heiko.carstens@...ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Darren Hart <dvhart@...radead.org>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: [patch] futex: Cure exit race
Hi Thomas,
does this also handle the ESRCH returned by
attach_to_pi_owner(...)
{...
if (!pid)
return -ESRCH;
p = find_get_task_by_vpid(pid);
if (!p)
return -ESRCH;
...
I think pid should never be zero when attach_to_pi_owner is called.
But it can happen that p is null? At least I traced the "return -ESRCH"
with the 4.17 kernel. Unfortunately both returns were done by the same
instruction address.
Bye
Stefan
On 12/10/2018 04:23 PM, Thomas Gleixner wrote:
> Stefan reported, that the glibc tst-robustpi4 test case fails
> occasionally. That case creates the following race between
> sys_exit() and sys_futex(LOCK_PI):
>
> CPU0 CPU1
>
> sys_exit() sys_futex()
> do_exit() futex_lock_pi()
> exit_signals(tsk) No waiters:
> tsk->flags |= PF_EXITING; *uaddr == 0x00000PID
> mm_release(tsk) Set waiter bit
> exit_robust_list(tsk) { *uaddr = 0x80000PID;
> Set owner died attach_to_pi_owner() {
> *uaddr = 0xC0000000; tsk = get_task(PID);
> } if (!tsk->flags & PF_EXITING) {
> ... attach();
> tsk->flags |= PF_EXITPIDONE; } else {
> if (!(tsk->flags & PF_EXITPIDONE))
> return -EAGAIN;
> return -ESRCH; <--- FAIL
> }
>
> ESRCH is returned all the way to user space, which triggers the glibc test
> case assert. Returning ESRCH unconditionally is wrong here because the user
> space value has been changed by the exiting task to 0xC0000000, i.e. the
> FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
> is a valid state and the kernel has to handle it, i.e. taking the futex.
>
> Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
> is set in the task which owns the futex. If the value has changed, let
> the kernel retry the operation, which includes all regular sanity checks
> and correctly handles the FUTEX_OWNER_DIED case.
>
> If it hasn't changed, then return ESRCH as there is no way to distinguish
> this case from malfunctioning user space. This happens when the exiting
> task did not have a robust list, the robust list was corrupted or the user
> space value in the futex was simply bogus.
>
> Reported-by: Stefan Liebler <stli@...ux.ibm.com>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> Cc: Heiko Carstens <heiko.carstens@...ibm.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Darren Hart <dvhart@...radead.org>
> Cc: Ingo Molnar <mingo@...nel.org>
> Cc: stable@...r.kernel.org
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
> ---
> kernel/futex.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 53 insertions(+), 4 deletions(-)
>
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1148,11 +1148,60 @@ static int attach_to_pi_state(u32 __user
> return ret;
> }
>
> +static int handle_exit_race(u32 __user *uaddr, u32 uval, struct task_struct *tsk)
> +{
> + u32 uval2;
> +
> + /*
> + * If PF_EXITPIDONE is not yet set try again.
> + */
> + if (!(tsk->flags & PF_EXITPIDONE))
> + return -EAGAIN;
> +
> + /*
> + * Reread the user space value to handle the following situation:
> + *
> + * CPU0 CPU1
> + *
> + * sys_exit() sys_futex()
> + * do_exit() futex_lock_pi()
> + * exit_signals(tsk) No waiters:
> + * tsk->flags |= PF_EXITING; *uaddr == 0x00000PID
> + * mm_release(tsk) Set waiter bit
> + * exit_robust_list(tsk) { *uaddr = 0x80000PID;
> + * Set owner died attach_to_pi_owner() {
> + * *uaddr = 0xC0000000; tsk = get_task(PID);
> + * } if (!tsk->flags & PF_EXITING) {
> + * ... attach();
> + * tsk->flags |= PF_EXITPIDONE; } else {
> + * if (!(tsk->flags & PF_EXITPIDONE))
> + * return -EAGAIN;
> + * return -ESRCH; <--- FAIL
> + * }
> + *
> + * Returning ESRCH unconditionally is wrong here because the
> + * user space value has been changed by the exiting task.
> + */
> + if (get_futex_value_locked(&uval2, uaddr))
> + return -EFAULT;
> +
> + /* If the user space value has changed, try again. */
> + if (uval2 != uval)
> + return -EAGAIN;
> +
> + /*
> + * The exiting task did not have a robust list, the robust list was
> + * corrupted or the user space value in *uaddr is simply bogus.
> + * Give up and tell user space.
> + */
> + return -ESRCH;
> +}
> +
> /*
> * Lookup the task for the TID provided from user space and attach to
> * it after doing proper sanity checks.
> */
> -static int attach_to_pi_owner(u32 uval, union futex_key *key,
> +static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
> struct futex_pi_state **ps)
> {
> pid_t pid = uval & FUTEX_TID_MASK;
> @@ -1187,7 +1236,7 @@ static int attach_to_pi_owner(u32 uval,
> * set, we know that the task has finished the
> * cleanup:
> */
> - int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
> + int ret = handle_exit_race(uaddr, uval, p);
>
> raw_spin_unlock_irq(&p->pi_lock);
> put_task_struct(p);
> @@ -1244,7 +1293,7 @@ static int lookup_pi_state(u32 __user *u
> * We are the first waiter - try to look up the owner based on
> * @uval and attach to it.
> */
> - return attach_to_pi_owner(uval, key, ps);
> + return attach_to_pi_owner(uaddr, uval, key, ps);
> }
>
> static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
> @@ -1352,7 +1401,7 @@ static int futex_lock_pi_atomic(u32 __us
> * attach to the owner. If that fails, no harm done, we only
> * set the FUTEX_WAITERS bit in the user space variable.
> */
> - return attach_to_pi_owner(uval, key, ps);
> + return attach_to_pi_owner(uaddr, uval, key, ps);
> }
>
> /**
>
>
Powered by blists - more mailing lists