linux-kernel - Re: [patch] futex: Cure exit race

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <06e7e269-6c87-f5f2-9a15-435b0a376105@linux.ibm.com>
Date:   Tue, 11 Dec 2018 09:04:50 +0100
From:   Stefan Liebler <stli@...ux.ibm.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>
Cc:     Heiko Carstens <heiko.carstens@...ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Darren Hart <dvhart@...radead.org>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [patch] futex: Cure exit race

Hi Thomas,

does this also handle the ESRCH returned by
attach_to_pi_owner(...)
{...
	if (!pid)
		return -ESRCH;
	p = find_get_task_by_vpid(pid);
	if (!p)
		return -ESRCH;
...

I think pid should never be zero when attach_to_pi_owner is called.
But it can happen that p is null? At least I traced the "return -ESRCH" 
with the 4.17 kernel. Unfortunately both returns were done by the same 
instruction address.

Bye
Stefan

On 12/10/2018 04:23 PM, Thomas Gleixner wrote:
> Stefan reported, that the glibc tst-robustpi4 test case fails
> occasionally. That case creates the following race between
> sys_exit() and sys_futex(LOCK_PI):
> 
>   CPU0				CPU1
> 
>   sys_exit()			sys_futex()
>    do_exit()			 futex_lock_pi()
>     exit_signals(tsk)		  No waiters:
>      tsk->flags |= PF_EXITING;	  *uaddr == 0x00000PID
>    mm_release(tsk)		  Set waiter bit
>     exit_robust_list(tsk) {	  *uaddr = 0x80000PID;
>        Set owner died		  attach_to_pi_owner() {
>      *uaddr = 0xC0000000;	   tsk = get_task(PID);
>     }				   if (!tsk->flags & PF_EXITING) {
>    ...				     attach();
>    tsk->flags |= PF_EXITPIDONE;	   } else {
> 				     if (!(tsk->flags & PF_EXITPIDONE))
> 				       return -EAGAIN;
> 				     return -ESRCH; <--- FAIL
> 				   }
> 
> ESRCH is returned all the way to user space, which triggers the glibc test
> case assert. Returning ESRCH unconditionally is wrong here because the user
> space value has been changed by the exiting task to 0xC0000000, i.e. the
> FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
> is a valid state and the kernel has to handle it, i.e. taking the futex.
> 
> Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
> is set in the task which owns the futex. If the value has changed, let
> the kernel retry the operation, which includes all regular sanity checks
> and correctly handles the FUTEX_OWNER_DIED case.
> 
> If it hasn't changed, then return ESRCH as there is no way to distinguish
> this case from malfunctioning user space. This happens when the exiting
> task did not have a robust list, the robust list was corrupted or the user
> space value in the futex was simply bogus.
> 
> Reported-by: Stefan Liebler <stli@...ux.ibm.com>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> Cc: Heiko Carstens <heiko.carstens@...ibm.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Darren Hart <dvhart@...radead.org>
> Cc: Ingo Molnar <mingo@...nel.org>
> Cc: stable@...r.kernel.org
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
> ---
>   kernel/futex.c |   57 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 53 insertions(+), 4 deletions(-)
> 
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1148,11 +1148,60 @@ static int attach_to_pi_state(u32 __user
>   	return ret;
>   }
>   
> +static int handle_exit_race(u32 __user *uaddr, u32 uval, struct task_struct *tsk)
> +{
> +	u32 uval2;
> +
> +	/*
> +	 * If PF_EXITPIDONE is not yet set try again.
> +	 */
> +	if (!(tsk->flags & PF_EXITPIDONE))
> +		return -EAGAIN;
> +
> +	/*
> +	 * Reread the user space value to handle the following situation:
> +	 *
> +	 * CPU0				CPU1
> +	 *
> +	 * sys_exit()			sys_futex()
> +	 *  do_exit()			 futex_lock_pi()
> +	 *   exit_signals(tsk)		  No waiters:
> +	 *    tsk->flags |= PF_EXITING;	  *uaddr == 0x00000PID
> +	 *  mm_release(tsk)		  Set waiter bit
> +	 *   exit_robust_list(tsk) {	  *uaddr = 0x80000PID;
> +	 *      Set owner died		  attach_to_pi_owner() {
> +	 *    *uaddr = 0xC0000000;	   tsk = get_task(PID);
> +	 *   }				   if (!tsk->flags & PF_EXITING) {
> +	 *  ...				     attach();
> +	 *  tsk->flags |= PF_EXITPIDONE;   } else {
> +	 *				     if (!(tsk->flags & PF_EXITPIDONE))
> +	 *				       return -EAGAIN;
> +	 *				     return -ESRCH; <--- FAIL
> +	 *				   }
> +	 *
> +	 * Returning ESRCH unconditionally is wrong here because the
> +	 * user space value has been changed by the exiting task.
> +	 */
> +	if (get_futex_value_locked(&uval2, uaddr))
> +		return -EFAULT;
> +
> +	/* If the user space value has changed, try again. */
> +	if (uval2 != uval)
> +		return -EAGAIN;
> +
> +	/*
> +	 * The exiting task did not have a robust list, the robust list was
> +	 * corrupted or the user space value in *uaddr is simply bogus.
> +	 * Give up and tell user space.
> +	 */
> +	return -ESRCH;
> +}
> +
>   /*
>    * Lookup the task for the TID provided from user space and attach to
>    * it after doing proper sanity checks.
>    */
> -static int attach_to_pi_owner(u32 uval, union futex_key *key,
> +static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
>   			      struct futex_pi_state **ps)
>   {
>   	pid_t pid = uval & FUTEX_TID_MASK;
> @@ -1187,7 +1236,7 @@ static int attach_to_pi_owner(u32 uval,
>   		 * set, we know that the task has finished the
>   		 * cleanup:
>   		 */
> -		int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
> +		int ret = handle_exit_race(uaddr, uval, p);
>   
>   		raw_spin_unlock_irq(&p->pi_lock);
>   		put_task_struct(p);
> @@ -1244,7 +1293,7 @@ static int lookup_pi_state(u32 __user *u
>   	 * We are the first waiter - try to look up the owner based on
>   	 * @uval and attach to it.
>   	 */
> -	return attach_to_pi_owner(uval, key, ps);
> +	return attach_to_pi_owner(uaddr, uval, key, ps);
>   }
>   
>   static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
> @@ -1352,7 +1401,7 @@ static int futex_lock_pi_atomic(u32 __us
>   	 * attach to the owner. If that fails, no harm done, we only
>   	 * set the FUTEX_WAITERS bit in the user space variable.
>   	 */
> -	return attach_to_pi_owner(uval, key, ps);
> +	return attach_to_pi_owner(uaddr, uval, key, ps);
>   }
>   
>   /**
> 
>