lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1402638363.22229.23.camel@fury.dvhart.com>
Date:	Thu, 12 Jun 2014 22:46:03 -0700
From:	Darren Hart <darren@...art.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Davidlohr Bueso <davidlohr@...com>,
	Kees Cook <kees@...flux.net>, wad@...omium.org
Subject: Re: [patch 5/5] futex: Simplify futex_lock_pi_atomic() and make it
 more robust

On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> futex_lock_pi_atomic() is a maze of retry hoops and loops.
> 
> Reduce it to simple and understandable states:

Heh... well...

With this patch applied (1-4 will not reproduce without 5), if userspace
wrongly sets the uval to 0, the pi_state can end up being NULL for a
subsequent requeue_pi operation:

[   10.426159] requeue: 00000000006022e0 to 00000000006022e4
[   10.427737]   this:ffff88013a637da8
[   10.428749]   waking:ffff88013a637da8
fut2 = 0
[   10.429994]   comparing requeue_pi_key
[   10.431034]   prepare waiter to take the rt_mutex
[   10.432344]   pi_state:          (null)
[   10.433414] BUG: unable to handle kernel NULL pointer dereference at
0000000000000038

This occurs in the requeue loop, in the requeue_pi block at:

	atomic_inc(&pi_state->refcount);


> 
> First step is to lookup existing waiters (state) in the kernel.
> 
> If there is an existing waiter, validate it and attach to it.
> 
> If there is no existing waiter, check the user space value
> 
> If the TID encoded in the user space value is 0, take over the futex
> preserving the owner died bit.

This is the scenario resulting in the panic. See below.

> 
> If the TID encoded in the user space value is != 0, lookup the owner
> task, validate it and attach to it.
> 
> Reduces text size by 144 bytes on x8664.
> 
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> ---
>  kernel/futex.c |  139 +++++++++++++++++++++------------------------------------
>  1 file changed, 53 insertions(+), 86 deletions(-)
> 
> Index: linux/kernel/futex.c
> ===================================================================
> --- linux.orig/kernel/futex.c
> +++ linux/kernel/futex.c
> @@ -956,6 +956,17 @@ static int lookup_pi_state(u32 uval, str
>         return attach_to_pi_owner(uval, key, ps);
>  }
>  
> +static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
> +{
> +       u32 uninitialized_var(curval);
> +
> +       if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
> +               return -EFAULT;
> +
> +       /* If user space value changed, let the caller retry */
> +       return curval != uval ? -EAGAIN : 0;
> +}
> +
>  /**
>   * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
>   * @uaddr:             the pi futex user address
> @@ -979,113 +990,67 @@ static int futex_lock_pi_atomic(u32 __us
>                                 struct futex_pi_state **ps,
>                                 struct task_struct *task, int set_waiters)
>  {
> -       int lock_taken, ret, force_take = 0;
> -       u32 uval, newval, curval, vpid = task_pid_vnr(task);
> -
> -retry:
> -       ret = lock_taken = 0;
> +       u32 uval, newval, vpid = task_pid_vnr(task);
> +       struct futex_q *match;
> +       int ret;
>  
>         /*
> -        * To avoid races, we attempt to take the lock here again
> -        * (by doing a 0 -> TID atomic cmpxchg), while holding all
> -        * the locks. It will most likely not succeed.
> +        * Read the user space value first so we can validate a few
> +        * things before proceeding further.
>          */
> -       newval = vpid;
> -       if (set_waiters)
> -               newval |= FUTEX_WAITERS;
> -
> -       if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, 0, newval)))
> +       if (get_futex_value_locked(&uval, uaddr))
>                 return -EFAULT;
>  
>         /*
>          * Detect deadlocks.
>          */
> -       if ((unlikely((curval & FUTEX_TID_MASK) == vpid)))
> +       if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))
>                 return -EDEADLK;
>  
>         /*
> -        * Surprise - we got the lock, but we do not trust user space at all.

We really don't want to drop this comment (at leas not the latter
half ;-)

> +        * Lookup existing state first. If it exists, try to attach to
> +        * its pi_state.
>          */
> -       if (unlikely(!curval)) {
> -               /*
> -                * We verify whether there is kernel state for this
> -                * futex. If not, we can safely assume, that the 0 ->
> -                * TID transition is correct. If state exists, we do
> -                * not bother to fixup the user space state as it was
> -                * corrupted already.
> -                */
> -               return futex_top_waiter(hb, key) ? -EINVAL : 1;

On successful acquisition, we used to return 1...

> -       }
> -
> -       uval = curval;
> +       match = futex_top_waiter(hb, key);
> +       if (match)
> +               return attach_to_pi_state(uval, match->pi_state, ps);
>  
>         /*
> -        * Set the FUTEX_WAITERS flag, so the owner will know it has someone
> -        * to wake at the next unlock.
> +        * No waiter and user TID is 0. We are here because the
> +        * waiters or the owner died bit is set or called from
> +        * requeue_cmp_pi or for whatever reason something took the
> +        * syscall.
>          */
> -       newval = curval | FUTEX_WAITERS;
> -
> -       /*
> -        * Should we force take the futex? See below.
> -        */
> -       if (unlikely(force_take)) {
> +       if (!(uval & FUTEX_TID_MASK)) {
>                 /*
> -                * Keep the OWNER_DIED and the WAITERS bit and set the
> -                * new TID value.
> +                * We take over the futex. No other waiters and the user space
> +                * TID is 0. We preserve the owner died bit.
>                  */

Or userspace is screwing with us and set it to 0 for no good reason at
all... so there could still be a waiter queued up from
FUTEX_WAIT_REQUEUE_PI.

> -               newval = (curval & ~FUTEX_TID_MASK) | vpid;
> -               force_take = 0;
> -               lock_taken = 1;
> -       }
> +               newval = uval & FUTEX_OWNER_DIED;
> +               newval |= vpid;
>  
> -       if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
> -               return -EFAULT;
> -       if (unlikely(curval != uval))
> -               goto retry;
> +               /* The futex requeue_pi code can enforce the waiters bit */
> +               if (set_waiters)
> +                       newval |= FUTEX_WAITERS;
> +
> +               return lock_pi_update_atomic(uaddr, uval, newval);

And now we return 0, resulting in futex_proxy_trylock_atomic also
returning 0, but the pi_state is NULL, and as it doesn't return > 0
(vpid), we don't look it up again after atomic acquisition in
futex_requeue().

And BANG.

Consider the following:

>From 3eec2db2654a6cd8dcac3c60acda0f16a3243e29 Mon Sep 17 00:00:00 2001
From: Darren Hart <dvhart@...ux.intel.com>
Date: Thu, 12 Jun 2014 22:41:32 -0700
Subject: [PATCH] futex: Invert the return value of lock_pi_update_atomic

Indicate success to the caller by returning 1 for lock acquired.

Signed-off-by: Darren Hart <dvhart@...ux.intel.com>
---
 kernel/futex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 391df53..82b96a4 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1033,7 +1033,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr,
struct futex_hash_bucket *hb,
 		if (set_waiters)
 			newval |= FUTEX_WAITERS;
 
-		return lock_pi_update_atomic(uaddr, uval, newval);
+		return !lock_pi_update_atomic(uaddr, uval, newval);
 	}
 
 	/*
-- 
2.0.0.rc2




--
Darren

> +       }
>  
>         /*
> -        * We took the lock due to forced take over.
> +        * First waiter. Set the waiters bit before attaching ourself to
> +        * the owner. If owner tries to unlock, it will be forced into
> +        * the kernel and blocked on hb->lock.
>          */
> -       if (unlikely(lock_taken))
> -               return 1;
> -
> +       newval = uval | FUTEX_WAITERS;
> +       ret = lock_pi_update_atomic(uaddr, uval, newval);
> +       if (ret)
> +               return ret;
>         /*
> -        * We dont have the lock. Look up the PI state (or create it if
> -        * we are the first waiter):
> +        * If the update of the user space value succeeded, we try to
> +        * attach to the owner. If that fails, no harm done, we only
> +        * set the FUTEX_WAITERS bit in the user space variable.
>          */
> -       ret = lookup_pi_state(uval, hb, key, ps);
> -
> -       if (unlikely(ret)) {
> -               switch (ret) {
> -               case -ESRCH:
> -                       /*
> -                        * We failed to find an owner for this
> -                        * futex. So we have no pi_state to block
> -                        * on. This can happen in two cases:
> -                        *
> -                        * 1) The owner died
> -                        * 2) A stale FUTEX_WAITERS bit
> -                        *
> -                        * Re-read the futex value.
> -                        */
> -                       if (get_futex_value_locked(&curval, uaddr))
> -                               return -EFAULT;
> -
> -                       /*
> -                        * If the owner died or we have a stale
> -                        * WAITERS bit the owner TID in the user space
> -                        * futex is 0.
> -                        */
> -                       if (!(curval & FUTEX_TID_MASK)) {
> -                               force_take = 1;
> -                               goto retry;
> -                       }
> -               default:
> -                       break;
> -               }
> -       }
> -
> -       return ret;
> +       return attach_to_pi_owner(uval, key, ps);
>  }
>  
>  /**
> @@ -2316,8 +2281,10 @@ retry_private:
>                         goto uaddr_faulted;
>                 case -EAGAIN:
>                         /*
> -                        * Task is exiting and we just wait for the
> -                        * exit to complete.
> +                        * Two reasons for this:
> +                        * - Task is exiting and we just wait for the
> +                        *   exit to complete.
> +                        * - The user space value changed.
>                          */
>                         queue_unlock(hb);
>                         put_futex_key(&q.key);
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ