lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.11.1512191615340.28591@nanos>
Date:	Sat, 19 Dec 2015 19:24:38 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Davidlohr Bueso <dave@...olabs.net>
cc:	Bhuvanesh_Surachari@...tor.com,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux@...musvillemoes.dk, viresh.kumar@...aro.org,
	luto@...capital.net, Sebastian Sewior <bigeasy@...utronix.de>,
	mtk.manpages@...il.com, LKML <linux-kernel@...r.kernel.org>,
	Andy Lowe <Andy_Lowe@...tor.com>,
	Darren Hart <dvhart@...radead.org>
Subject: Re: [PATCH] futex: Prevent pi_state from double freeing in case of
 error

On Fri, 18 Dec 2015, Davidlohr Bueso wrote:
> On Fri, 18 Dec 2015, Bhuvanesh_Surachari@...tor.com wrote:
> 
> > From: Bhuvanesh Surachari <bhuvanesh_surachari@...tor.com>
> > 
> > In case of error from rt_mutex_start_proxy_lock pi_state is freed
> > twice in futex_requeue function. Hence removing free_pi_state in
> > else branch and branching to the location where pi_state is freed.

This is wrong. See below.
 
> This reads weird.
> 
> Do note that free_pi_state is already protected against passing it a nil
> pi_state, so this is merely cosmetic.

No, it's not cosmetic. pi_state is not set to NULL - this->pi_state is.

> Hmm but yeah, looks legit. The cases were we free the pi_state on error
> are when doing retry kind of paths, EDEADLK not being one of those cases.

It does not look legit. It is simply wrong.

> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index 684d754..264b3f2 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -1815,7 +1815,6 @@ retry_private:
> > 			} else if (ret) {
> > 				/* -EDEADLK */
> > 				this->pi_state = NULL;
> > -				free_pi_state(pi_state);
> > 				goto out_unlock;
> > 			}
> > 		}
> 
> 
> Nit but I think we also want to set pi_state to nil after freeing it
> in out_unlock. That way this scenario would simply be goto out_unlock.

No. In the normal exit case we only drop the extra reference on the
pi_state, but we CANNOT clear this->pi_state. That would be fatal as
we just queued a waiter which relies on that state pointer ....

Clearing pi_state itself is pointless as it's a local variable and
ceases to exist because we return.

Let me explain, why the patch is wrong and why some more things are
buggy and suboptimal here with that refcount stuff:

The first thing after we checked the uaddr1 value is:

    if (requeue_pi && (task_count - nr_wake < nr_requeue)) {

       ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
       	     				&key2, &pi_state, nr_requeue);

       Let's look at the return values and the pi_state

       0:        
	  The user space futex @uaddr2 is locked and we either
	  attached to existing pi_state or created new state.

  	  In both cases we hold a reference count on the state, but
  	  the waiter is still enqueued in hb1.

       <0:

	  Any kind of error happened. pi_state is NULL

       >0:

          The user space futex take over was successfull. pi_state is
          NULL. The return value is the vpid of the top waiter,
          i.e. the task on which behalf we acquired the futex.

	  Waiter is dequeued from hb1 and woken up.

	  So in this case we call

	  ret = lookup_pi_state(ret, hb2, &key2, &pi_state);

	  Here the return value means:

	    0:
	        pi_state is !NULL and a reference count on the state is
	       	held

	    <0:
	        Any kind of error happened. pi_state is NULL

  Now we check the return value of the above:

     switch (ret) {
     case 0:
		break;
	 Fine...

     case -EFAULT:
     	  free_pi_state(pi_state);
	  pi_state = NULL;
	  
	  That's pointless and confusing because pi_state IS NULL. If
	  it is not NULL then either futex_proxy_trylock_atomic() or
	  lookup_pi_state() are completely buggered.

     Ditto for -EAGAIN

     For the default case we goto out_unlock, which also calls
     free_pi_state(pi_state), but that's not a real issue as it is
     NULL safe.

If everything went fine, then we enter 

     plist_for_each_entry_safe(this, next, &hb1->chain, list) {

     And in the requeue_pi case we do:
     
      if (requeue_pi) {

      	       atomic_inc(&pi_state->refcount);

      So here we take another refcount on the state, i.e. one for each
      waiter. And we store the pointer in the waiters futex_q object:

	       this->pi_state = pi_state;
	       
	       ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
					       this->rt_waiter,
					       this->task);
	       if (ret == 1) {

      We locked the rt mutex on behalf of this waiter. We do neither
      clear this->pi_state nor drop the refcount because the waiter
      needs the pi_state for fixing up the futex user space value.

      Now the waiter should drop the reference, but that's not the
      case. So this is a real issue here which needs to be
      fixed. Patch comes in seperate mail.

	      } else if (ret) {
	      	     free_pi_state(pi_state);
		     this->pi_state = NULL;

      This undoes the above atomic_inc() and clears the pi_state
      reference in the waiter futex_q object.

      So if we remove this we leak a reference count and have a
      dangling pointer in the waiters futex_q object .... Not so
      desired, right?

		     goto out_unlock;
              }
       }
       ....

out_unlock:
	free_pi_state(pi_state);

  Here we drop the extra reference to the pi_state which we
  acquired at the beginning of the requeue_pi code before we
  started to requeue waiters.


So the patch and the reviews of it are plain wrong.

This refcount stuff certainly lacks a few comments, but I really would
like to know:

 - Why was this patch created in the first place?

   The changelog is completely useless. It does not tell what the
   observed issue was or whether this was merily the result of reading
   the code and assuming that this is a double free. But it was
   certainly not due to a thorough analysis of the code in question.

 - Why are the reviews so sloppy and useless?

   The one I'm answering to is just hillarious and the other one picks
   a random commit, claims that it inadvertantly introduced the issue
   and is done with it. Really helpful.

Folks, this is not the way it works. It took me a couple of hours to
get this analyzed properly, but I don't see that anyone involved in
this thread has spent more than a few seconds on it.

At least I found a real issue related to this refcount stuff, but I
would have preferred that the people involved in this would have found
it in the first place instead of providing half baken crap.

Yours grumpy

      tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ