Message-ID: <alpine.LFD.2.00.1007122226230.3321@localhost.localdomain>
Date: Mon, 12 Jul 2010 22:40:18 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Darren Hart <dvhltc@...ibm.com>
cc: Mike Galbraith <efault@....de>, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...e.hu>,
Eric Dumazet <eric.dumazet@...il.com>,
John Kacur <jkacur@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
linux-rt-users@...r.kernel.org
Subject: Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t
On Mon, 12 Jul 2010, Darren Hart wrote:
> On 07/10/2010 12:41 PM, Mike Galbraith wrote:
> > On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote:
> > > > Out of curiosity, what's wrong with holding his pi_lock across the
> > > > wakeup? He can _try_ to block, but can't until pi state is stable.
> > > >
> > > > I presume there's a big fat gotcha that's just not obvious to a
> > > > futex locking newbie :)
>
> Nor to some of us who have been engrossed in futexes for the last couple of
> years! I discussed the pi_lock across the wakeup issue with Thomas. While this
> fixes the problem for this particular failure case, it doesn't protect
> against:
>
> <tglx> assume the following:
> <tglx> t1 is on the condvar
> <tglx> t2 does the requeue dance and t1 is now blocked on the outer futex
> <tglx> t3 takes hb->lock for a futex in the same bucket
> <tglx> t2 wakes due to signal/timeout
> <tglx> t2 blocks on hb->lock
>
> You likely did not hit the above scenario because you only had one condvar,
> so the hash_buckets were not heavily shared and you were unlikely to hit:
>
> <tglx> t3 takes hb->lock for a futex in the same bucket
>
>
> I'm going to roll up a patchset with your (Mike) spin_trylock patch and run it
> through some tests. I'd still prefer a way to detect early wakeup without
> having to grab hb->lock, but I haven't found one yet.
>
> +	while (!spin_trylock(&hb->lock))
> +		cpu_relax();
> 	ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
> 	spin_unlock(&hb->lock);
And this is nasty, as it will create unbounded priority inversion :(
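To make the inversion concrete (a sketch of the failure mode, not part of
any patch): on -rt, hb->lock is a sleeping spinlock, so a high priority
spinner can starve the very lock holder it is waiting for:

	/*
	 * Sketch, assuming a SCHED_FIFO task T_high preempts the
	 * lower priority holder T_low while T_low holds hb->lock:
	 */
	while (!spin_trylock(&hb->lock))	/* T_high spins here ... */
		cpu_relax();			/*
						 * ... T_low never runs to
						 * unlock, so T_high spins
						 * forever: unbounded
						 * priority inversion.
						 */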
We discussed another solution on IRC in the meantime:
In futex_wait_requeue_pi():

	futex_wait_queue_me(hb, &q, to);

	raw_spin_lock(&current->pi_lock);
	if (current->pi_blocked_on) {
		/*
		 * We know that we can only be blocked on the outer
		 * futex, so we can skip the early wakeup check.
		 */
		raw_spin_unlock(&current->pi_lock);
		ret = 0;
	} else {
		current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
		raw_spin_unlock(&current->pi_lock);
		spin_lock(&hb->lock);
		ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
		....
		spin_unlock(&hb->lock);
	}
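PI_WAKEUP_INPROGRESS is a new marker value here; a minimal sketch of how it
could be defined (the representation is only implied by this thread) is a
constant that can never be a valid waiter pointer:

	/*
	 * Sketch: a sentinel distinguishable from both NULL and any
	 * real struct rt_mutex_waiter pointer in ->pi_blocked_on.
	 */
	#define PI_WAKEUP_INPROGRESS	((struct rt_mutex_waiter *) 1)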
Now for the rtmutex magic, we need this in task_blocks_on_rt_mutex():
	raw_spin_lock(&task->pi_lock);

	/*
	 * Add big fat comment why this is only relevant to futex
	 * requeue_pi.
	 */
	if (task != current && task->pi_blocked_on == PI_WAKEUP_INPROGRESS) {
		raw_spin_unlock(&task->pi_lock);
		/*
		 * Returning 0 here is fine. The requeue code is just
		 * going to move the futex_q to the other bucket, but
		 * that'll be fixed up in handle_early_requeue_pi_wakeup().
		 */
		return 0;
	}
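The marker also has to be cleared again once the hb->lock'ed section runs;
a minimal sketch of that fixup, assuming it belongs in (or right before)
handle_early_requeue_pi_wakeup() - the exact spot is not spelled out here:

	raw_spin_lock(&current->pi_lock);
	/* We got here, so no requeue is in flight anymore. */
	if (current->pi_blocked_on == PI_WAKEUP_INPROGRESS)
		current->pi_blocked_on = NULL;
	raw_spin_unlock(&current->pi_lock);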
Thanks,
tglx