lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 23 Nov 2013 21:15:35 -0800
From:	Davidlohr Bueso <davidlohr@...com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Darren Hart <dvhart@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Mike Galbraith <efault@....de>, Jeff Mahoney <jeffm@...e.com>,
	"Norton, Scott J" <scott.norton@...com>, tom.vaden@...com,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	Waiman Long <Waiman.Long@...com>,
	Jason Low <jason.low2@...com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup

On Sat, 2013-11-23 at 19:46 -0800, Linus Torvalds wrote:
> On Sat, Nov 23, 2013 at 5:16 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> >
> > Now the question is why we queue the waiter _AFTER_ reading the user
> > space value. The comment in the code is pretty non sensical:
> >
> >    * On the other hand, we insert q and release the hash-bucket only
> >    * after testing *uaddr.  This guarantees that futex_wait() will NOT
> >    * absorb a wakeup if *uaddr does not match the desired values
> >    * while the syscall executes.
> >
> > There is no reason why we cannot queue _BEFORE_ reading the user space
> > value. We just have to dequeue in all the error handling cases, but
> > for the fast path it does not matter at all.
> >
> > CPU 0                                   CPU 1
> >
> >     val = *futex;
> >     futex_wait(futex, val);
> >
> >     spin_lock(&hb->lock);
> >
> >     plist_add(hb, self);
> >     smp_wmb();
> >
> >     uval = *futex;
> >                                         *futex = newval;
> >                                         futex_wake();
> >
> >                                         smp_rmb();
> >                                         if (plist_empty(hb))
> >                                            return;
> > ...
> 
> This would seem to be a nicer approach indeed, without needing the
> extra atomics.

Yep, I think we can all agree that doing this optization without atomic
ops is a big plus.

> 
> Davidlohr, mind trying Thomas' approach?

I just took a quick look and it seems pretty straightforward, but not
without some details to consider. We basically have to redo/reorder
futex_wait_setup(), which checks that uval == val, and
futex_wait_queue_me(), which adds the task to the list and blocks. Now,
both futex_wait() and futex_wait_requeue_pi() have this logic, but since
we don't use futex_wake() to wakeup tasks on pi futex_qs, I believe it's
ok to only change futex_wait(), while the order of the uval checking
doesn't matter for futex_wait_requeue_pi() so it can stay as is.

The following is the general skeleton of what Thomas is probably
referring to (yes, it needs comments, cleanups, refactoring, etc, etc).
Futexes are easy to screw up but at least this passes a kernel build and
all futextests on a large box.

static int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
		      ktime_t *abs_time, u32 bitset)
{
	struct hrtimer_sleeper timeout, *to = NULL;
	struct restart_block *restart;
	struct futex_hash_bucket *hb;
	struct futex_q q = futex_q_init;
	int prio, ret = 0;
	u32 uval;

	if (!bitset)
		return -EINVAL;
	q.bitset = bitset;

	if (abs_time) {
		to = &timeout;

		hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
				      CLOCK_REALTIME : CLOCK_MONOTONIC,
				      HRTIMER_MODE_ABS);
		hrtimer_init_sleeper(to, current);
		hrtimer_set_expires_range_ns(&to->timer, *abs_time,
					     current->timer_slack_ns);
	}
retry:	
	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key, VERIFY_READ);
	if (unlikely(ret != 0))
		goto out;

retry_private:
	hb = queue_lock(&q);
	prio = min(current->normal_prio, MAX_RT_PRIO);

	plist_node_init(&q.list, prio);
	plist_add(&q.list, &hb->chain);
	q.task = current;
	/* do NOT drop the lock here */
	smp_wmb();

	ret = get_futex_value_locked(&uval, uaddr);
	if (ret) {
		plist_del(&q.list, &hb->chain);
		spin_unlock(&hb->lock);
		
		ret = get_user(uval, uaddr);
		if (ret)
			goto out_put_key;

		if (!(flags & FLAGS_SHARED))
			goto retry_private;

		put_futex_key(&q.key);
		goto retry;

	}

	if (uval != val) {
		plist_del(&q.list, &hb->chain);
		spin_unlock(&hb->lock);
		ret = -EWOULDBLOCK;
		goto out_put_key;
	}

	set_current_state(TASK_INTERRUPTIBLE);
	spin_unlock(&hb->lock);

	/* Arm the timer */
	if (to) {
		hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
		if (!hrtimer_active(&to->timer))
			to->task = NULL;
	}

	/*
	 * If we have been removed from the hash list, then another task
	 * has tried to wake us, and we can skip the call to schedule().
	 */
	if (likely(!plist_node_empty(&q.list))) {
		/*
		 * If the timer has already expired, current will already be
		 * flagged for rescheduling. Only call schedule if there
		 * is no timeout, or if it has yet to expire.
		 */
		if (!to || to->task)
			freezable_schedule();
	}
	__set_current_state(TASK_RUNNING);


	/* If we were woken (and unqueued), we succeeded, whatever. */
	ret = 0;
	/* unqueue_me() drops q.key ref */
	if (!unqueue_me(&q))
		goto out;
	ret = -ETIMEDOUT;
	if (to && !to->task)
		goto out;

	/*
	 * We expect signal_pending(current), but we might be the
	 * victim of a spurious wakeup as well.
	 */
	if (!signal_pending(current))
		goto retry;

	ret = -ERESTARTSYS;
	if (!abs_time)
		goto out;

	restart = &current_thread_info()->restart_block;
	restart->fn = futex_wait_restart;
	restart->futex.uaddr = uaddr;
	restart->futex.val = val;
	restart->futex.time = abs_time->tv64;
	restart->futex.bitset = bitset;
	restart->futex.flags = flags | FLAGS_HAS_TIMEOUT;

	ret = -ERESTART_RESTARTBLOCK;

out_put_key:
	if (ret)
		put_futex_key(&q.key);
out:
	if (to) {
		hrtimer_cancel(&to->timer);
		destroy_hrtimer_on_stack(&to->timer);
	}
	return ret;
}

Thanks,
Davidlohr

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ