linux-kernel - Re: [PATCH v2] wait: add comment before waitqueue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Fri, 23 Oct 2015 14:40:06 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Kosuke Tatsukawa <tatsu@...jp.nec.com>
Cc:	Ingo Molnar <mingo@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] wait: add comment before waitqueue_active noting
 memory barrier is required

On Thu, Oct 22, 2015 at 11:18:33PM +0000, Kosuke Tatsukawa wrote:
> Peter Zijlstra wrote:
> > On Thu, Oct 22, 2015 at 08:01:37AM +0000, Kosuke Tatsukawa wrote:
> > 
> > Its somewhat unfortunate you chose the whole wait_woken() thing, its
> > 'rare'.
> 
> Yes.  I first noticed this lack of memory barrier before
> waitqueue_active() issue in drivers/tty/n_tty.c which was using
> wait_woken().  However, other places were mostly using prepare_to_wait()
> or wait_event*(), so wait_woken() is 'rare'.

Which I no doubt introduced there (the wait_woken thing), and it would
have been nice if I'd been Cc to that discussion.

In any case, I found the patch in next and dropping the
waitqueue_active() think is in deed the sane solution. It will serialize
everything on the queue lock.

> >> Second, on the waiting thread side, the CPU can reorder the load of
> >> CONDITION to occur during add_wait_queue active, before the entry is
> >> added to the wait queue.
> >>      wake_up thread                 waiting thread
> >>                                       (reordered)
> >> ------------------------------------------------------------------------
> >>                                 spin_lock_irqsave(...)      <add_wait_queue>
> >>                                 if (CONDITION)
> >> CONDITION = 1;
> >> if (waitqueue_active(wq))
> > 	wake_up();
> >>                                 __add_wait_queue(...)       <add_wait_queue>
> >>                                 spin_unlock_irqrestore(...) <add_wait_queue>
> >>                                 wait_woken(&wait, ...);
> >> ------------------------------------------------------------------------
> > 
> > This isn't actually a problem IIRC, because wait_woken() will test
> > WQ_FLAG_WOKEN and not actually sleep.
> 
> In the above figure, waitqueue_active(wq) will return 0 (queue is
> inactive) and skip the whole wake_up() call, because __add_wait_queue()
> hasn't been called yet.  This actually does occur using a reproducer.

Duh, indeed.

> > Does that work for you?
> 
> Yes.  Considering that the use of wait_woken is pretty rare, I think the
> explanation is more focused and easier to understand this way.

OK, thanks, I'll queue the below.

---
Subject: sched, wait: Document waitqueue_active
From: Peter Zijlstra <peterz@...radead.org>
Date: Fri Oct 23 14:32:34 CEST 2015

Kosuku reports that there were a fair number of buggy
waitqueue_active() users and this function deserves a big comment in
order to avoid growing more.

Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Reported-by: Kosuke Tatsukawa <tatsu@...jp.nec.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
---
 include/linux/wait.h |   30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -102,6 +102,36 @@ init_waitqueue_func_entry(wait_queue_t *
 	q->func		= func;
 }
 
+/**
+ * waitqueue_active -- locklessly test for waiters on the queue
+ * @q: the waitqueue to test for waiters
+ *
+ * returns true if the wait list is not empty
+ *
+ * NOTE: this function is lockless and requires care, incorrect usage _will_
+ * lead to sporadic and non-obvious failure.
+ *
+ * Use either while holding wait_queue_head_t::lock or when used for wakeups
+ * with an extra smp_mb() like:
+ *
+ *      CPU0 - waker                    CPU1 - waiter
+ *
+ *                                      for (;;) {
+ *      @cond = true;                     prepare_to_wait(&wq, &wait, state);
+ *      smp_mb();                         // smp_mb() from set_current_state()
+ *      if (waitqueue_active(wq))         if (@cond)
+ *        wake_up(wq);                      break;
+ *                                        schedule();
+ *                                      }
+ *                                      finish_wait(&wq, &wait);
+ *
+ * Because without the explicit smp_mb() it's possible for the
+ * waitqueue_active() load to get hoisted over the @cond store such that we'll
+ * observe an empty wait list while the waiter might not observe @cond.
+ *
+ * Also note that this 'optimization' trades a spin_lock() for an smp_mb(),
+ * which (when the lock is uncontended) are of roughly equal cost.
+ */
 static inline int waitqueue_active(wait_queue_head_t *q)
 {
 	return !list_empty(&q->task_list);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/