Message-ID: <20160906122037.GL10168@twins.programming.kicks-ass.net>
Date:   Tue, 6 Sep 2016 14:20:37 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Felipe Balbi <felipe.balbi@...ux.intel.com>
Cc:     Alan Stern <stern@...land.harvard.edu>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Ingo Molnar <mingo@...hat.com>,
        USB list <linux-usb@...r.kernel.org>,
        Kernel development list <linux-kernel@...r.kernel.org>,
        Will Deacon <will.deacon@....com>
Subject: Re: Memory barrier needed with wake_up_process()?

On Tue, Sep 06, 2016 at 01:49:37PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 06, 2016 at 02:43:39PM +0300, Felipe Balbi wrote:

> > My fear now, however, is that changing smp_[rw]mb() to smp_mb() just
> > adds extra overhead which makes the problem much, much less likely to
> > happen. Does that sound plausible to you?
> 
> I did consider that, but I've not sufficiently grokked the code to rule
> out actual fail. So let me stare at this a bit more.

OK, so I'm really not seeing it. We've got:

while (bh->state != FULL) {
        for (;;) {
                set_current_state(INTERRUPTIBLE); /* MB after */
                if (signal_pending(current))
                        return -EINTR;
                if (common->thread_wakeup_needed)
                        break;
                schedule(); /* MB */
        }
        __set_current_state(RUNNING);
        common->thread_wakeup_needed = 0;
        smp_rmb(); /* NOP */
}


VS.


spin_lock(&common->lock); /* MB */
bh->state = FULL;
smp_wmb(); /* NOP */
common->thread_wakeup_needed = 1;
wake_up_process(common->thread_task); /* MB before */
spin_unlock(&common->lock);



(the MB annotations are specific to x86, not true in general)


If we observe thread_wakeup_needed, we must also observe bh->state.

And the sleep/wakeup ordering is also correct: we either see
thread_wakeup_needed and continue, or we see task->state == RUNNING
(from the wakeup) and schedule() is a NO-OP. The MB from
set_current_state() then matches with the MB from wake_up_process() to
ensure we must see thread_wakeup_needed.

Or, we go to sleep and get woken up, at which point the same happens.
Since the waking CPU gets the task back on its RQ, the happens-before
chain includes the waking CPU's state along with the state of the task
itself before it went to sleep.

At which point we're back where we started: once we see
thread_wakeup_needed we must then also see bh->state (and all state
prior to that on the waking CPU).
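
For reference, the ordering argument above can be written as a
standalone userspace sketch with C11 atomics (names made up, this is
not the driver code); the seq_cst fences stand in for the full barriers
implied by set_current_state() and wake_up_process(). Because both
sides issue a full fence between their store and their load, it is
impossible for both to read the old value, so a lost wakeup cannot
happen:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int need_wakeup; /* stand-in for common->thread_wakeup_needed */
static atomic_int sleeping;    /* stand-in for task->state (1 == INTERRUPTIBLE) */

/* waker side: publish the condition, full fence, then look at the
 * sleeper's state to decide whether a wakeup is needed. */
static void waker(void)
{
        atomic_store_explicit(&need_wakeup, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst); /* MB before wakeup */
        if (atomic_load_explicit(&sleeping, memory_order_relaxed))
                ; /* this is where wake_up_process() would go */
}

/* sleeper side: announce intent to sleep, full fence, then re-check the
 * condition; only sleep if it is still not set. */
static bool must_sleep(void)
{
        atomic_store_explicit(&sleeping, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst); /* MB after setting state */
        if (atomic_load_explicit(&need_wakeup, memory_order_relaxed))
                return false; /* condition already set, do not sleep */
        return true;          /* waker is guaranteed to observe sleeping == 1 */
}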



There's enough cruft in the while-sleep loop to force a reload of bh->state.

Load/store tearing cannot be a problem because all values fit in a
single byte (the variables are multi-byte, but every value used only
affects the LSB).
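
To make the tearing point concrete (illustration only, not the driver
code): the flag only ever holds 0 or 1, so even if a compiler split the
word-sized store into byte stores, the upper bytes are zero before and
after the store and never change; a racing reader can therefore only
observe 0 or 1, never a mixed value.

int thread_wakeup_needed; /* word-sized, but only the LSB ever changes */

void set_needed(void)   { thread_wakeup_needed = 1; } /* touches byte 0 only */
void clear_needed(void) { thread_wakeup_needed = 0; } /* touches byte 0 only */

The same reasoning applies to bh->state, whose values also fit in the
low byte.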

Colour me puzzled.
