linux-kernel - Re: wake_up_process implied memory barrier clarification

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150901145024.GA8007@fixme-laptop.cn.ibm.com>
Date:	Tue, 1 Sep 2015 22:50:24 +0800
From:	Boqun Feng <boqun.feng@...il.com>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Michal Hocko <mhocko@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	David Howells <dhowells@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jonathan Corbet <corbet@....net>
Subject: Re: wake_up_process implied memory barrier clarification

On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> On 09/01, Boqun Feng wrote:
> >
> > But I'm still a little confused at Oleg's words:
> >
> > "What is really important is that we have a barrier before we _read_ the
> > task state."
> >
> > I read is as "What is really important is that we have a barrier before
> > we _read_ the task state and _after_ we write the CONDITION", if I don't
> > misunderstand Oleg, this means a STORE-barrier-LOAD sequence,
> 
> Yes, exactly.
> 
> Let's look at this trivial code again,
> 
> 	CONDITION = 1;
> 	wake_up_process();
> 
> note that try_to_wake_up() does
> 
> 	if (!(p->state & state))
> 		goto out;
> 
> If this LOAD could be reordered with STORE(CONDITION) above we can obviously
> race with
> 
> 	set_current_state(...);
> 	if (!CONDITION)
> 		schedule();
> 
> See the comment at the start of try_to_wake_up(). And again, again, please

Thank you for your detailed explanation!

I read the comment, but just couldn't understand how the pairing
happens, I think I have a misunderstanding of pairing, please see below.

> note that initially the only documented behaviour of smp_mb__before_spinlock()
> was the STORE - LOAD serialization. This is what try_to_wake_up() needs, it
> doesn't actually need the write barrier after STORE(CONDITION).
> 
> And just in case, wake_up() differs in a sense that it doesn't even need
> that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> wait_event()-like code.
> 
> > which IIUC
> > can't pair with anything.
> 
> It pairs with the barrier implied by set_current_state().

I think maybe I have a misunderstanding of barrier pairing. I used to
think that a barrier pairing can only happen:

1.	between a "MEM-barrier-STORE" and "LOAD-barrier-MEM", when the
	LOAD reads the value which the STORE writes

where MEM is a memory operation(STORE or LOAD).

However, wait and wakeup don't fit in the #1 barrier pairing, so I guess
I was wrong, there is a kind of barrier pairings which also happen:

2.	between a "MEM-barrier-LOAD" and "STORE-barrier-MEM", when the
	LOAD _fails_ to read the value which the STORE writes,

#1 pairing is easy to understand and memory-barriers.txt already has
some examples. I admit that I haven't seen any usage of #2 pairing
other than wait_event() and wake_up(). Of course, this may be because
I'm not an expert of parallel programming and don't know too much..

Have to ask Paul, when we are talking about barrier pairing, we are
actually talking about the combination of #1 and #2, right?

And Oleg, the barrier pairing we are talking here is the kind of #2,
right?

Add two examples, in case that I fail to make clear the difference of
two barrier pairings.

One example of #1 pairing is the following sequence of events:

Initially X = 0, Y = 0

CPU 1				CPU 2
===========================	================================
WRITE_ONCE(Y, 1);
smp_mb();
WRITE_ONCE(X, 1);
				r1 = READ_ONCE(X); // r1 == 1
				smp_mb();
				r2 = READ_ONCE(Y);
----------------------------------------------------------------
{ assert(!(r1 == 1 && r2 == 0)); // means if r1 == 1 then r2 == 1}

If CPU 2 reads the value of X written by CPU 1, r2 is guaranteed to be
1, which means assert(!(r1 == 1 && r2 == 0)) afterwards wouldn't be
triggered in any case.

One example of #2 pairing is the following sequence of events:

Initially X = 0, Y = 0

CPU 1				CPU 2
===========================	================================
WRITE_ONCE(Y, 1);
smp_mb();
r1 = READ_ONCE(X); // r1 == 0
				WRITE_ONCE(X, 1);
				smp_mb();
				r2 = READ_ONCE(Y);

----------------------------------------------------------------
{ assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1}

If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is
guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards
wouldn't be triggered in any case.

And this is actually the case of wake_up/wait, assuming that
prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2, X
is the condition and Y is the task state, and replace smp_mb() with
really necessary barriers, right?

Regards,
Boqun

Download attachment "signature.asc" of type "application/pgp-signature" (474 bytes)