Message-ID: <20150829142707.GA19263@redhat.com>
Date:	Sat, 29 Aug 2015 16:27:07 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	Boqun Feng <boqun.feng@...il.com>
Cc:	Michal Hocko <mhocko@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	David Howells <dhowells@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Jonathan Corbet <corbet@....net>
Subject: Re: wake_up_process implied memory barrier clarification

Hello Boqun,

On 08/29, Boqun Feng wrote:
> Hi Oleg
>
> On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote:
> > On 08/28, Michal Hocko wrote:
> > >
> > > On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> > > > On 08/27, Michal Hocko wrote:
> > > > >
> > > > > --- a/Documentation/memory-barriers.txt
> > > > > +++ b/Documentation/memory-barriers.txt
> > > > > @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state is cleared, and so sits
> > > > >  	    <general barrier>		  STORE current->state
> > > > >  	LOAD event_indicated
> > > > >
> > > > > +Please note that wake_up_process is an exception here because it implies
> > > > > +the write memory barrier unconditionally.
> > > > > +
> > > >
> > > > I simply can't understand (can't even parse) this part of memory-barriers.txt.
> > >
> > > Do you mean the added text or the example above it?
> >
> > Both ;)
> >
> > but note that "load from X might see 0" is true of course, and in this
>
> By this, I think you actually mean the example below the added text,
> i.e. the example for "to repeat..", right?

And above. Even

	The barrier occurs before the task state is cleared

is not actually right; it is misleading. What is really important is that
we have a barrier before we _read_ the task state. And again, the fact
that we actually have a write barrier there is just an implementation
detail.
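
To make this concrete, the pattern I have in mind is roughly the following
(just an illustrative sketch of the waiter/waker interaction, not the real
try_to_wake_up() code; CONDITION and wake_the_task_up() are made-up names):

	/* waiter */
	set_current_state(TASK_UNINTERRUPTIBLE);	/* implies a full barrier */
	if (!CONDITION)
		schedule();

	/* waker */
	CONDITION = 1;
	/* a barrier is needed HERE, before the ->state load below */
	if (task->state & TASK_NORMAL)
		wake_the_task_up();	/* hypothetical helper */

The point of the sketch is only where the barrier has to sit, not what
flavour it is; the flavour is exactly the implementation detail above.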

> Subject: Documentation: call out conditional barriers of wait_*() and wake_up*()
>
> The memory barriers in some sleep and wakeup functions are conditional;
> there are several situations in which there is no barrier:
>
> 1.	If wait_event() and co. actually don't prepare to sleep, there
> 	may be no barrier in them.

And thus (imo) we should not even try to document this. I mean, a user
should not make any assumptions about the barriers inside wait_event().
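
IOW, from the caller's side the only safe assumption is something like this
(an illustrative sketch; CONDITION is a made-up flag):

	/*
	 * If CONDITION is already true, wait_event() may return
	 * immediately, without ever calling prepare_to_wait() and
	 * therefore without executing any barrier at all.
	 */
	wait_event(wq, CONDITION);
	/* no implied barrier can be assumed at this point */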

> 2.	Whether or not a sleep occurs, there may be no barrier between a
> 	successful wait-condition check (one that evaluates to true) in
> 	wait_event() and the following instructions.

Yes, this is true in any case. Not sure this deserves additional
documentation, but see below.

> 3.	If wake_up() and co. actually wake up no one, there may be no
> 	write barrier in them.

See above.

> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:
>
>  which therefore also imply a general memory barrier after setting the state.
>  The whole sequence above is available in various canned forms, all of which
> -interpolate the memory barrier in the right place:
> +imply a general barrier if and only if a sleep is at least about to happen,
> +i.e. prepare_to_wait*() is called.
>
>  	wait_event();
>  	wait_event_interruptible();
> @@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
>  	wait_on_bit();
>  	wait_on_bit_lock();
>
> +Furthermore, no barrier is guaranteed between a successful wait-condition
> +check (one whose result is true) in wait_*() and the instructions that
> +follow wait_*().

Yes, perhaps this makes sense, but only because the explanation above looks
a bit confusing to me. I simply can't understand why memory-barriers.txt
mentions the barrier implied by set_current_state(). You need to know about
it if you want to understand how wait_event() works; you do not need to know
about it if you want to use wait_event(), if nothing else because you can
never rely on this barrier. And your examples below show this.
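
Just for reference, the reason that barrier matters for the implementation
can be seen from a simplified expansion of wait_event() (a sketch only, the
real macro is more involved):

	DEFINE_WAIT(__wait);

	for (;;) {
		/* prepare_to_wait() uses set_current_state(), so the
		 * ->state store is ordered before the condition check */
		prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);
		if (CONDITION)
			break;
		schedule();
	}
	finish_wait(&wq, &__wait);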

> +To repeat, barriers in wait_event(), wake_up() and co. are conditional, meaning
> +they are present if and only if a sleep or a wakeup actually occurs. To see
> +this, consider the following three examples.
> +
> +The first example is for the conditional barriers in wait_*(); say X and Y
> +are both initially zero:
> +
> +	CPU 1				CPU 2
> +	===============================	===============================
> +					X = 1;
> +					smp_mb();
> +	wait_event(wq, Y == 1);		Y = 1;
> +	  load from Y sees 1		wake_up();
> +	  <no memory barrier>
> +	load from X might see 0
> +
> +And even if a sleep and a wakeup really occur, there might be no barrier
> +between the last load from Y (which makes the wait condition check succeed)
> +and the load from X, so the load from X might still see 0.
...

> +So, to guarantee that CPU 1's load from X sees 1 in the two examples above,
> +a read barrier must be added after wait_event() in the code running on CPU 1.

Yes. So why should we try to document that barrier in wait_event()?
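
For the record, the fix on the caller's side would look like this, in the
same litmus style; the only change is the explicit smp_rmb() after
wait_event():

	CPU 1				CPU 2
	===============================	===============================
					X = 1;
					smp_mb();
	wait_event(wq, Y == 1);		Y = 1;
	  load from Y sees 1		wake_up();
	smp_rmb();
	load from X sees 1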

> +The third example is for the conditional write barriers in wake_up*(); say
> +X, Y and Z are all initially zero:
> +
> +	CPU 1				CPU 2
> +	===============================	===============================
> +					X = 1;
> +	wait_event(wq, Y == 1);		Y = 1;
> +	  load from Y sees 1		wake_up();
> +					  <no memory barrier>
> +					Z = 1;
> +	load from Z sees 1
> +	smp_rmb();
> +	load from X might see 0
> +
> +In contrast, if a wakeup does occur, CPU 1's load from X would be guaranteed
> +to see 1. To guarantee that CPU 1's load from X sees 1, a write barrier must
> +be added between the store to X and the store to Z in the code running on CPU 2.

The same. Why should we mention that barrier in wake_up() if we can't
rely on it and CPU 2 needs a write barrier anyway?
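
Again for the record, the fix in that example lives entirely in CPU 2's code;
in the same litmus style, with the explicit smp_wmb() between the store to X
and the store to Z:

	CPU 1				CPU 2
	===============================	===============================
					X = 1;
	wait_event(wq, Y == 1);		Y = 1;
	  load from Y sees 1		wake_up();
					smp_wmb();
					Z = 1;
	load from Z sees 1
	smp_rmb();
	load from X sees 1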

Oleg.

