linux-kernel - Re: [RFC][PATCH 0/5] arch: atomic rework

Message-ID: <1391721423.23421.3898.camel@triegel.csb>
Date:	Thu, 06 Feb 2014 22:17:03 +0100
From:	Torvald Riegel <triegel@...hat.com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	Will Deacon <will.deacon@....com>,
	Ramana Radhakrishnan <Ramana.Radhakrishnan@....com>,
	David Howells <dhowells@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"mingo@...nel.org" <mingo@...nel.org>,
	"gcc@....gnu.org" <gcc@....gnu.org>
Subject: Re: [RFC][PATCH 0/5] arch: atomic rework

On Thu, 2014-02-06 at 11:27 -0800, Paul E. McKenney wrote:
> On Thu, Feb 06, 2014 at 06:59:10PM +0000, Will Deacon wrote:
> > On Thu, Feb 06, 2014 at 06:55:01PM +0000, Ramana Radhakrishnan wrote:
> > > On 02/06/14 18:25, David Howells wrote:
> > > >
> > > > Is it worth considering a move towards using C11 atomics and barriers and
> > > > compiler intrinsics inside the kernel?  The compiler _ought_ to be able to do
> > > > these.
> > > 
> > > 
> > > It sounds interesting to me, if we can make it work properly and 
> > > reliably. + gcc@....gnu.org for others in the GCC community to chip in.
> > 
> > Given my (albeit limited) experience playing with the C11 spec and GCC, I
> > really think this is a bad idea for the kernel. It seems that nobody really
> > agrees on exactly how the C11 atomics map to real architectural
> > instructions on anything but the trivial architectures. For example, should
> > the following code fire the assert?
> > 
> > 
> > extern atomic<int> foo, bar, baz;
> > 
> > void thread1(void)
> > {
> > 	foo.store(42, memory_order_relaxed);
> > 	bar.fetch_add(1, memory_order_seq_cst);
> > 	baz.store(42, memory_order_relaxed);
> > }
> > 
> > void thread2(void)
> > {
> > 	while (baz.load(memory_order_seq_cst) != 42) {
> > 		/* do nothing */
> > 	}
> > 
> > 	assert(foo.load(memory_order_seq_cst) == 42);
> > }
> > 
> > 
> > To answer that question, you need to go and look at the definitions of
> > synchronises-with, happens-before, dependency_ordered_before and a whole
> > pile of vaguely written waffle to realise that you don't know. Certainly,
> > the code that arm64 GCC currently spits out would allow the assertion to fire
> > on some microarchitectures.
> 
> Yep!  I believe that a memory_order_seq_cst fence in combination with the
> fetch_add() would do the trick on many architectures, however.  All of
> this is one reason that any C11 definitions need to be individually
> overridable by individual architectures.

"Overridable" in which sense?  Do you want to change the semantics on
the language level in the sense of altering the memory model, or rather
use a different implementation under the hood to, for example, fix
deficiencies in the compilers?
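
To make the fence variant concrete, here is a minimal sketch of Will's
example with a memory_order_seq_cst fence added after the fetch_add().
This is an illustration only, not code from the patch series; the
run_litmus() driver and the zero initialisers are assumptions made so that
it builds on its own.  As I read the C++11 model, the fence also acts as a
release fence, and because it is sequenced before the relaxed store to baz,
it synchronizes with the seq_cst (hence acquire) load that reads 42.  The
store to foo then happens before the load of foo, so the assert cannot fire.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Globals mirroring the quoted example (zero-initialised here by assumption).
std::atomic<int> foo{0}, bar{0}, baz{0};

void thread1()
{
	foo.store(42, std::memory_order_relaxed);
	bar.fetch_add(1, std::memory_order_seq_cst);
	// Added fence: a seq_cst fence is also a release fence, and it is
	// sequenced before the relaxed store to baz below.
	std::atomic_thread_fence(std::memory_order_seq_cst);
	baz.store(42, std::memory_order_relaxed);
}

// Returns the value of foo observed once baz has been seen as 42.
int thread2()
{
	while (baz.load(std::memory_order_seq_cst) != 42) {
		/* do nothing */
	}
	return foo.load(std::memory_order_seq_cst);
}

// Hypothetical driver: runs both threads once and checks what thread2 saw.
int run_litmus()
{
	foo = 0;
	bar = 0;
	baz = 0;
	int seen = 0;
	std::thread t2([&] { seen = thread2(); });
	std::thread t1(thread1);
	t1.join();
	t2.join();
	assert(seen == 42);
	return seen;
}
```

A single passing run proves nothing about the memory model, of course; the
argument is the synchronizes-with reasoning above, and the driver only shows
that the sketch is well-formed.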

> > There are also so many ways to blow your head off it's untrue. For example,
> > cmpxchg takes a separate memory model parameter for failure and success, but
> > then there are restrictions on the sets you can use for each. It's not hard
> > to find well-known memory-ordering experts shouting "Just use
> > memory_order_seq_cst for everything, it's too hard otherwise". Then there's
> > the fun of load-consume vs load-acquire (arm64 GCC completely ignores consume
> > atm and optimises all of the data dependencies away) as well as the definition
> > of "data races", which seem to be used as an excuse to miscompile a program
> > at the earliest opportunity.
> 
> Trust me, rcu_dereference() is not going to be defined in terms of
> memory_order_consume until the compilers implement it both correctly and
> efficiently.  They are not there yet, and there is currently no shortage
> of compiler writers who would prefer to ignore memory_order_consume.

Do you have any input on GCC bug 59448
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448), in particular on the
language standard's definition of dependencies?
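
Purely to illustrate the kind of dependency in question (this is not the
test case from the bug report), here is a minimal sketch of a pointer
published with a release store and read with memory_order_consume.  The
Payload type, the gp pointer and the run_publish() driver are made-up names.
The dereference in reader() carries a dependency from the consume load,
which is what the standard relies on to order it after the publisher's
initialisation.  A compiler is also free to simply treat consume as acquire.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload type and publication pointer.
struct Payload {
	int value;
};

std::atomic<Payload*> gp{nullptr};

void publisher(Payload* p)
{
	p->value = 42;                            // initialise first
	gp.store(p, std::memory_order_release);   // then publish
}

int reader()
{
	Payload* p;
	// Spin until the pointer has been published.
	while ((p = gp.load(std::memory_order_consume)) == nullptr) {
		/* do nothing */
	}
	// p->value carries a dependency from the consume load above.
	return p->value;
}

// Hypothetical driver: one publisher, one reader.
int run_publish()
{
	static Payload storage{0};
	storage.value = 0;
	gp.store(nullptr, std::memory_order_relaxed);
	int seen = 0;
	std::thread r([&] { seen = reader(); });
	std::thread w(publisher, &storage);
	w.join();
	r.join();
	return seen;
}
```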

> And rcu_dereference() will need per-arch overrides for some time during
> any transition to memory_order_consume.
> 
> > Trying to introduce system concepts (writes to devices, interrupts,
> > non-coherent agents) into this mess is going to be an uphill battle IMHO. I'd
> > just rather stick to the semantics we have and the asm volatile barriers.
> 
> And barrier() isn't going to go away any time soon, either.  And
> ACCESS_ONCE() needs to keep volatile semantics until there is some
> memory_order_whatever that prevents loads and stores from being coalesced.

I'd be happy to discuss something like this in ISO C++ SG1 (or has it
already been discussed in the past?), though I suppose it would need a
paper.
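
For readers on the gcc list who have not seen it, here is a rough C++
analogue of what ACCESS_ONCE() provides today.  It is modelled on the
kernel macro, which casts the address of its argument to a
volatile-qualified pointer and dereferences it; access_once() and the two
helper functions are made-up names for this sketch.  Each use is a separate
volatile access, so the compiler may not merge two of them into one load,
whereas it is free to do so for the plain version.

```cpp
#include <cassert>

// Rough analogue of the kernel's ACCESS_ONCE(): go through a
// volatile-qualified lvalue so that every use is a distinct access.
template <typename T>
volatile T& access_once(T& x)
{
	return *static_cast<volatile T*>(&x);
}

// The compiler may coalesce these two reads of x into a single load.
int sum_twice_plain(int& x)
{
	return x + x;
}

// Two volatile reads: both loads must be performed.
int sum_twice_once(int& x)
{
	return access_once(x) + access_once(x);
}
```

The results are the same in a single-threaded run; the difference is only
in which loads and stores the compiler is allowed to drop or combine.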

Will you be in Issaquah for the C++ meeting next week?

