Date:	25 Feb 2014 22:06:53 -0500
From:	"George Spelvin" <linux@...izon.com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	akpm@...ux-foundation.org, dhowells@...hat.com, gcc@....gnu.org,
	linux@...izon.com, linux-arch@...r.kernel.org,
	linux-kernel@...r.kernel.org, mingo@...nel.org,
	peterz@...radead.org, Ramana.Radhakrishnan@....com,
	torvalds@...ux-foundation.org, triegel@...hat.com,
	will.deacon@....com
Subject: Re: [RFC][PATCH 0/5] arch: atomic rework

<paulmck@...ux.vnet.ibm.com> wrote:
> <torvalds@...ux-foundation.org> wrote:
>> I have for the last several years been 100% convinced that the Intel
>> memory ordering is the right thing, and that people who like weak
>> memory ordering are wrong and should try to avoid reproducing if at
>> all possible.
>
> Are ARM and Power really the bad boys here?  Or are they instead playing
> the role of the canary in the coal mine?

To paraphrase some older threads, I think Linus's argument is that
weak memory ordering is like branch delay slots: a way to make a simple
implementation simpler, but one that ends up being no help to a more
aggressive implementation.

Branch delay slots give a one-cycle bonus to in-order cores, but
once you go superscalar and add branch prediction, they stop helping,
and once you go fully out-of-order, they're just an annoyance.

Likewise, I can see the point that weak ordering can help make a simple
cache interface simpler, but once you start doing speculative loads,
you've already bought and paid for all the hardware you need to do
stronger coherency.

Another thing that requires all the strong-coherency machinery is
a high-performance implementation of the various memory barrier and
synchronization operations.  Yes, a low-performance (drain the pipeline)
implementation is tolerable if the instructions aren't used frequently,
but once you're really trying, weak ordering doesn't save you any
complexity.

Once you're there, strong coherency doesn't actually cost you any
time outside of critical synchronization code, and it both simplifies
and speeds up the tricky synchronization software.
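
To make that concrete, here is a minimal sketch (my own illustration,
nothing from the patch series) of the kind of synchronization code in
question, written with C11 atomics rather than the kernel's own
primitives:

#include <stdatomic.h>

static int payload;
static atomic_int ready;

void producer(void)
{
	payload = 42;		/* plain store of the data */
	/* Publish: the release store orders the payload store before it. */
	atomic_store_explicit(&ready, 1, memory_order_release);
}

int consumer(void)
{
	/* Wait for the flag with acquire semantics, then read the data. */
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;
	return payload;		/* guaranteed to observe 42 */
}

On a TSO machine like x86, the release store and acquire load compile
down to ordinary mov instructions, so the strong model costs nothing
here; on ARM or Power the same source has to emit explicit barriers or
acquire/release instructions (dmb, lwsync, ldar/stlr and friends),
which is exactly the critical synchronization code I mean above.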


So the weak ordering of PPC and ARM is not the direction the future is
going.  Rather, weak ordering is something that's only useful in a
limited technology window, which is rapidly passing.

If you can find someone at IBM who's worked on the Z series cache
coherency (extremely strong ordering), they probably have some useful
insights.  The big question is whether strong ordering, once you've
accepted the implementation complexity and area, actually costs anything
in execution time.  If there's an unavoidable cost that weak ordering
avoids, that's significant.
