linux-kernel - Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?

Message-ID: <20160803073134-mutt-send-email-mst@kernel.org>
Date:	Wed, 3 Aug 2016 07:36:34 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	"H. Peter Anvin" <hpa@...or.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Dexuan Cui <decui@...rosoft.com>,
	"linux-x86_64@...r.kernel.org" <linux-x86_64@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	David Howells <dhowells@...hat.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?

On Thu, Mar 03, 2016 at 11:05:43AM -0800, H. Peter Anvin wrote:
> On March 3, 2016 10:35:50 AM PST, "Michael S. Tsirkin" <mst@...hat.com> wrote:
> >On Thu, Mar 03, 2016 at 04:34:53PM +0100, Peter Zijlstra wrote:
> >> On Thu, Mar 03, 2016 at 04:27:39PM +0100, Ingo Molnar wrote:
> >> > 
> >> > * Dexuan Cui <decui@...rosoft.com> wrote:
> >> > 
> >> > > Hi,
> >> > > My understanding of arch/x86/include/asm/barrier.h is: obviously Linux
> >> > > prefers {L,S,M}FENCE -- Locked ADD is only used on x86_32 platforms
> >> > > that don't support XMM2.
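
A simplified sketch of the arrangement described above, assuming GCC/Clang inline asm: where SSE2 is available, the full, read and write barriers map to MFENCE, LFENCE and SFENCE, and only a 32-bit build without SSE2 falls back to a locked add on the stack. The names `demo_mb`, `demo_rmb`, `demo_wmb`, `publish` and `consume` are hypothetical; this is not the kernel's actual barrier.h, which also patches instructions at boot via alternatives. Non-x86 targets fall back to C11 fences so the sketch still builds.

```c
#include <stdatomic.h>

/* Hypothetical names; a simplified picture of the fence selection only. */
#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
/* SSE2 available: dedicated fence instructions. */
#define demo_mb()  __asm__ __volatile__("mfence" ::: "memory")
#define demo_rmb() __asm__ __volatile__("lfence" ::: "memory")
#define demo_wmb() __asm__ __volatile__("sfence" ::: "memory")
#elif defined(__i386__)
/* 32-bit CPU without SSE2: a locked no-op add on the stack is a full barrier. */
#define demo_mb()  __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#define demo_rmb() demo_mb()
#define demo_wmb() demo_mb()
#else
/* Not x86: portable C11 fences (at least as strong as needed here). */
#define demo_mb()  atomic_thread_fence(memory_order_seq_cst)
#define demo_rmb() atomic_thread_fence(memory_order_acquire)
#define demo_wmb() atomic_thread_fence(memory_order_release)
#endif

static int payload;          /* data being handed over */
static volatile int ready;   /* flag announcing that payload is valid */

void publish(int value)
{
    payload = value;   /* write the data first ... */
    demo_wmb();        /* ... order it before the flag store ... */
    ready = 1;         /* ... then publish the flag */
}

int consume(void)
{
    while (!ready)
        ;              /* spin until the flag is observed */
    demo_rmb();        /* order the flag load before the data load */
    return payload;
}
```

On ordinary write-back memory, x86 already keeps loads ordered with loads and stores ordered with stores, so the kernel's CPU-to-CPU smp_rmb()/smp_wmb() on x86 are typically just compiler barriers; LFENCE/SFENCE matter mainly for device and write-combining memory.
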
> >> > > 
> >> > > However, it looks like people say Locked ADD is much faster than the
> >> > > FENCE instructions, even on modern Intel CPUs like Haswell; please see
> >> > > these three sources:
> >> > > 
> >> > > "11.5.1 Locked Instructions as Memory Barriers
> >> > > Optimization
> >> > > Use locked instructions to implement Store/Store and Store/Load barriers."
> >> > > http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf
> >> > > 
> >> > > "lock addl %(rsp), 0 is a better solution for StoreLoad barrier":
> >> > > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
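
StoreLoad is the one ordering x86 does not provide for free: a store can sit in the store buffer while a later load from a different address completes. The classic store-buffering litmus test, sketched here assuming pthreads and GCC/Clang inline asm, places a locked add between each thread's store and load; with that barrier the outcome r1 == 0 && r2 == 0 should never appear. Note that `lock addl %(rsp), 0` as quoted is not valid assembler syntax; in AT&T syntax the instruction is `lock addl $0,(%rsp)`. `full_barrier` and `count_forbidden_outcomes` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__x86_64__)
#define full_barrier() __asm__ __volatile__("lock; addl $0,(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__)
#define full_barrier() __asm__ __volatile__("lock; addl $0,(%%esp)" ::: "memory", "cc")
#else
#define full_barrier() atomic_thread_fence(memory_order_seq_cst)
#endif

static atomic_int x, y;   /* shared locations */
static int r1, r2;        /* what each thread observed */

static void *thread_a(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);  /* store x */
    full_barrier();                                      /* StoreLoad fence */
    r1 = atomic_load_explicit(&y, memory_order_relaxed); /* then load y */
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);  /* store y */
    full_barrier();                                      /* StoreLoad fence */
    r2 = atomic_load_explicit(&x, memory_order_relaxed); /* then load x */
    return NULL;
}

/* Runs the test `iterations` times and counts how often both loads saw 0,
 * which a full barrier between store and load should make impossible. */
int count_forbidden_outcomes(int iterations)
{
    int forbidden = 0;
    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);   /* join makes r1/r2 visible here */
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            forbidden++;
    }
    return forbidden;
}
```

Removing both `full_barrier()` calls gives the unfenced version, where the 0/0 outcome becomes allowed on x86; whether it shows up in a particular run depends on timing, and with a fresh thread pair per iteration it may be rare.
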
> >> > > 
> >> > > "...locked instruction are more efficient barriers...":
> >> > > http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/
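
The speed claim can be checked locally. A rough micro-benchmark sketch, assuming POSIX `clock_gettime` and GCC/Clang inline asm, times a store followed by each kind of full barrier in a tight loop. The function names are hypothetical, and results depend heavily on the microarchitecture, so no particular outcome is implied. On non-x86 targets both variants fall back to the same C11 fence, so the comparison is only meaningful on x86.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__)
#define MFENCE_BARRIER()  __asm__ __volatile__("mfence" ::: "memory")
#define LOCKADD_BARRIER() __asm__ __volatile__("lock; addl $0,(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__) && defined(__SSE2__)
#define MFENCE_BARRIER()  __asm__ __volatile__("mfence" ::: "memory")
#define LOCKADD_BARRIER() __asm__ __volatile__("lock; addl $0,(%%esp)" ::: "memory", "cc")
#else
/* Not x86: both variants degrade to the same portable fence. */
#include <stdatomic.h>
#define MFENCE_BARRIER()  atomic_thread_fence(memory_order_seq_cst)
#define LOCKADD_BARRIER() atomic_thread_fence(memory_order_seq_cst)
#endif

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* A store before each barrier, so the barrier has something to order. */
static volatile int sink;

/* Average nanoseconds per (store + MFENCE) iteration. */
double ns_per_mfence(long iters)
{
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++) {
        sink = (int)i;
        MFENCE_BARRIER();
    }
    return (double)(now_ns() - start) / (double)iters;
}

/* Average nanoseconds per (store + LOCK ADDL) iteration. */
double ns_per_lockadd(long iters)
{
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++) {
        sink = (int)i;
        LOCKADD_BARRIER();
    }
    return (double)(now_ns() - start) / (double)iters;
}
```

Calling both with a few million iterations and comparing the returned values gives a first-order comparison; a careful measurement would pin the thread to one CPU, warm up, and repeat runs.
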
> >> > > 
> >> > > I also found that FreeBSD prefers Locked Add.
> >> > > 
> >> > > So, I'm curious why Linux prefers MFENCE.
> >> > > I guess I may be missing something.
> >> > > 
> >> > > I tried to google the question, but didn't find an answer.
> >> > 
> >> > It's being worked on, see this thread on lkml from a few weeks ago:
> >> > 
> >> >    C Jan 13 Michael S. Tsir    | [PATCH v3 0/4] x86: faster mb()+documentation tweaks
> >> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 1/4] x86: add cc clobber for addl
> >> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 2/4] x86: drop a comment left over from X86_OOSTORE
> >> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 3/4] x86: tweak the comment about use of wmb for IO
> >> >    C Jan 13 Michael S. Tsir    | ├─>[PATCH v3 4/4] x86: drop mfence in favor of lock+addl
> >> > 
> >> > The 4th patch changes MFENCE to a LOCK ADDL locked instruction.
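
The change to the full barrier can be pictured as a before/after pair. This is a sketch with hypothetical macro names, x86-64 only, assuming GCC/Clang inline asm, not the text of the patches. The `"memory"` clobber keeps the compiler from moving memory accesses across the barrier; the `"cc"` clobber, which patch 1/4 is about, declares that ADDL overwrites the arithmetic flags. `enqueue_and_check` is a hypothetical instance of the store-then-load pattern that needs a full barrier.

```c
#include <stdatomic.h>

#if defined(__x86_64__)
/* Before: full barrier via MFENCE. */
#define mb_mfence()  __asm__ __volatile__("mfence" ::: "memory")
/* After: locked no-op add to the top of the stack. "memory" is a compiler
 * barrier; "cc" says ADDL clobbers the flags register. */
#define mb_lockadd() __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#else
/* Not x86-64: both map to a portable sequentially consistent fence. */
#define mb_mfence()  atomic_thread_fence(memory_order_seq_cst)
#define mb_lockadd() atomic_thread_fence(memory_order_seq_cst)
#endif

static volatile int queued;           /* set by the producer */
static volatile int waiter_sleeping;  /* set by a hypothetical consumer */

/* Producer side of a wakeup check: the store to `queued` must be visible
 * before `waiter_sleeping` is read (StoreLoad), which x86 only guarantees
 * with a full barrier. Returns 1 if a wakeup must be sent. */
int enqueue_and_check(void)
{
    queued = 1;
    mb_lockadd();
    return waiter_sleeping;
}
```

Swapping `mb_lockadd()` for `mb_mfence()` gives the "before" behaviour; for this pattern on ordinary memory both provide the needed StoreLoad ordering.
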
> >> 
> >> Lots of additional chatter here:
> >> 
> >>   lkml.kernel.org/r/20160112150032-mutt-send-email-mst@...hat.com
> >> 
> >> And some useful bits here:
> >> 
> >>   lkml.kernel.org/r/56957D54.5000602@...or.com
> >> 
> >> latest version here:
> >> 
> >>   lkml.kernel.org/r/1453921746-16178-1-git-send-email-mst@...hat.com
> >
> >It's ready as far as I am concerned.
> >Basically we are just waiting for ack from hpa.
> 
> And I'm still discussing this with the hardware people.  It seems we
> can do this for *most* things, but not all; the question is where
> exactly we need to do something different.

I'm guessing there's still no update?

There's a decent chance that, without documentation, a bunch of current
uses are actually broken. See for example
http://marc.info/?l=linux-kernel&m=145400059304553&w=2
which, going by the manual, fixes an smp_mb() misuse around clflush. Or
maybe it doesn't?
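
For context on the clflush question: older revisions of Intel's SDM described CLFLUSH as ordered only by MFENCE, not by other fencing or serializing instructions, which is one reason a locked ADD might not be a drop-in replacement everywhere. A minimal sketch, assuming SSE2 intrinsics from `<emmintrin.h>` and a 64-byte cache line (real code should read the line size from CPUID), brackets a flush loop with MFENCE on both sides. `flush_range` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2) */
#define HAVE_CLFLUSH 1
#endif

#define CACHE_LINE 64    /* assumption: 64-byte lines */

/* Writes back and invalidates every cache line covering [addr, addr + len),
 * fenced with MFENCE on both sides. Returns the number of lines covered. */
size_t flush_range(const void *addr, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

#ifdef HAVE_CLFLUSH
    _mm_mfence();                        /* order earlier stores before the flushes */
    for (; p < end; p += CACHE_LINE, lines++)
        _mm_clflush((const void *)p);
    _mm_mfence();                        /* order the flushes before later accesses */
#else
    for (; p < end; p += CACHE_LINE)     /* non-x86: only count the lines */
        lines++;
#endif
    return lines;
}
```

Flushing does not change the data, only where it lives; a later read simply misses the cache and refetches the line from memory.
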
