linux-kernel - Re: [RFC] arch: Introduce new TSO memory barrier smp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20131104110553.GA8595@mudshark.cambridge.arm.com>
Date:	Mon, 4 Nov 2013 11:05:53 +0000
From:	Will Deacon <will.deacon@....com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Victor Kaplansky <VICTORK@...ibm.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Anton Blanchard <anton@...ba.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Linux PPC dev <linuxppc-dev@...abs.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	Michael Ellerman <michael@...erman.id.au>,
	Michael Neuling <mikey@...ling.org>
Subject: Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()

On Sun, Nov 03, 2013 at 11:34:00PM +0000, Linus Torvalds wrote:
> So it would *kind* of act like a "smp_wmb() + smp_rmb()", but the
> problem is that a "smp_rmb()" doesn't really "attach" to the preceding
> write.

Agreed.

> This is analogous to a "acquire" operation: you cannot make an
> "acquire" barrier, because it's not a barrier *between* two ops, it's
> associated with one particular op.
> 
> So what I *think* you actually really really want is a "store with
> release consistency, followed by a write barrier".

How does that order reads against reads? (Paul mentioned this as a
requirement). I not clear about the use case for this, so perhaps there is a
dependency that I'm not aware of.

> In TSO, afaik all stores have release consistency, and all writes are
> ordered, which is why this is a no-op in TSO. And x86 also has that
> "all stores have release consistency, and all writes are ordered"
> model, even if TSO doesn't really describe the x86 model.
> 
> But on ARM64, for example, I think you'd really want the store itself
> to be done with "stlr" (store with release), and then follow up with a
> "dsb st" after that.

So a dsb is pretty heavyweight here (it prevents execution of *any* further
instructions until all preceeding stores have completed, as well as
ensuring completion of any ongoing cache flushes). In conjunction with the
store-release, that's going to hold everything up until the store-release
(and therefore any preceeding memory accesses) have completed. Granted, I
think that gives Paul his read/read ordering, but it's a lot heavier than
what's required.

> And notice how that requires you to mark the store itself. There is no
> actual barrier *after* the store that does the optimized model.
> 
> Of course, it's entirely possible that it's not worth worrying about
> this on ARM64, and that just doing it as a "normal store followed by a
> full memory barrier" is good enough. But at least in *theory* a
> microarchitecture might make it much cheaper to do a "store with
> release consistency" followed by "write barrier".

I agree with the sentiment but, given that this stuff is so heavily
microarchitecture-dependent (and not simple to probe), a simple dmb ish
might be the best option after all. That's especially true if the
microarchitecture decided to ignore the barrier options and treat everything
as `all accesses, full system' in order to keep the hardware design simple.

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/