[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20131104110553.GA8595@mudshark.cambridge.arm.com>
Date: Mon, 4 Nov 2013 11:05:53 +0000
From: Will Deacon <will.deacon@....com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Victor Kaplansky <VICTORK@...ibm.com>,
Oleg Nesterov <oleg@...hat.com>,
Anton Blanchard <anton@...ba.org>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Frederic Weisbecker <fweisbec@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Linux PPC dev <linuxppc-dev@...abs.org>,
Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
Michael Ellerman <michael@...erman.id.au>,
Michael Neuling <mikey@...ling.org>
Subject: Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()
On Sun, Nov 03, 2013 at 11:34:00PM +0000, Linus Torvalds wrote:
> So it would *kind* of act like a "smp_wmb() + smp_rmb()", but the
> problem is that a "smp_rmb()" doesn't really "attach" to the preceding
> write.
Agreed.
> This is analogous to a "acquire" operation: you cannot make an
> "acquire" barrier, because it's not a barrier *between* two ops, it's
> associated with one particular op.
>
> So what I *think* you actually really really want is a "store with
> release consistency, followed by a write barrier".
How does that order reads against reads? (Paul mentioned this as a
requirement). I not clear about the use case for this, so perhaps there is a
dependency that I'm not aware of.
> In TSO, afaik all stores have release consistency, and all writes are
> ordered, which is why this is a no-op in TSO. And x86 also has that
> "all stores have release consistency, and all writes are ordered"
> model, even if TSO doesn't really describe the x86 model.
>
> But on ARM64, for example, I think you'd really want the store itself
> to be done with "stlr" (store with release), and then follow up with a
> "dsb st" after that.
So a dsb is pretty heavyweight here (it prevents execution of *any* further
instructions until all preceeding stores have completed, as well as
ensuring completion of any ongoing cache flushes). In conjunction with the
store-release, that's going to hold everything up until the store-release
(and therefore any preceeding memory accesses) have completed. Granted, I
think that gives Paul his read/read ordering, but it's a lot heavier than
what's required.
> And notice how that requires you to mark the store itself. There is no
> actual barrier *after* the store that does the optimized model.
>
> Of course, it's entirely possible that it's not worth worrying about
> this on ARM64, and that just doing it as a "normal store followed by a
> full memory barrier" is good enough. But at least in *theory* a
> microarchitecture might make it much cheaper to do a "store with
> release consistency" followed by "write barrier".
I agree with the sentiment but, given that this stuff is so heavily
microarchitecture-dependent (and not simple to probe), a simple dmb ish
might be the best option after all. That's especially true if the
microarchitecture decided to ignore the barrier options and treat everything
as `all accesses, full system' in order to keep the hardware design simple.
Will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists