Message-ID: <20150715141814.GZ3717@linux.vnet.ibm.com>
Date: Wed, 15 Jul 2015 07:18:14 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Michael Ellerman <mpe@...erman.id.au>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Will Deacon <will.deacon@....com>, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Michael Ellerman <michaele@....ibm.com>
Subject: Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

On Wed, Jul 15, 2015 at 01:06:18PM +1000, Michael Ellerman wrote:
> On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence
> > > into a full memory barrier.
> > >
> > > However:
> > >
> > > - This ordering guarantee is already provided without the barrier on
> > > all architectures apart from PowerPC
> > >
> > > - The barrier only applies to UNLOCK + LOCK, not general
> > > RELEASE + ACQUIRE operations
> > >
> > > - Locks are generally assumed to offer SC ordering semantics, so
> > > having this additional barrier is error-prone and complicates the
> > > callers of LOCK/UNLOCK primitives
> > >
> > > - The barrier is not well used outside of RCU and, because it was
> > > retrofitted into the kernel, it's not clear whether other areas of
> > > the kernel are incorrectly relying on UNLOCK + LOCK implying a full
> > > barrier
> > >
> > > This patch removes the barrier and instead requires architectures to
> > > provide full barrier semantics for an UNLOCK + LOCK sequence.
> > >
> > > Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>
> > > Cc: Paul McKenney <paulmck@...ux.vnet.ibm.com>
> > > Cc: Peter Zijlstra <peterz@...radead.org>
> > > Signed-off-by: Will Deacon <will.deacon@....com>
> > > ---
> > >
> > > This didn't go anywhere last time I posted it, but here it is again.
> > > I'd really appreciate some feedback from the PowerPC guys, especially as
> > > to whether this change requires them to add an additional barrier in
> > > arch_spin_unlock and what the cost of that would be.
> >
> > We'd have to turn the lwsync in unlock or the isync in lock into a full
> > barrier. As it is, we *almost* have a full barrier semantic, but not
> > quite, as in things can get mixed up inside spin_lock between the LL and
> > the SC (things leaking in past LL and things leaking "out" up before SC
> > and then getting mixed up in there).
> >
> > Michael, at some point you were experimenting a bit with that and tried
> > to get some perf numbers of the impact that would have, did that
> > solidify ? Otherwise, I'll have a look when I'm back next week.
>
> I was mainly experimenting with replacing the lwsync in lock with an isync.
>
> But I think you're talking about making it a full sync in lock.
>
> That was about +7% on p8, +25% on p7 and +88% on p6.

Just for completeness, what were you running as a benchmark? ;-)

Thanx, Paul

> We got stuck deciding whether isync was safe to use as a memory barrier,
> because the wording in the arch is a bit vague.
>
> But if we're talking about a full sync then I think there is no question that's
> OK and we should just do it.
>
> cheers
>
>
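
For readers following the "full barrier" question above, here is a minimal userspace sketch of the ordering that a full barrier is expected to provide. It is not kernel code and not taken from the patch: it assumes a C11 `atomic_thread_fence(memory_order_seq_cst)` as a rough stand-in for `smp_mb()`, and the names `run_sb_litmus`, `thread0` and `thread1` are hypothetical. It runs the classic store-buffering test, in which each thread stores to one variable and then loads the other. With a full fence between the store and the load on both sides, the outcome where both loads return 0 should never be observed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared variables for the store-buffering (SB) litmus test. */
static atomic_int x, y;
/* Per-thread results, read by the parent only after pthread_join(). */
static int r0, r1;

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* Assumption: seq_cst fence used as a userspace analogue of smp_mb(). */
    atomic_thread_fence(memory_order_seq_cst);
    r0 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/*
 * Run the SB test `iterations` times and return how many runs showed
 * the outcome r0 == 0 && r1 == 0, which full fences forbid.
 */
int run_sb_litmus(int iterations)
{
    int forbidden = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t t0, t1;

        atomic_store(&x, 0);
        atomic_store(&y, 0);

        pthread_create(&t0, NULL, thread0, NULL);
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);

        if (r0 == 0 && r1 == 0)
            forbidden++;
    }
    return forbidden;
}
```

Calling `run_sb_litmus(200)` from a test harness should return 0. Weakening both fences to acquire/release ordering would permit the both-zero outcome on weakly ordered hardware, which is roughly the gap between an lwsync-based UNLOCK + LOCK and a full sync discussed in this thread.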