[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170608182134.0042716a@roar.ozlabs.ibm.com>
Date: Thu, 8 Jun 2017 18:21:34 +1000
From: Nicholas Piggin <npiggin@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: torvalds@...ux-foundation.org, will.deacon@....com,
oleg@...hat.com, paulmck@...ux.vnet.ibm.com,
benh@...nel.crashing.org, mpe@...erman.id.au,
linux-kernel@...r.kernel.org, mingo@...nel.org,
stern@...land.harvard.edu, linuxppc-dev <linuxppc-dev@...abs.org>
Subject: Re: [RFC][PATCH 5/5] powerpc: Remove SYNC from _switch
On Thu, 8 Jun 2017 09:57:20 +0200
Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, Jun 08, 2017 at 05:29:38PM +1000, Nicholas Piggin wrote:
> > On Thu, 8 Jun 2017 08:54:00 +0200
> > Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > > On Thu, Jun 08, 2017 at 10:32:44AM +1000, Nicholas Piggin wrote:
> > > > On Wed, 07 Jun 2017 18:15:06 +0200
> > > > Peter Zijlstra <peterz@...radead.org> wrote:
> > > >
> > > > > Now that the scheduler's rq->lock is RCsc and thus provides full
> > > > > transitivity between scheduling actions. And since we cannot migrate
> > > > > current, a task needs a switch-out and a switch-in in order to
> > > > > migrate, in which case the RCsc provides all the ordering we need.
> > > >
> > > > Hi Peter,
> > > >
> > > > I'm actually just working on removing this right now too, so
> > > > good timing.
> > > >
> > > > I think we can't "just" remove it, because it is required to order
> > > > MMIO on powerpc as well.
> > >
> > > How is MMIO special? That is, there is only MMIO before we call into
> > > schedule() right? So the rq->lock should be sufficient to order that
> > > too.
> >
> > MMIO uses different barriers. spinlock and smp_ type barriers do
> > not order it.
>
> Right, but you only have SYNC, which is what makes it possible at all.
Yeah, but a future CPU in theory could implement some other barrier
which provides hwsync ordering for cacheable memory but not uncacheable.
smp_mb* barriers would be able to use that new type of barrier, except
here.
> Some of the other architectures are not so lucky and need a different
> barrier, ARM for instance needs DSB(ISH) vs the DMB(ISH) provided by
> smp_mb(). IA64, MIPS and a few others are in the same boat as ARM.
>
> > > > But what I have done is to comment that some other primitives are
> > > > already providing the hwsync for other, so we don't have to add
> > > > another one in _switch.
> > >
> > > Right, so this patch relies on the smp_mb__before_spinlock ->
> > > smp_mb__after_spinlock conversion that makes the rq->lock RCsc and
> > > should thus provide the required SYNC for migrations.
> >
> > AFAIKS either one will do, so long as there is a hwsync there. The
> > point is just that I have added some commentary in the generic and
> > powerpc parts to make it clear we're relying on that behavior of
> > the primitive. smp_mb* is not guaranteed to order MMIO, it's just
> > that it does on powerpc.
>
> I'm not particularly happy with the generic comment; I don't feel we
> should care that PPC is special here.
I think we do though, because its smp_mb happens to also order mmio.
Your patch I think failed to capture that unless I miss something. It's
not that the rq lock is RCsc that we can remove the hwsync, it's that
the smp_mb__before/after_spinlock has a hwsync in it.
As a counter-example: I think you can implement RCsc spinlocks in
powerpc using ll/sc+isync for the acquire, but that would be insufficient
because no hwsync for MMIO.
> > > That said, I think you can already use the smp_mb__before_spinlock() as
> > > that is done with IRQs disabled, but its a more difficult argument. The
> > > rq->lock RCsc property should be more obvious.
> >
> > This is what I got.
> >
> > https://patchwork.ozlabs.org/patch/770154/
>
> Your comment isn't fully correct, smp_cond_load_acquire() isn't
> necessarily done by CPUy. It might be easiest to simply refer to the
> "Notes on Program-Order guarantees on SMP systems." comment.
True, thanks.
> > But I'm not sure if I followed I'm not sure why it's a more
> > difficult argument: any time a process moves it must first execute
> > a hwsync on the current CPU after it has performed all its access
> > there, and then it must execute hwsync on the new CPU before it
> > performs any new access.
>
> Yeah, its not a terribly difficult argument either way, but I feel the
> RSsc rq->lock on is slightly easier.
It is neater to have the barrier inside the lock, I think.
Powered by blists - more mailing lists