Message-ID: <20170221130400.GG300@arm.com>
Date: Tue, 21 Feb 2017 13:04:00 +0000
From: Will Deacon <will.deacon@....com>
To: Boqun Feng <boqun.feng@...il.com>
Cc: Andrea Parri <parri.andrea@...il.com>,
Waiman Long <longman@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Pan Xinhui <xinhui@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] locking/pvqspinlock: Relax cmpxchg's to improve
performance on some archs
On Mon, Feb 20, 2017 at 12:58:39PM +0800, Boqun Feng wrote:
> > So Waiman, the fact is that in this case, we want the following code
> > sequence:
> >
> >         CPU 0                               CPU 1
> >         =================                   ====================
> >         {pn->state = vcpu_running, node->locked = 0}
> >
> >         smp_store_mb(&pn->state, vcpu_halted):
> >           WRITE_ONCE(pn->state, vcpu_halted);
> >           smp_mb();
> >         r1 = READ_ONCE(node->locked);
> >                                             arch_mcs_spin_unlock_contended();
> >                                               WRITE_ONCE(node->locked, 1);
> >
> >                                             cmpxchg(&pn->state, vcpu_halted, vcpu_hashed);
> >
> > never ends up in:
> >
> > r1 == 0 && cmpxchg fails (i.e. the read part of cmpxchg reads the
> > value vcpu_running).
> >
> > We can have such a guarantee if cmpxchg has a smp_mb() before its load
> > part, which is true for PPC. But semantically, cmpxchg() doesn't provide
> > any ordering guarantee if it fails, which is true on ARM64, IIUC. (Adding
> > Will to Cc for his insight ;-)).
I think you're right. The write to node->locked on CPU1 is not required
to be ordered before the load part of the failing cmpxchg.
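For readers without the kernel primitives at hand, the scenario can be rendered as a minimal C11 <stdatomic.h> sketch. The names (pn_state, node_locked, cpu0_wait(), cpu1_kick() and the VCPU_* enum) are hypothetical stand-ins, not kernel code, and the failing cmpxchg() is modeled with a relaxed failure order, an assumption chosen to mirror "no ordering on failure". Calling the two functions one after the other only exercises the two sequentially consistent outcomes; it cannot reproduce the reordering itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the pv_node vCPU states. */
enum vcpu_state { VCPU_RUNNING, VCPU_HALTED, VCPU_HASHED };

static _Atomic int pn_state = VCPU_RUNNING; /* models pn->state */
static _Atomic int node_locked = 0;         /* models node->locked */

/* CPU 0: smp_store_mb(&pn->state, vcpu_halted); r1 = READ_ONCE(node->locked); */
static int cpu0_wait(void)
{
	atomic_store_explicit(&pn_state, VCPU_HALTED, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	return atomic_load_explicit(&node_locked, memory_order_relaxed); /* r1 */
}

/* CPU 1: unlock the MCS node, then try vcpu_halted -> vcpu_hashed. */
static bool cpu1_kick(void)
{
	int expected = VCPU_HALTED;

	/* arch_mcs_spin_unlock_contended() modeled as a release store */
	atomic_store_explicit(&node_locked, 1, memory_order_release);
	/*
	 * Failure order is relaxed: a failing cmpxchg() promises no ordering,
	 * so nothing keeps this load after the store to node_locked above.
	 */
	return atomic_compare_exchange_strong_explicit(&pn_state, &expected,
			VCPU_HASHED, memory_order_seq_cst, memory_order_relaxed);
}
```

The outcome under discussion, r1 == 0 together with a failed cmpxchg, is the classic store-buffering shape: each CPU's load misses the other CPU's earlier store.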
> > So a possible "fix" (in case ARM64 uses qspinlock some day) would be to
> > replace cmpxchg() with smp_mb() + cmpxchg_relaxed().
Perversely, we could actually get away with cmpxchg_acquire on arm64 because
arch_mcs_spin_unlock_contended is smp_store_release and we order release ->
acquire in the architecture. But that just brings up the age-old unlock/lock
discussion again...
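Under the same hypothetical C11 model, the two alternatives in this thread might be sketched as follows (kick_fenced() and kick_acquire() are illustrative names, not kernel functions). kick_fenced() follows the smp_mb() + cmpxchg_relaxed() suggestion: assuming CPU 0 keeps the smp_mb() from smp_store_mb(), C11's rule for two seq_cst fences forbids r1 == 0 together with a failed CAS. kick_acquire() mirrors the cmpxchg_acquire idea; it relies on the arm64 guarantee that a store-release is ordered before a later load-acquire, which portable C11 release/acquire does not provide, so here it only shows the shape of the code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins, as in the scenario model. */
enum vcpu_state { VCPU_RUNNING, VCPU_HALTED, VCPU_HASHED };

static _Atomic int pn_state = VCPU_RUNNING; /* models pn->state */
static _Atomic int node_locked = 0;         /* models node->locked */

/*
 * smp_mb() + cmpxchg_relaxed(): the full fence orders the earlier store
 * to node_locked before the CAS's load, whether or not the CAS succeeds.
 */
static bool kick_fenced(void)
{
	int expected = VCPU_HALTED;

	atomic_store_explicit(&node_locked, 1, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	return atomic_compare_exchange_strong_explicit(&pn_state, &expected,
			VCPU_HASHED, memory_order_relaxed, memory_order_relaxed);
}

/*
 * cmpxchg_acquire(): sufficient on arm64, where release -> acquire is
 * ordered by the architecture. C11 does NOT promise that a release store
 * is ordered before a later acquire load of a different object, so this
 * is not a portable fix.
 */
static bool kick_acquire(void)
{
	int expected = VCPU_HALTED;

	atomic_store_explicit(&node_locked, 1, memory_order_release);
	return atomic_compare_exchange_strong_explicit(&pn_state, &expected,
			VCPU_HASHED, memory_order_acquire, memory_order_acquire);
}
```

The fenced variant pays for a full barrier on every architecture; the acquire variant is cheaper but leans on release -> acquire ordering, which is the unlock/lock question mentioned in the reply.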
Will