[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2a26fbf8-ee00-4672-8f19-fa2b28410c7b@redhat.com>
Date: Fri, 17 Feb 2017 14:01:31 -0500
From: Waiman Long <longman@...hat.com>
To: panxinhui <xinhui@...ux.vnet.ibm.com>,
Xinhui Pan <mnipxh@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH v2] locking/pvqspinlock: Relax cmpxchg's to improve
performance on some archs
On 02/12/2017 09:24 PM, panxinhui wrote:
>
> 在 2017/2/10 上午4:53, Waiman Long 写道:
>> On 02/07/2017 10:39 PM, Xinhui Pan wrote:
>>>
>>> 2016-12-26 4:26 GMT+08:00 Waiman Long <longman@...hat.com <mailto:longman@...hat.com>>:
>>>
>>> A number of cmpxchg calls in qspinlock_paravirt.h were replaced by more
>>> relaxed versions to improve performance on architectures that use LL/SC.
>>>
>>> All the locking related cmpxchg's are replaced with the _acquire
>>> variants:
>>> - pv_queued_spin_steal_lock()
>>> - trylock_clear_pending()
>>>
>>> The cmpxchg's related to hashing are replaced by either by the _release
>>> or the _relaxed variants. See the inline comment for details.
>>>
>>> Signed-off-by: Waiman Long <longman@...hat.com <mailto:longman@...hat.com>>
>>>
>>> v1->v2:
>>> - Add comments in changelog and code for the rationale of the change.
>>>
>>> ---
>>> kernel/locking/qspinlock_paravirt.h | 50 ++++++++++++++++++++++++-------------
>>> 1 file changed, 33 insertions(+), 17 deletions(-)
>>>
>>>
>>> @@ -323,8 +329,14 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>>> * If pv_kick_node() changed us to vcpu_hashed, retain that
>>> * value so that pv_wait_head_or_lock() knows to not also try
>>> * to hash this lock.
>>> + *
>>> + * The smp_store_mb() and control dependency above will ensure
>>> + * that state change won't happen before that. Synchronizing
>>> + * with pv_kick_node() wrt hashing by this waiter or by the
>>> + * lock holder is done solely by the state variable. There is
>>> + * no other ordering requirement.
>>> */
>>> - cmpxchg(&pn->state, vcpu_halted, vcpu_running);
>>> + cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running);
>>>
>>> /*
>>> * If the locked flag is still not set after wakeup, it is a
>>> @@ -360,9 +372,12 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>>> * pv_wait_node(). If OTOH this fails, the vCPU was running and will
>>> * observe its next->locked value and advance itself.
>>> *
>>> - * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
>>> + * Matches with smp_store_mb() and cmpxchg_relaxed() in pv_wait_node().
>>> + * A release barrier is used here to ensure that node->locked is
>>> + * always set before changing the state. See comment in pv_wait_node().
>>> */
>>> - if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
>>> + if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
>>> + != vcpu_halted)
>>> return;
>>>
>>> hi, Waiman
>>> We can't use _release here, a full barrier is needed.
>>>
>>> There is pv_kick_node vs pv_wait_head_or_lock
>>>
>>> [w] l->locked = _Q_SLOW_VAL //reordered here
>>> if (READ_ONCE(pn->state) == vcpu_hashed) //False.
>>> lp = (struct qspinlock **)1;
>>>
>>> [STORE] pn->state = vcpu_hashed lp = pv_hash(lock, pn);
>>> pv_hash() if (xchg(&l->locked, _Q_SLOW_VAL) == 0) // fasle, not unhashed.
>>>
>>> Then the same lock has hashed twice but only unhashed once. So at last as the hash table grows big, we hit RCU stall.
>>>
>>> I hit RCU stall when I run netperf benchmark
>>>
>>> thanks
>>> xinhui
>>>
>>>
>>> --
>>> 1.8.3.1
>>>
>>>
>> Yes, I know I am being too aggressive in this patch. I am going to tone it down a bit. I just don't have time to run a performance test on PPC system to verify the gain yet. I am planning to send an updated patch soon.
>>
> hi, All
>
> I guess I have found the scenario that causes the RCU stall.
>
> pv_wait_node
> [L] pn->state // this load is reordered from cmxchg_release.
> smp_store_mb(pn->state, vcpu_halted);
> if (!READ_ONCE(node->locked))
> arch_mcs_spin_unlock_contended(&next->locked);
> pv_kick_node
> [-L]cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
> //cmpxchg_release fails, so pn->state keep as it is.
> pv_wait(&pn->state, vcpu_halted);
> //on PPC, It will not return until pn->state != vcpu_halted.
>
> And when rcu stall hit, I fire an BUG(), and enter debug mode, it seems most cpus are in pv_wait...
>
> So the soltuion to solve this problems is simple, keep the cmpxchg as it is in pv_kick_node, cmpxchg on ppc provides full barriers.
>
> thanks
> xinhui
>
Sorry for the late reply and thank for the explanation. I am not aware
that the load of cmpxchg_release can be moved up like that. I will need
to be more careful of using relaxed form of cmpxchg next time.
Cheers,
Longman
Powered by blists - more mailing lists