Date:    Fri, 17 Feb 2017 14:01:31 -0500
From:    Waiman Long <longman@...hat.com>
To:      panxinhui <xinhui@...ux.vnet.ibm.com>, Xinhui Pan <mnipxh@...il.com>
Cc:      Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org, Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH v2] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs

On 02/12/2017 09:24 PM, panxinhui wrote:
>
> On 2017/2/10 4:53 AM, Waiman Long wrote:
>> On 02/07/2017 10:39 PM, Xinhui Pan wrote:
>>>
>>> 2016-12-26 4:26 GMT+08:00 Waiman Long <longman@...hat.com>:
>>>
>>>     A number of cmpxchg calls in qspinlock_paravirt.h were replaced by more
>>>     relaxed versions to improve performance on architectures that use LL/SC.
>>>
>>>     All the locking related cmpxchg's are replaced with the _acquire
>>>     variants:
>>>      - pv_queued_spin_steal_lock()
>>>      - trylock_clear_pending()
>>>
>>>     The cmpxchg's related to hashing are replaced by either the _release
>>>     or the _relaxed variants. See the inline comment for details.
>>>
>>>     Signed-off-by: Waiman Long <longman@...hat.com>
>>>
>>>     v1->v2:
>>>      - Add comments in changelog and code for the rationale of the change.
>>>
>>>     ---
>>>      kernel/locking/qspinlock_paravirt.h | 50 ++++++++++++++++++++++++-------------
>>>      1 file changed, 33 insertions(+), 17 deletions(-)
>>>
>>>
>>> @@ -323,8 +329,14 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>>>       * If pv_kick_node() changed us to vcpu_hashed, retain that
>>>       * value so that pv_wait_head_or_lock() knows to not also try
>>>       * to hash this lock.
>>> +     *
>>> +     * The smp_store_mb() and control dependency above will ensure
>>> +     * that state change won't happen before that. Synchronizing
>>> +     * with pv_kick_node() wrt hashing by this waiter or by the
>>> +     * lock holder is done solely by the state variable. There is
>>> +     * no other ordering requirement.
>>>       */
>>> -    cmpxchg(&pn->state, vcpu_halted, vcpu_running);
>>> +    cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running);
>>>
>>>      /*
>>>       * If the locked flag is still not set after wakeup, it is a
>>> @@ -360,9 +372,12 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>>>       * pv_wait_node(). If OTOH this fails, the vCPU was running and will
>>>       * observe its next->locked value and advance itself.
>>>       *
>>> -     * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
>>> +     * Matches with smp_store_mb() and cmpxchg_relaxed() in pv_wait_node().
>>> +     * A release barrier is used here to ensure that node->locked is
>>> +     * always set before changing the state. See comment in pv_wait_node().
>>>       */
>>> -    if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
>>> +    if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
>>> +            != vcpu_halted)
>>>              return;
>>>
>>> hi, Waiman
>>> We can't use _release here; a full barrier is needed.
>>>
>>> There is pv_kick_node vs pv_wait_head_or_lock:
>>>
>>>     pv_kick_node                            pv_wait_head_or_lock
>>>
>>> [w] l->locked = _Q_SLOW_VAL //reordered here
>>>                                             if (READ_ONCE(pn->state) == vcpu_hashed) //False.
>>>                                                     lp = (struct qspinlock **)1;
>>>
>>> [STORE] pn->state = vcpu_hashed             lp = pv_hash(lock, pn);
>>>     pv_hash()                               if (xchg(&l->locked, _Q_SLOW_VAL) == 0) // false, not unhashed.
>>>
>>> Then the same lock has been hashed twice but only unhashed once. So eventually,
>>> as the hash table grows big, we hit an RCU stall.
>>>
>>> I hit the RCU stall when I run the netperf benchmark.
>>>
>>> thanks
>>> xinhui
>>>
>>>
>>> --
>>> 1.8.3.1
>>>
>>>
>> Yes, I know I am being too aggressive in this patch. I am going to tone
>> it down a bit. I just don't have time to run a performance test on a PPC
>> system to verify the gain yet. I am planning to send an updated patch soon.
>>
> hi, All
>
> I guess I have found the scenario that causes the RCU stall.
>
> pv_wait_node
>                                     [L] pn->state  // this load is reordered from cmpxchg_release.
> smp_store_mb(pn->state, vcpu_halted);
> if (!READ_ONCE(node->locked))
>                                     arch_mcs_spin_unlock_contended(&next->locked);
>                                     pv_kick_node
>                                     [-L] cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
>                                     // cmpxchg_release fails, so pn->state stays as it is.
> pv_wait(&pn->state, vcpu_halted);
> // on PPC, it will not return until pn->state != vcpu_halted.
>
> And when the RCU stall hit, I fired a BUG() and entered debug mode; it seems
> most CPUs are in pv_wait...
>
> So the solution to this problem is simple: keep the cmpxchg as it is in
> pv_kick_node; cmpxchg on PPC provides full barriers.
>
> thanks
> xinhui
>
Sorry for the late reply, and thanks for the explanation. I was not aware
that the load of cmpxchg_release could be moved up like that. I will need
to be more careful when using the relaxed forms of cmpxchg next time.

Cheers,
Longman
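The first scenario quoted above turns on a property of release ordering that is easy to miss: a successful cmpxchg_release() only orders the accesses that come before it, so a store issued after it (here the write of l->locked = _Q_SLOW_VAL) may become visible to another CPU before the cmpxchg's own store to pn->state. The standalone C11 sketch below models only that property with plain atomics. It is illustrative: the variable names, the HALTED/HASHED/SLOW_VAL constants, and the observer thread are stand-ins invented for the example rather than the kernel's code, and whether this reordering by itself accounts for the double hash is exactly what the rest of the thread goes on to examine.

/*
 * Sketch only: a later relaxed store may be observed before the store
 * half of a release CAS. Names are illustrative, not the kernel's code.
 * Build: cc -O2 -pthread sketch1.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

enum { HALTED = 1, HASHED = 2 };
#define SLOW_VAL 3

static atomic_int state;    /* stands in for pn->state */
static atomic_int locked;   /* stands in for l->locked */

static int saw_locked;      /* written only by the observer thread */
static int saw_state;

/* Roughly models pv_kick_node() with the proposed cmpxchg_release(). */
static void *kicker(void *arg)
{
	int expected = HALTED;
	(void)arg;

	if (atomic_compare_exchange_strong_explicit(&state, &expected, HASHED,
						    memory_order_release,
						    memory_order_relaxed)) {
		/*
		 * Release orders the accesses *before* the CAS, not this
		 * later store: it may become visible ahead of state=HASHED.
		 */
		atomic_store_explicit(&locked, SLOW_VAL, memory_order_relaxed);
	}
	return NULL;
}

/* A simple observer standing in for the waiter's two accesses. */
static void *observer(void *arg)
{
	(void)arg;
	saw_locked = atomic_load_explicit(&locked, memory_order_acquire);
	saw_state  = atomic_load_explicit(&state,  memory_order_acquire);
	return NULL;
}

int main(void)
{
	for (long i = 0; i < 1000000; i++) {
		pthread_t a, b;

		atomic_store(&state, HALTED);
		atomic_store(&locked, 0);

		pthread_create(&a, NULL, kicker, NULL);
		pthread_create(&b, NULL, observer, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		/* Seeing the later store but missing the CAS's store is allowed. */
		if (saw_locked == SLOW_VAL && saw_state != HASHED) {
			printf("reordering observed at iteration %ld\n", i);
			return 0;
		}
	}
	printf("not observed in this run\n");
	return 0;
}

On x86 the program will almost certainly report nothing, since locked RMWs are fully ordered there; the window only opens on weakly ordered machines such as POWER or ARM, and even there hitting it is probabilistic.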
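The second scenario, the one Waiman accepts above, is the classic store-buffering pattern: the waiter stores pn->state = vcpu_halted and then checks node->locked, while the other CPU stores next->locked and then does the cmpxchg on pn->state. If the load half of cmpxchg_release() can be satisfied before its earlier store to next->locked is ordered, both sides can miss each other's store, the cmpxchg fails, and the waiter goes into pv_wait() with nobody left to change its state (hence the RCU stall on PPC). The standalone C11 sketch below models that window under the same caveats as before: all names are illustrative stand-ins, a relaxed store plus a seq_cst fence stands in for smp_store_mb(), a release store stands in for arch_mcs_spin_unlock_contended(), and a release CAS stands in for the proposed cmpxchg_release().

/*
 * Sketch only: store-buffering window with a release CAS.
 * Names are illustrative, not the kernel's code.
 * Build: cc -O2 -pthread sketch2.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

enum { RUNNING = 0, HALTED = 1, HASHED = 2 };

static atomic_int state;    /* stands in for pn->state    */
static atomic_int locked;   /* stands in for node->locked */

static int waiter_saw_locked;
static int kick_succeeded;

/* Roughly models pv_wait_node(): halt announcement, then the locked check. */
static void *waiter(void *arg)
{
	(void)arg;
	/* smp_store_mb(pn->state, vcpu_halted): a store followed by a full barrier. */
	atomic_store_explicit(&state, HALTED, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);

	/* if (!READ_ONCE(node->locked)) pv_wait(&pn->state, vcpu_halted); */
	waiter_saw_locked = atomic_load_explicit(&locked, memory_order_relaxed);
	return NULL;
}

/* Roughly models the MCS handoff followed by pv_kick_node(). */
static void *kicker(void *arg)
{
	int expected = HALTED;
	(void)arg;

	/* arch_mcs_spin_unlock_contended(&next->locked) */
	atomic_store_explicit(&locked, 1, memory_order_release);

	/*
	 * cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed): nothing
	 * forces the load half after the store above, so it can read a
	 * stale RUNNING and fail even though the waiter has halted.
	 */
	kick_succeeded = atomic_compare_exchange_strong_explicit(
				&state, &expected, HASHED,
				memory_order_release, memory_order_relaxed);
	return NULL;
}

int main(void)
{
	for (long i = 0; i < 1000000; i++) {
		pthread_t a, b;

		atomic_store(&state, RUNNING);
		atomic_store(&locked, 0);

		pthread_create(&a, NULL, waiter, NULL);
		pthread_create(&b, NULL, kicker, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		/*
		 * Both sides missed each other's store: in the kernel the
		 * waiter would now sleep in pv_wait() with pn->state left
		 * at vcpu_halted for good.
		 */
		if (!waiter_saw_locked && !kick_succeeded) {
			printf("lost-wakeup window hit at iteration %ld\n", i);
			return 0;
		}
	}
	printf("not observed in this run (the window is hardware dependent)\n");
	return 0;
}

Restoring the fully ordered cmpxchg(), as Xinhui suggests, amounts to placing a full barrier ahead of the CAS's load so that it cannot be satisfied before the store to next->locked, which closes this window.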