lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Mon, 13 Jan 2020 16:32:14 +0800 From: yezengruan <yezengruan@...wei.com> To: Waiman Long <longman@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, Will Deacon <will.deacon@....com>, "Catalin Marinas" <catalin.marinas@....com> CC: <linux-kernel@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org> Subject: Re: [PATCH v2] locking/osq: Use optimized spinning loop for arm64 Hi Waiman, On 2020/1/13 7:58, Waiman Long wrote: > Arm64 has a more optimized spinning loop (atomic_cond_read_acquire) > for spinlock that can boost performance of sibling threads by putting > the current cpu to a shallow sleep state that is woken up only when > the monitored variable changes or an external event happens. > > OSQ has a more complicated spinning loop. Besides the lock value, it > also checks for need_resched() and vcpu_is_preempted(). The check for > need_resched() is not a problem as it is only set by the tick interrupt > handler. That will be detected by the spinning cpu right after iret. > > The vcpu_is_preempted() check, however, is a problem as changes to the > preempt state of of previous node will not affect the sleep state. For > ARM64, vcpu_is_preempted is not defined and so is a no-op. To guard > against future addition of vcpu_is_preempted() to arm64, code is added > to cause build error when vcpu_is_preempted becomes defined in arm64 > without the corresponding changes in the OSQ spinning code. Recently, I am supporting vcpu_is_preempted() for arm64. There is a patch set which do this[1]. [1] https://lore.kernel.org/linux-arm-kernel/20191226135833.1052-1-yezengruan@huawei.com/ > > On a 2-socket 56-core 224-thread ARM64 system, a kernel mutex locking > microbenchmark was run for 10s with and without the patch. The > performance numbers before patch were: > > Running locktest with mutex [runtime = 10s, load = 1] > Threads = 224, Min/Mean/Max = 316/123,143/2,121,269 > Threads = 224, Total Rate = 2,757 kop/s; Percpu Rate = 12 kop/s > > After patch, the numbers were: > > Running locktest with mutex [runtime = 10s, load = 1] > Threads = 224, Min/Mean/Max = 334/147,836/1,304,787 > Threads = 224, Total Rate = 3,311 kop/s; Percpu Rate = 15 kop/s > > So there was about 20% performance improvement. > > Signed-off-by: Waiman Long <longman@...hat.com> > --- > arch/arm64/include/asm/barrier.h | 10 ++++++++++ > kernel/locking/osq_lock.c | 25 ++++++++++++------------- > 2 files changed, 22 insertions(+), 13 deletions(-) > > diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h > index 7d9cc5ec4971..8eb5f1239885 100644 > --- a/arch/arm64/include/asm/barrier.h > +++ b/arch/arm64/include/asm/barrier.h > @@ -152,6 +152,16 @@ do { \ > VAL; \ > }) > > +/* > + * In osq_lock(), smp_cond_load_relaxed() is called with a condition > + * that includes vcpu_is_preempted(). For arm64, vcpu_is_preempted is not > + * currently defined. So it is a no-op. If vcpu_is_preempted is defined in > + * the future, smp_cond_load_relaxed() will not response to changes in the > + * preempt state in a timely manner. So code changes will have to be made > + * to address this deficiency. > + */ > +#define vcpu_is_preempted_not_used > + > #define smp_cond_load_acquire(ptr, cond_expr) \ > ({ \ > typeof(ptr) __PTR = (ptr); \ > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c > index 6ef600aa0f47..69ec5161c3cc 100644 > --- a/kernel/locking/osq_lock.c > +++ b/kernel/locking/osq_lock.c > @@ -13,6 +13,14 @@ > */ > static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node); > > +/* > + * The optimized smp_cond_load_relaxed() spin loop should not be used with > + * vcpu_is_preempted defined. > + */ > +#if defined(vcpu_is_preempted) && defined(vcpu_is_preempted_not_used) > +#error "vcpu_is_preempted() inside smp_cond_load_relaxed() may not work!" > +#endif > + > /* > * We use the value 0 to represent "no CPU", thus the encoded value > * will be the CPU number incremented by 1. > @@ -134,20 +142,11 @@ bool osq_lock(struct optimistic_spin_queue *lock) > * cmpxchg in an attempt to undo our queueing. > */ > > - while (!READ_ONCE(node->locked)) { > - /* > - * If we need to reschedule bail... so we can block. > - * Use vcpu_is_preempted() to avoid waiting for a preempted > - * lock holder: > - */ > - if (need_resched() || vcpu_is_preempted(node_cpu(node->prev))) > - goto unqueue; > - > - cpu_relax(); > - } > - return true; > + if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() || > + vcpu_is_preempted(node_cpu(node->prev)))) > + return true; > > -unqueue: > + /* unqueue */ > /* > * Step - A -- stabilize @prev > * > Thanks, Zengruan
Powered by blists - more mailing lists