Date:	Sun, 26 Jun 2016 15:08:20 +0800
From:	panxinhui <xinhui@...ux.vnet.ibm.com>
To:	Boqun Feng <boqun.feng@...il.com>
Cc:	panxinhui <xinhui@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Pan Xinhui <xinhui.pan@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org, mingo@...hat.com, dave@...olabs.net,
	will.deacon@....com, Waiman.Long@....com, benh@...nel.crashing.org
Subject: Re: [PATCH] locking/osq: Drop the overload of osq lock

> On 26 Jun 2016, at 14:59, Boqun Feng <boqun.feng@...il.com> wrote:
> 
> On Sun, Jun 26, 2016 at 02:10:57PM +0800, Boqun Feng wrote:
>> On Sun, Jun 26, 2016 at 01:21:04PM +0800, panxinhui wrote:
>>> 
>>>> On 26 Jun 2016, at 03:20, Peter Zijlstra <peterz@...radead.org> wrote:
>>>> 
>>>> On Sun, Jun 26, 2016 at 01:27:56AM +0800, panxinhui wrote:
>>>>>>> Would that not have issues where the owner cpu is kept running but the
>>>>>>> spinner (ie. _this_ vcpu) gets preempted? I would think that in that
>>>>>>> case we too want to stop spinning.
>>>>>>> 
>>>>>> 
>>>>> Do you mean that the spinner detects it has yielded out during the
>>>>> big spin loop?
>>>>> 
>>>>> That is very possible to happen. BUT if the spinner (on this vcpu) yields
>>>>> out, the next spinner would break the spin loop. AND if the spinner
>>>>> detects it has yielded out once, it is very likely to get the osq lock
>>>>> soon, as long as the owner vcpu is running.
>>>>> 
>>>>> SO I think we just need to check the owner vcpu's yield_count.
>>>> 
>>>> I had a quick look at KVM and it looks like it only has
>>>> kvm_cpu::preempted, which would suggest the interface boqun proposed.
>>>> 
>>>> We'll have to look at many of the other virt platforms as well to see
>>>> what they can do.
>>>> 
>>>> We could also provide _both_ interfaces and a platform can implement
>>>> whichever variant (or both) it can.
>>>> 
>>> The kvm code on ppc has implemented yield_count.
>>> That makes me feel a little relaxed. :)
>>> 
>>> It looks like we could introduce the interface like below.
>>> 
>>> bool vcpu_is_preempted(int cpu)
>>> {
>>> 	return arch_vcpu_is_preempted(cpu);
>>> }
>>> 
>>> #ifdef arch_vcpu_has_yield_count
>>> bool vcpu_has_preempted_once(int cpu, unsigned int yield_count)
>>> {
>>> 	return arch_get_vcpu_yield_count() != yield_count;
>>> }
>>> 
>>> #else
>>> bool vcpu_has_preempted_once(int cpu, unsigned int yield_count)
>>> {
>>> 	/* just let the caller know it has been preempted */
>>> 	return vcpu_is_preempted(cpu);
>>> }
>>> #endif
>>> 
>> 
>> Unfortunately, on PPC, we compile pseries code along with powernv code
>> in a kernel binary, therefore we need to wire up the proper primitives at
>> runtime.
>> 
>> I've cooked the following patch, please note I haven't tested this, this
>> is just a POC.
>> 
>> Regards,
>> Boqun
>> 
>> ---------------->8
>> virt: Introduce vcpu preemption detection functions
>> 
>> Lock holder preemption is a serious problem in virtualized environments
>> for locking. Different archs and hypervisors employ different ways to
>> try to solve the problem in different locking mechanisms. And recently
>> Pan Xinhui found a significant performance drop in osq_lock(), which
>> could be solved by providing a mechanism to detect the preemption of
>> a vcpu.
>> 
>> Therefore this patch introduces several vcpu preemption detection
>> primitives, which allow locking primitives to save the time of
>> unnecessary spinning. These primitives act as an abstract layer between
>> locking functions and arch or hypervisor related code.
>> 
>> There are two sets of primitives:
>> 
>> 1. vcpu_preempt_count() and vcpu_has_preempted(): they must be used
>>    pairwise in the same preempt-disable critical section, and they
>>    can detect whether a vcpu preemption happened between them.
>> 
>> 2. vcpu_is_preempted(): used to check whether another cpu's vcpu is
>>    preempted.
>> 
>> This patch also implements those primitives on pseries and wires them up.
>> 
>> Signed-off-by: Boqun Feng <boqun.feng@...il.com>
>> ---
>> arch/powerpc/platforms/pseries/Kconfig |  1 +
>> arch/powerpc/platforms/pseries/setup.c | 28 +++++++++++
>> include/linux/vcpu_preempt.h           | 90 ++++++++++++++++++++++++++++++++++
>> 3 files changed, 119 insertions(+)
>> create mode 100644 include/linux/vcpu_preempt.h
>> 
>> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
>> index bec90fb30425..7d24c3e48878 100644
>> --- a/arch/powerpc/platforms/pseries/Kconfig
>> +++ b/arch/powerpc/platforms/pseries/Kconfig
>> @@ -21,6 +21,7 @@ config PPC_PSERIES
>> 	select HOTPLUG_CPU if SMP
>> 	select ARCH_RANDOM
>> 	select PPC_DOORBELL
>> +	select HAS_VCPU_PREEMPTION_DETECTION
>> 	default y
>> 
>> config PPC_SPLPAR
>> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
>> index 9883bc7ea007..5d4aed54e039 100644
>> --- a/arch/powerpc/platforms/pseries/setup.c
>> +++ b/arch/powerpc/platforms/pseries/setup.c
>> @@ -42,6 +42,7 @@
>> #include <linux/of.h>
>> #include <linux/of_pci.h>
>> #include <linux/kexec.h>
>> +#include <linux/vcpu_preempt.h>
>> 
>> #include <asm/mmu.h>
>> #include <asm/processor.h>
>> @@ -501,6 +502,31 @@ static void __init find_and_init_phbs(void)
>> 	of_pci_check_probe_only();
>> }
>> 
>> +struct vcpu_preempt_ops vcpu_preempt_ops = DEFAULT_VCPU_PREEMPT_OPS;
>> +EXPORT_SYMBOL(vcpu_preempt_ops);
>> +
>> +static long pseries_vcpu_preempt_count(void)
>> +{
>> +	return be32_to_cpu(get_lppaca()->yield_count);
>> +}
>> +
>> +static bool pseries_vcpu_is_preempted(int cpu)
>> +{
>> +	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
>> +}
>> +
>> +static bool pseries_vcpu_has_preempted(long vpc)
>> +{
>> +	return pseries_vcpu_preempt_count() != vpc;
>> +}
>> +
>> +static void __init pseries_setup_vcpu_preempt_ops(void)
>> +{
>> +	vcpu_preempt_ops.preempt_count = pseries_vcpu_preempt_count;
>> +	vcpu_preempt_ops.is_preempted = pseries_vcpu_is_preempted;
>> +	vcpu_preempt_ops.has_preempted = pseries_vcpu_has_preempted;
>> +}
>> +
>> static void __init pSeries_setup_arch(void)
>> {
>> 	set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
>> @@ -549,6 +575,8 @@ static void __init pSeries_setup_arch(void)
>> 				"%ld\n", rc);
>> 		}
>> 	}
>> +
>> +	pseries_setup_vcpu_preempt_ops();
>> }
>> 
>> static int __init pSeries_init_panel(void)
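A quick note on the pseries implementation above: as far as I understand the lppaca convention, the hypervisor increments yield_count once when it schedules the vcpu out and once more when it dispatches it again, so an odd value means "currently preempted", which is what the & 1 test relies on. Below is a tiny stand-alone sketch of that convention; it is illustrative only, and fake_yield_count and the hypervisor_* helpers are made-up names, not part of the patch.

#include <linux/types.h>

/*
 * Toy model of the yield_count convention assumed by
 * pseries_vcpu_is_preempted(): the count starts even (vcpu running),
 * becomes odd when the vcpu is scheduled out, and even again when it
 * is dispatched.  fake_yield_count stands in for lppaca_of(cpu).yield_count.
 */
static unsigned int fake_yield_count;

static void hypervisor_preempts_vcpu(void)
{
	fake_yield_count++;		/* odd: vcpu is scheduled out */
}

static void hypervisor_dispatches_vcpu(void)
{
	fake_yield_count++;		/* even: vcpu is running again */
}

static bool fake_vcpu_is_preempted(void)
{
	return fake_yield_count & 1;	/* same test as in the patch */
}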
>> diff --git a/include/linux/vcpu_preempt.h b/include/linux/vcpu_preempt.h
>> new file mode 100644
>> index 000000000000..4a88414bb83f
>> --- /dev/null
>> +++ b/include/linux/vcpu_preempt.h
>> @@ -0,0 +1,90 @@
>> +/*
>> + * Primitives for checking the vcpu preemption from the guest.
>> + */
>> +
>> +static long __vcpu_preempt_count(void)
>> +{
>> +	return 0;
>> +}
>> +
>> +static bool __vcpu_has_preempted(long vpc)
>> +{
>> +	return false;
>> +}
>> +
>> +static bool __vcpu_is_preempted(int cpu)
>> +{
>> +	return false;
>> +}
>> +
>> +struct vcpu_preempt_ops {
>> +	/*
>> +	 * Get the current vcpu's "preempt count", which is going to be
>> +	 * used for checking whether the current vcpu has ever been
>> +	 * preempted.
>> +	 */
>> +	long (*preempt_count)(void);
>> +
>> +	/*
>> +	 * Return whether a vcpu is preempted.
>> +	 */
>> +	bool (*is_preempted)(int cpu);
>> +
>> +	/*
>> +	 * Given a "vcpu preempt count", return whether a vcpu preemption
>> +	 * has happened since .preempt_count() was called.
>> +	 */
>> +	bool (*has_preempted)(long vpc);
>> +};
>> +
>> +struct vcpu_preempt_ops vcpu_preempt_ops;
>> +
> 
> This should be:
> 
> 	extern struct vcpu_preempt_ops vcpu_preempt_ops;
> 
> And I tested this one along with a modified version of Xinhui's patch.
> 
> The test showed that even in a not over-committed system, the overhead
> of mutex_lock() could go down from 2.58% (w/o patches) to 0.87% (w/
> patches).
> 
> Xinhui, I did also modify the title and commit log of your patch, feel
> free to take anything into your patch.
> 

Thanks boqun, I will review your patch.

> Regards,
> Boqun
> 
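As an aside, here is a minimal sketch of the pairwise usage that the commit log above describes: take a "preempt count" snapshot inside a preempt-disable section and later ask whether a vcpu preemption happened in between. It assumes vcpu_preempt.h also ends up providing wrappers such as vcpu_preempt_count() and vcpu_has_preempted() on top of vcpu_preempt_ops (the quoted diff is cut off before that part), and do_one_attempt is just a caller-supplied placeholder.

#include <linux/preempt.h>
#include <linux/vcpu_preempt.h>
#include <asm/processor.h>

/*
 * Sketch only: spin on some condition, but give up early if our own
 * vcpu has been preempted since the snapshot was taken.
 */
static bool spin_until_done(bool (*do_one_attempt)(void))
{
	long vpc;
	bool done = false;

	preempt_disable();
	vpc = vcpu_preempt_count();	/* first half of the pair */

	while (!done) {
		done = do_one_attempt();

		/* second half: did our vcpu get preempted since vpc was taken? */
		if (!done && vcpu_has_preempted(vpc))
			break;

		cpu_relax();
	}

	preempt_enable();
	return done;
}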
> ----------------->8
> locking/osq: Stop spinning in osq_lock if related vcpus get preempted
> 
> In a virtualized environment, a cpu may hold the osq_lock while its vcpu
> gets preempted, which makes the spinning of other cpus in the osq
> pointless, because it may be quite a while before the holder gets back to
> run and notifies its successor to go.
> 
> To fix this, we use the vcpu preemption detection mechanism to check
> whether the holder has been preempted out; if so, we just stop spinning in
> osq_lock(). Moreover, if the spinner itself has ever been preempted, that
> also indicates that the spinning in osq_lock() is pointless, so we detect
> this and stop spinning too.
> 
> Signed-off-by: Pan Xinhui <xinhui.pan@...ux.vnet.ibm.com>
> ---
> kernel/locking/osq_lock.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 05a37857ab55..bed5a56a6aed 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -1,6 +1,7 @@
> #include <linux/percpu.h>
> #include <linux/sched.h>
> #include <linux/osq_lock.h>
> +#include <linux/vcpu_preempt.h>
> 
> /*
>  * An MCS like lock especially tailored for optimistic spinning for sleeping
> @@ -87,6 +88,8 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> 	struct optimistic_spin_node *prev, *next;
> 	int curr = encode_cpu(smp_processor_id());
> 	int old;
> +	int loops;
> +	long vpc;
> 
> 	node->locked = 0;
> 	node->next = NULL;
> @@ -106,6 +109,9 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> 	node->prev = prev;
> 	WRITE_ONCE(prev->next, node);
> 
> +	old = old - 1;
> +	vpc = vcpu_preempt_count();
> +
> 	/*
> 	 * Normally @prev is untouchable after the above store; because at that
> 	 * moment unlock can proceed and wipe the node element from stack.
> @@ -118,8 +124,14 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> 	while (!READ_ONCE(node->locked)) {
> 		/*
> 		 * If we need to reschedule bail... so we can block.
> +		 * An over-committed guest with more vCPUs than pCPUs
> +		 * might fall into this loop and cause a huge overload.
> +		 * This is because vCPU A (prev) holds the osq lock and yields
> +		 * out, while vCPU B (node) waits for ->locked to be set, IOW,
> +		 * waits until vCPU A runs and unlocks the osq lock.
> +		 * NOTE that vCPU A and vCPU B might run on the same physical cpu.
> 		 */
> -		if (need_resched())
> +		if (need_resched() || vcpu_is_preempted(old) || vcpu_has_preempted(vpc))
> 			goto unqueue;
> 

The prev might change, so we need to read node->prev every loop and then check whether its vcpu is preempted; see the sketch below.

> 		cpu_relax_lowlatency();
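For illustration, here is roughly what xinhui's suggestion above might look like on top of the quoted hunk. This is a sketch, not a tested patch; it assumes node->cpu holds the encoded smp_processor_id() + 1 value that osq_lock() stores there, hence the "- 1" when passing it to vcpu_is_preempted().

	while (!READ_ONCE(node->locked)) {
		if (need_resched())
			goto unqueue;

		/*
		 * prev may change if other spinners unqueue themselves, so
		 * re-read node->prev every iteration and check the current
		 * predecessor's vcpu, not the one sampled before the loop.
		 */
		prev = READ_ONCE(node->prev);
		if (vcpu_is_preempted(prev->cpu - 1) || vcpu_has_preempted(vpc))
			goto unqueue;

		cpu_relax_lowlatency();
	}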