linux-kernel - Re: [PATCH] locking/osq: Drop the overload of osq lock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <64798940-BED4-45F5-B0D3-F7D4238E4C61@linux.vnet.ibm.com>
Date:	Sun, 26 Jun 2016 23:54:48 +0800
From:	panxinhui <xinhui@...ux.vnet.ibm.com>
To:	Boqun Feng <boqun.feng@...il.com>
Cc:	panxinhui <xinhui@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Pan Xinhui <xinhui.pan@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org, mingo@...hat.com, dave@...olabs.net,
	will.deacon@....com, Waiman.Long@....com, benh@...nel.crashing.org
Subject: Re: [PATCH] locking/osq: Drop the overload of osq lock


> 在 2016年6月26日，22:11，Boqun Feng <boqun.feng@...il.com> 写道：
> 
> On Sun, Jun 26, 2016 at 02:58:00PM +0800, panxinhui wrote:
> [snip]
>>>> 
>>> 
>>> Unfortunately, on PPC, we compile pseries code along with powernv code
>>> in a kernel binary, therefore we need to wire the proper primitives at
>>> runtime.
>>> 
>> no, we can use same vcpu preempted check ops actually.
>> just like how arch_spin_lock() does.
>> 
>> if (SHARED_PROCESSOR)
>>                                __spin_yield(lock); 
>> 
>> so we can
>> u32 arch_get_vcpu_yield_count(int cpu)
>> {
>> 	if (SHARED_PROCESSOR)
> 
> SHARED_PROCESSOR is true only if a PPC_PSERIES kernel runs in shared
> processor mode. Using this here will rule out the situation where a
> PPC_PSERIES kernel(ppc guest) runs indedicated processor mode. I don’t
> think we can rule out the dedicated processor mode, because even in the
well, copied from the codes

/*
 * We are using a non architected field to determine if a partition is
 * shared or dedicated. This currently works on both KVM and PHYP, but
 * we will have to transition to something better.
 */
#define LPPACA_OLD_SHARED_PROC          2

static inline bool lppaca_shared_proc(struct lppaca *l)
{
        return !!(l->__old_status & LPPACA_OLD_SHARED_PROC);
}
 
OKay, I have no idea about the difference between dedicated processor mode and the so-called shared..
Let me talk to you f2f monday….


> dedicated mode, the lock holder may still be preempted. So why? Why do
> we only want the yield_count in the shared processor mode?
> 
>>                                return be32_to_cpu(lppaca_of(cpu).yield_count;
>> 	return 0;
>> }
>> 
>> and all these things can be don inline then.
>> 
>>> I've cooked the following patch, please note I haven't test this, this
>>> is just a POC.
>>> 
>>> Regards,
>>> Boqun
>>> 
>>> ---------------->8
>>> virt: Introduce vcpu preemption detection functions
>>> 
>>> Lock holder preemption is an serious problem in virtualized environment
>>> for locking. Different archs and hypervisors employ different ways to
>>> try to solve the problem in different locking mechanisms. And recently
>>> Pan Xinhui found a significant performance drop in osq_lock(), which
>>> could be solved by providing a mechanism to detect the preemption of
>>> vcpu.
>>> 
>>> Therefore this patch introduces several vcpu preemption detection
>>> primitives, which allows locking primtives to save the time of
>>> unnecesarry spinning. These primitives act as an abstract layer between
>>> locking functions and arch or hypervisor related code.
>>> 
>>> There are two sets of primitives:
>>> 
>>> 1.	vcpu_preempt_count() and vcpu_has_preempted(), they must be used
>>> 	pairwisely in a same preempt disable critical section. And they
>>> 	could detect whether a vcpu preemption happens between them.
>>> 
>>> 2.	vcpu_is_preempted(), used to check whether other cpu's vcpu is
>>> 	preempted.
>>> 
>>> This patch also implements those primitives on pseries and wire them up.
>>> 
>>> Signed-off-by: Boqun Feng <boqun.feng@...il.com>
>>> ---
>>> arch/powerpc/platforms/pseries/Kconfig |  1 +
>>> arch/powerpc/platforms/pseries/setup.c | 28 +++++++++++
>>> include/linux/vcpu_preempt.h           | 90 ++++++++++++++++++++++++++++++++++
>>> 3 files changed, 119 insertions(+)
>>> create mode 100644 include/linux/vcpu_preempt.h
>>> 
>>> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
>>> index bec90fb30425..7d24c3e48878 100644
>>> --- a/arch/powerpc/platforms/pseries/Kconfig
>>> +++ b/arch/powerpc/platforms/pseries/Kconfig
>>> @@ -21,6 +21,7 @@ config PPC_PSERIES
>>> 	select HOTPLUG_CPU if SMP
>>> 	select ARCH_RANDOM
>>> 	select PPC_DOORBELL
>>> +	select HAS_VCPU_PREEMPTION_DETECTION
>>> 	default y
>>> 
>> it is already too many config there…
> 
> Why is this a problem? It's a Kconfig file, there are not too many
> readability issues here. It simply serve as a database for the
> relationships between different config options. menuconfig and its
just a personal favor…

I also a little hate the callback ops if it is not necessary to use, as it breaks my reading of the codes :-
AND same reason, I don’t want to vim .config then /any config_

anyway, this is not a big problem…


> friends can do their job well. BTW, have you read config X86? ;-)
> 
>> as I comment above, we could make these things inline, so looks like we needn't to add such config.
>> 
> 
> Another reason, I didn't make those primitives arch-specific inline
> functions, is that they are actually not only arch-specific but also
> (virtual-)environment-specific, or hypervisor-specific, that is their
> implementations differ when the kernel runs on different hypervisors(KVM
> and Xen, for example).
> 
> If you make the implementation part inline functions, you end up
> handling different virtual environment every time the functions are
> called, which is a waste of time. We are lucky because phyp and kvm
well, the call is also expensive.

> share the same interface of yield count, but this might not be true for
> other archs and other hypervisors.
> 

fair enough.

>>> config PPC_SPLPAR
>>> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
>>> index 9883bc7ea007..5d4aed54e039 100644
>>> --- a/arch/powerpc/platforms/pseries/setup.c
>>> +++ b/arch/powerpc/platforms/pseries/setup.c
>>> @@ -42,6 +42,7 @@
>>> #include <linux/of.h>
>>> #include <linux/of_pci.h>
>>> #include <linux/kexec.h>
>>> +#include <linux/vcpu_preempt.h>
>>> 
>>> #include <asm/mmu.h>
>>> #include <asm/processor.h>
>>> @@ -501,6 +502,31 @@ static void __init find_and_init_phbs(void)
>>> 	of_pci_check_probe_only();
>>> }
>>> 
>>> +struct vcpu_preempt_ops vcpu_preempt_ops = DEFAULT_VCPU_PREEMPT_OPS;
>>> +EXPORT_SYMBOL(vcpu_preempt_ops);
>>> +
>>> +static long pseries_vcpu_preempt_count(void)
>>> +{
>>> +	return be32_to_cpu(get_lppaca()->yield_count);
>>> +}
>>> +
>>> +static bool pseries_vcpu_is_preempted(int cpu)
>>> +{
>>> +	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
>>> +}
>>> +
>>> +static bool pseries_vcpu_has_preempted(long vpc)
>>> +{
>>> +	return pseries_vcpu_preempt_count() != vpc;
>>> +}
>>> +
>>> +static void __init pseries_setup_vcpu_preempt_ops(void)
>>> +{
>>> +	vcpu_preempt_ops.preempt_count = pseries_vcpu_preempt_count;
>>> +	vcpu_preempt_ops.is_preempted = pseries_vcpu_is_preempted;
>>> +	vcpu_preempt_ops.has_preempted = pseries_vcpu_has_preempted;
>>> +}
>>> +
>>> static void __init pSeries_setup_arch(void)
>>> {
>>> 	set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
>>> @@ -549,6 +575,8 @@ static void __init pSeries_setup_arch(void)
>>> 				"%ld\n", rc);
>>> 		}
>>> 	}
>>> +
>>> +	pseries_setup_vcpu_preempt_ops();
>>> }
>>> 
>>> static int __init pSeries_init_panel(void)
>>> diff --git a/include/linux/vcpu_preempt.h b/include/linux/vcpu_preempt.h
>>> new file mode 100644
>>> index 000000000000..4a88414bb83f
>>> --- /dev/null
>>> +++ b/include/linux/vcpu_preempt.h
>> 
>> How about add them in sched.h
> 
> I don't have a strong opinion on which header file to use. TBH, I was
> trying to use some header files for parvirt stuff, but failed to find
> one, so I add a new one because I don't want people confused between
> preemption of vcpus and preemption of tasks. But once again, I don't
> have a strong opinion ;-)
> 
>> I just think  they are similar. correct me if  I got a mistake :)
>> 
>> thanks
>> xinhui
>> 
>>> @@ -0,0 +1,90 @@
>>> +/*
>>> + * Primitives for checking the vcpu preemption from the guest.
>>> + */
>>> +
>>> +static long __vcpu_preempt_count(void)
>>> +{
>>> +	return 0;
>>> +}
>>> +
>>> +static bool __vcpu_has_preempted(long vpc)
>>> +{
>>> +	return false;
>>> +}
>>> +
>>> +static bool __vcpu_is_preempted(int cpu)
>>> +{
>>> +	return false;
>>> +}
>>> +
>>> +struct vcpu_preempt_ops {
>>> +	/*
>>> +	 * Get the current vcpu's "preempt count", which is going to use for
>>> +	 * checking whether the current vcpu has ever been preempted
>>> +	 */
>>> +	long (*preempt_count)(void);
>>> +

>>> 
>>> +	/*
>>> +	 * Return whether a vcpu is preempted
>>> +	 */
>>> +	bool (*is_preempted)(int cpu);
>>> +
>>> +	/*
>>> +	 * Given a "vcpu preempt count", Return whether a vcpu preemption ever
>>> +	 * happened after the .preempt_count() was called.
>>> +	 */
>>> +	bool (*has_preempted)(long vpc);
>>> +};
>>> +
>>> +struct vcpu_preempt_ops vcpu_preempt_ops;
>>> +
>>> +/* Default boilerplate */
>>> +#define DEFAULT_VCPU_PREEMPT_OPS			\
>>> +	{						\
>>> +		.preempt_count = __vcpu_preempt_count,	\
>>> +		.is_preempted = __vcpu_is_preempted,	\
>>> +		.has_preempted = __vcpu_has_preempted	\
>>> +	}
>>> +
>>> +#ifdef CONFIG_HAS_VCPU_PREEMPTION_DETECTION
>>> +/*
>>> + * vcpu_preempt_count: Get the current cpu's "vcpu preempt count"(vpc).
>>> + *
>>> + * The vpc is used for checking whether the current vcpu has ever been
>>> + * preempted via vcpu_has_preempted().
>>> + *
>>> + * This function and vcpu_has_preepmted() should be called in the same
>>> + * preemption disabled critical section.
>>> + */
>>> +static long vcpu_preempt_count(void)
>>> +{
>>> +	return vcpu_preempt_ops.preempt_count();
>>> +}
>>> +
>>> +/*
>>> + * vcpu_is_preempted: Check whether @cpu's vcpu is preempted.
>>> + */
>>> +static bool vcpu_is_preempted(int cpu)
>>> +{
>>> +	return vcpu_preempt_ops.is_preempted(cpu);
>>> +}
>>> +
>>> +/*
>>> + * vcpu_has_preepmted: Check whether the current cpu's vcpu has ever been
>>> + * preempted.
>>> + *
>>> + * The checked duration is between the vcpu_preempt_count() which returns @vpc
>>> + * is called and this function called.
>>> + *
>>> + * This function and corresponding vcpu_preempt_count() should be in the same
>>> + * preemption disabled cirtial section.
>>> + */
>>> +static bool vcpu_has_preempted(long vpc)
>>> +{
>>> +	return vcpu_preempt_ops.has_preempt(vpc);
>>> +}
>>> +#else /* CONFIG_HAS_VCPU_PREEMPTION_DETECTION */
>>> +#define vcpu_preempt_count() __vcpu_preempt_count()
>>> +#define vcpu_is_preempted(cpu) __vcpu_is_preempted(cpu)
>>> +#define vcpu_has_preempted(vpc) __vcpu_has_preempted(vpc)
>>> +#endif /* CONFIG_HAS_VCPU_PREEPMTION_DETECTION */
>>> -- 
>>> 2.9.0
>>> 
>>