linux-kernel - Re: [PATCH -v5 5/5] x86,smp: limit spinlock delay on virtual machines

Message-ID: <5113ABEC.6050906@redhat.com>
Date:	Thu, 07 Feb 2013 08:28:12 -0500
From:	Rik van Riel <riel@...hat.com>
To:	Stefano Stabellini <stefano.stabellini@...citrix.com>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"aquini@...hat.com" <aquini@...hat.com>,
	"eric.dumazet@...il.com" <eric.dumazet@...il.com>,
	"lwoodman@...hat.com" <lwoodman@...hat.com>,
	"knoel@...hat.com" <knoel@...hat.com>,
	"chegu_vinod@...com" <chegu_vinod@...com>,
	"raghavendra.kt@...ux.vnet.ibm.com" 
	<raghavendra.kt@...ux.vnet.ibm.com>,
	"mingo@...hat.com" <mingo@...hat.com>
Subject: Re: [PATCH -v5 5/5] x86,smp: limit spinlock delay on virtual machines

On 02/07/2013 06:25 AM, Stefano Stabellini wrote:
> On Wed, 6 Feb 2013, Rik van Riel wrote:
>> Modern Intel and AMD CPUs will trap to the host when the guest
>> is spinning on a spinlock, allowing the host to schedule in
>> something else.
>>
>> This effectively means the host is taking care of spinlock
>> backoff for virtual machines. It also means that doing the
>> spinlock backoff in the guest anyway can lead to totally
>> unpredictable results, extremely large backoffs, and
>> performance regressions.
>>
>> To prevent those problems, we limit the spinlock backoff
>> delay, when running in a virtual machine, to a small value.
>>
>> Signed-off-by: Rik van Riel <riel@...hat.com>
>> ---
>>   arch/x86/include/asm/processor.h |    2 ++
>>   arch/x86/kernel/cpu/hypervisor.c |    2 ++
>>   arch/x86/kernel/smp.c            |   21 +++++++++++++++++++--
>>   3 files changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 888184b..4118fd8 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -997,6 +997,8 @@ extern bool cpu_has_amd_erratum(const int *);
>>   extern unsigned long arch_align_stack(unsigned long sp);
>>   extern void free_init_pages(char *what, unsigned long begin, unsigned long end);
>>
>> +extern void init_guest_spinlock_delay(void);
>> +
>>   void default_idle(void);
>>   bool set_pm_idle_to_default(void);
>>
>> diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
>> index a8f8fa9..4a53724 100644
>> --- a/arch/x86/kernel/cpu/hypervisor.c
>> +++ b/arch/x86/kernel/cpu/hypervisor.c
>> @@ -76,6 +76,8 @@ void __init init_hypervisor_platform(void)
>>
>>   	init_hypervisor(&boot_cpu_data);
>>
>> +	init_guest_spinlock_delay();
>> +
>>   	if (x86_hyper->init_platform)
>>   		x86_hyper->init_platform();
>>   }
>> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
>> index 64e33ef..fbc5ff3 100644
>> --- a/arch/x86/kernel/smp.c
>> +++ b/arch/x86/kernel/smp.c
>> @@ -116,8 +116,25 @@ static bool smp_no_nmi_ipi = false;
>>   #define DELAY_SHIFT			8
>>   #define DELAY_FIXED_1			(1<<DELAY_SHIFT)
>>   #define MIN_SPINLOCK_DELAY		(1 * DELAY_FIXED_1)
>> -#define MAX_SPINLOCK_DELAY		(16000 * DELAY_FIXED_1)
>> +#define MAX_SPINLOCK_DELAY_NATIVE	(16000 * DELAY_FIXED_1)
>> +#define MAX_SPINLOCK_DELAY_GUEST	(16 * DELAY_FIXED_1)
>>   #define DELAY_HASH_SHIFT		6
>> +
>> +/*
>> + * Modern Intel and AMD CPUs tell the hypervisor when a guest is
>> + * spinning excessively on a spinlock. The hypervisor will then
>> + * schedule something else, effectively taking care of the backoff
>> + * for us. Doing our own backoff on top of the hypervisor's pause
>> + * loop exit handling can lead to excessively long delays, and
>> + * performance degradations. Limit the spinlock delay in virtual
>> + * machines to a smaller value. Called from init_hypervisor_platform
>> + */
>> +static int __read_mostly max_spinlock_delay = MAX_SPINLOCK_DELAY_NATIVE;
>> +void __init init_guest_spinlock_delay(void)
>> +{
>> +	max_spinlock_delay = MAX_SPINLOCK_DELAY_GUEST;
>> +}
>> +
>
> Same comment as last time:
>
> """
> Before reducing max_spinlock_delay, shouldn't we check that PAUSE-loop
> exiting is available? What if we are running on an older x86 machine
> that doesn't support it?
> """

I don't think this will be much of an issue.  If we are
in a CPU overcommit scenario, the spinlock overhead will
be dominated by host scheduling latencies, not by the
normal spinlock hold time or cache line bouncing.

If we are not in an overcommit scenario, limiting the
spinlock delay only means that the guest sees a small
performance boost from these patches, instead of a
larger one.
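
To make the effect of the two caps concrete, here is a rough user-space
sketch of the fixed-point arithmetic. The constants are copied from the
patch above. The helpers max_delay_for(), clamp_delay() and
delay_to_loops() are hypothetical names used only for illustration, they
are not in the patch, and the sketch assumes that a fixed-point delay
value is turned into loop iterations by shifting right by DELAY_SHIFT.

```c
#include <stdbool.h>

/* Constants copied from the quoted patch. */
#define DELAY_SHIFT               8
#define DELAY_FIXED_1             (1 << DELAY_SHIFT)
#define MIN_SPINLOCK_DELAY        (1 * DELAY_FIXED_1)
#define MAX_SPINLOCK_DELAY_NATIVE (16000 * DELAY_FIXED_1)
#define MAX_SPINLOCK_DELAY_GUEST  (16 * DELAY_FIXED_1)

/* Hypothetical helper: choose the cap for a native or a guest kernel. */
static int max_delay_for(bool is_guest)
{
	return is_guest ? MAX_SPINLOCK_DELAY_GUEST : MAX_SPINLOCK_DELAY_NATIVE;
}

/* Hypothetical helper: clamp a fixed-point delay into [MIN, max_delay]. */
static int clamp_delay(int delay, int max_delay)
{
	if (delay < MIN_SPINLOCK_DELAY)
		return MIN_SPINLOCK_DELAY;
	if (delay > max_delay)
		return max_delay;
	return delay;
}

/* Assumption: whole loop iterations are the fixed-point value >> DELAY_SHIFT. */
static int delay_to_loops(int delay)
{
	return delay >> DELAY_SHIFT;
}
```

Under these assumptions, a backoff that has grown to 5000 loops on bare
metal stays at 5000 loops when clamped with the native cap, but is cut
to 16 loops with the guest cap. That is the whole trade-off: a guest
without PAUSE-loop exiting simply backs off less than it could.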

-- 
All rights reversed