Date:   Mon, 06 Jun 2022 20:50:01 +0800
From:   Zhang Rui <rui.zhang@...el.com>
To:     Wyes Karny <wyes.karny@....com>, linux-kernel@...r.kernel.org
Cc:     Lewis.Carroll@....com, Mario.Limonciello@....com,
        gautham.shenoy@....com, Ananth.Narayan@....com, bharata@....com,
        len.brown@...el.com, x86@...nel.org, tglx@...utronix.de,
        mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
        hpa@...or.com, peterz@...radead.org, chang.seok.bae@...el.com,
        keescook@...omium.org, metze@...ba.org, zhengqi.arch@...edance.com,
        mark.rutland@....com, puwen@...on.cn, rafael.j.wysocki@...el.com,
        andrew.cooper3@...rix.com, jing2.liu@...el.com,
        jmattson@...gle.com, pawan.kumar.gupta@...ux.intel.com
Subject: Re: [PATCH v4 2/3] x86: Remove vendor checks from
 prefer_mwait_c1_over_halt

On Mon, 2022-05-23 at 22:25 +0530, Wyes Karny wrote:
> Remove vendor checks from the prefer_mwait_c1_over_halt function.
> Restore the decision tree to support MWAIT C1 as the default idle
> state based on CPUID checks, as done by Thomas Gleixner in
> commit 09fd4b4ef5bc ("x86: use cpuid to check MWAIT support for C1").
> 
> The decision tree was removed in
> commit 69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait"
> cmdline param")
> 
> Prefer MWAIT when the following conditions are satisfied:
>     1. CPUID_Fn00000001_ECX [Monitor] should be set
>     2. CPUID_Fn00000005 should be supported
>     3. If CPUID_Fn00000005_ECX [EMX] is set then there should be
>        at least one C1 substate available, indicated by
>        CPUID_Fn00000005_EDX [MWaitC1SubStates] bits.
> 
> Otherwise use HLT for default_idle function.
> 
> HPC customers who want to optimize for lower latency are known to
> disable Global C-States in the BIOS. In fact, some vendors allow
> choosing a BIOS 'performance' profile which explicitly disables
> C-States. In this scenario, the cpuidle driver will not be loaded and
> the kernel will continue with the default idle state chosen at boot
> time. On AMD systems the default idle state is currently HLT, which
> has a higher exit latency than MWAIT.
> 
> The reason for the choice of HLT over MWAIT on AMD systems is:
> 
> 1. Families prior to 10h didn't support MWAIT
> 2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was
>    preferable to use HLT as the default state on these systems.
> 
> However, AMD Family 17h onwards supports MWAIT as well as MWAIT C1,
> and it is preferable to use MWAIT as the default idle state on these
> systems, as it has lower exit latencies.
> 
> The table below shows the exit latency for HLT and MWAIT on an AMD
> Zen 3 system. Exit latency is measured by issuing a wakeup (IPI) to
> another CPU and measuring how many clock cycles it takes to wake up.
> Each iteration measures 10K wakeups with the source and destination
> CPUs pinned.
> 
> HLT:
> 
> 25.0000th percentile  :      1900 ns
> 50.0000th percentile  :      2000 ns
> 75.0000th percentile  :      2300 ns
> 90.0000th percentile  :      2500 ns
> 95.0000th percentile  :      2600 ns
> 99.0000th percentile  :      2800 ns
> 99.5000th percentile  :      3000 ns
> 99.9000th percentile  :      3400 ns
> 99.9500th percentile  :      3600 ns
> 99.9900th percentile  :      5900 ns
>   Min latency         :      1700 ns
>   Max latency         :      5900 ns
> Total Samples      9999
> 
> MWAIT:
> 
> 25.0000th percentile  :      1400 ns
> 50.0000th percentile  :      1500 ns
> 75.0000th percentile  :      1700 ns
> 90.0000th percentile  :      1800 ns
> 95.0000th percentile  :      1900 ns
> 99.0000th percentile  :      2300 ns
> 99.5000th percentile  :      2500 ns
> 99.9000th percentile  :      3200 ns
> 99.9500th percentile  :      3500 ns
> 99.9900th percentile  :      4600 ns
>   Min latency         :      1200 ns
>   Max latency         :      4600 ns
> Total Samples      9997
> 
> Improvement (99th percentile): 21.74%
> 
> Below is another result, for the context_switch2 micro-benchmark,
> which brings out the impact of improved wakeup latency through
> increased context switches per second.
> 
> Link: https://ozlabs.org/~anton/junkcode/context_switch2.c
> 
> with HLT:
> -------------------------------
> 50.0000th percentile  :  190184
> 75.0000th percentile  :  191032
> 90.0000th percentile  :  192314
> 95.0000th percentile  :  192520
> 99.0000th percentile  :  192844
> MIN  :  190148
> MAX  :  192852
> 
> with MWAIT:
> -------------------------------
> 50.0000th percentile  :  277444
> 75.0000th percentile  :  278268
> 90.0000th percentile  :  278888
> 95.0000th percentile  :  279164
> 99.0000th percentile  :  280504
> MIN  :  273278
> MAX  :  281410
> 
> Improvement (99th percentile): ~45.46%
> 
> Signed-off-by: Wyes Karny <wyes.karny@....com>

I couldn't evaluate the impact on other vendors, but at least for
Intel platforms,

Tested-by: Zhang Rui <rui.zhang@...el.com>

> ---
>  arch/x86/include/asm/mwait.h |  1 +
>  arch/x86/kernel/process.c    | 35 +++++++++++++++++++++++++----------
>  2 files changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 29dd27b5a339..3a8fdf881313 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -13,6 +13,7 @@
>  #define MWAIT_SUBSTATE_SIZE		4
>  #define MWAIT_HINT2CSTATE(hint)		(((hint) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK)
>  #define MWAIT_HINT2SUBSTATE(hint)	((hint) & MWAIT_CSTATE_MASK)
> +#define MWAIT_C1_SUBSTATE_MASK  0xf0
>  
>  #define CPUID_MWAIT_LEAF		5
>  #define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 4e0178b066c5..7bf4d73c9522 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -813,28 +813,43 @@ static void amd_e400_idle(void)
>  }
>  
>  /*
> - * Intel Core2 and older machines prefer MWAIT over HALT for C1.
> - * We can't rely on cpuidle installing MWAIT, because it will not load
> - * on systems that support only C1 -- so the boot default must be MWAIT.
> + * Prefer MWAIT over HALT if MWAIT is supported, the MWAIT_CPUID leaf
> + * exists, and whenever MONITOR/MWAIT extensions are present there is
> + * at least one C1 substate.
>   *
> - * Some AMD machines are the opposite, they depend on using HALT.
> - *
> - * So for default C1, which is used during boot until cpuidle loads,
> - * use MWAIT-C1 on Intel HW that has it, else use HALT.
> + * Do not prefer MWAIT if the MONITOR instruction has a bug or if
> + * idle=nomwait is passed on the kernel command line.
>   */
>  static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
>  {
> +	u32 eax, ebx, ecx, edx;
> +
>  	/* User has disallowed the use of MWAIT. Fallback to HALT */
>  	if (boot_option_idle_override == IDLE_NOMWAIT)
>  		return 0;
>  
> -	if (c->x86_vendor != X86_VENDOR_INTEL)
> +	/* MWAIT is not supported on this platform. Fallback to HALT */
> +	if (!cpu_has(c, X86_FEATURE_MWAIT))
>  		return 0;
>  
> -	if (!cpu_has(c, X86_FEATURE_MWAIT) || boot_cpu_has_bug(X86_BUG_MONITOR))
> +	/* Monitor has a bug. Fallback to HALT */
> +	if (boot_cpu_has_bug(X86_BUG_MONITOR))
>  		return 0;
>  
> -	return 1;
> +	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
> +
> +	/*
> +	 * If MWAIT extensions are not available, it is safe to use MWAIT
> +	 * with EAX=0, ECX=0.
> +	 */
> +	if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED))
> +		return 1;
> +
> +	/*
> +	 * If MWAIT extensions are available, there should be at least one
> +	 * MWAIT C1 substate present.
> +	 */
> +	return (edx & MWAIT_C1_SUBSTATE_MASK);
>  }
>  
>  /*
