Date:   Fri, 2 Jun 2023 11:00:55 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Feng Tang <feng.tang@...el.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...el.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
        linux-kernel@...r.kernel.org, rui.zhang@...el.com,
        tim.c.chen@...el.com, Xiongfeng Wang <wangxiongfeng2@...wei.com>,
        liaoyu15@...wei.com
Subject: Re: [PATCH v1 2/2] x86/tsc: Extend watchdog check exemption to
 4-Sockets platform

On Fri, Oct 21, 2022 at 02:21:31PM +0800, Feng Tang wrote:
> There is again a report that the TSC clocksource on a 4-socket x86
> Skylake server was wrongly judged 'unstable' by the 'jiffies' watchdog
> and disabled [1].
> 
> Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> on qualified platorms") was introduced to deal with these false
> 'TSC unstable' alarms, covering qualified platforms with 2 sockets
> or fewer.
> 
> Extend the exemption to 4 sockets to fix the issue.
> 
> We also got similar reports on 8-socket platforms from internal
> testing, but as Peter pointed out, there were TSC sync issues on
> 8-socket platforms, and these are better handled architecture by
> architecture instead of directly changing the threshold to 8 here.
> 
> Rui also proposed another approach: disabling 'jiffies' as a
> clocksource watchdog [2]. That would also solve this specific problem
> in an architecture-independent way, with one limitation: some TSC
> false alarms were reported by other hardware watchdogs such as
> HPET/PMTIMER, while the 'jiffies' watchdog is mostly used during
> the kernel boot phase.
> 
> [1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/
> [2]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
> 
> Reported-by: Yu Liao <liaoyu15@...wei.com>
> Signed-off-by: Feng Tang <feng.tang@...el.com>

We have a number of four-socket systems whose TSCs seem to be reliable.
We do see issues where high memory load forces the TSC to be marked
unstable, but that is because those systems are using an older kernel.

If the TSCs do start to misbehave, I will of course let you all know.
But in the meantime:

Reviewed-by: Paul E. McKenney <paulmck@...nel.org>

The previous patch that changes the definition of "socket" I have no
opinion on.  I must let you guys work that out.  However, I do note that
this patch can be rebased so as to no longer depend on that patch.

						Thanx, Paul

> ---
>  arch/x86/kernel/tsc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 178448ef00c7..356f06287034 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1400,7 +1400,7 @@ static int __init init_tsc_clocksource(void)
>  	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>  	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
>  	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> -	    logical_packages <= 2)
> +	    logical_packages <= 4)
>  		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
>  
>  	/*
> -- 
> 2.34.1
> 
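
For readers following along outside the kernel tree, the check touched by
the hunk above can be modeled as a small userspace sketch. This is only an
illustration, not the code in arch/x86/kernel/tsc.c: struct tsc_features,
SKETCH_MUST_VERIFY, tsc_watchdog_exempt() and tsc_flags_after_init() are
hypothetical names invented here, and the flag's bit value is arbitrary.
The socket limit is passed in as a parameter so that the old limit (2) and
the proposed one (4) can be compared side by side.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three CPU feature bits tested in the hunk. */
struct tsc_features {
	bool constant_tsc;	/* models X86_FEATURE_CONSTANT_TSC */
	bool nonstop_tsc;	/* models X86_FEATURE_NONSTOP_TSC */
	bool tsc_adjust;	/* models X86_FEATURE_TSC_ADJUST */
};

/* Arbitrary bit standing in for CLOCK_SOURCE_MUST_VERIFY. */
#define SKETCH_MUST_VERIFY 0x02u

/*
 * True when the TSC would be exempted from the clocksource watchdog:
 * all three features present and the package count within the limit.
 */
static bool tsc_watchdog_exempt(const struct tsc_features *f,
				unsigned int packages,
				unsigned int max_packages)
{
	return f->constant_tsc && f->nonstop_tsc && f->tsc_adjust &&
	       packages <= max_packages;
}

/* Clear the "must verify" bit only when the exemption applies. */
static unsigned int tsc_flags_after_init(unsigned int flags,
					 const struct tsc_features *f,
					 unsigned int packages,
					 unsigned int max_packages)
{
	if (tsc_watchdog_exempt(f, packages, max_packages))
		flags &= ~SKETCH_MUST_VERIFY;
	return flags;
}
```

With all three features set, a 4-package system keeps the "must verify"
bit when max_packages is 2 and drops it when max_packages is 4, which is
the behavioral change the patch proposes. An 8-package system keeps the
bit in both cases.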
