linux-kernel - Re: [PATCH v2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform

Message-ID: <Y1DSLnfMakCrWux/@feng-clx>
Date:   Thu, 20 Oct 2022 12:44:30 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>
CC:     Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "Dave Hansen" <dave.hansen@...el.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        "Peter Zijlstra" <peterz@...radead.org>, <x86@...nel.org>,
        <linux-kernel@...r.kernel.org>, <rui.zhang@...el.com>,
        <tim.c.chen@...el.com>, Xiongfeng Wang <wangxiongfeng2@...wei.com>,
        Yu Liao <liaoyu15@...wei.com>
Subject: Re: [PATCH v2] x86/tsc: Extend watchdog check exemption to 4-Sockets
 platform

On Wed, Oct 19, 2022 at 11:18:43AM +0200, Thomas Gleixner wrote:
> On Thu, Oct 13 2022 at 21:12, Feng Tang wrote:
> > There is a report again that the tsc clocksource on a 4-socket x86
> > Skylake server was wrongly judged as 'unstable' by the 'jiffies'
> > watchdog, and disabled [1].
> >
> > Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> > on qualified platorms") was introduced to deal with these false
> > alarms of tsc instability, covering qualified platforms with 2
> > sockets or fewer.
> >
> > Extend the exemption to 4 sockets to fix the issue.
> >
> > We also got similar reports on 8-socket platforms from internal tests,
> > but as Peter pointed out, there were tsc sync issues on 8-socket
> > platforms, so it had better be handled architecture by architecture,
> > instead of directly changing the threshold to 8 here.
> >
> > Rui also proposed another way, disabling 'jiffies' as a clocksource
> > watchdog [2], which can also solve this specific problem in an
> > architecture-independent way, with one limitation: some tsc false
> > alarms are reported by other watchdogs like HPET after boot,
> > while 'jiffies' is mostly used in the boot phase before hardware
> > clocksources are initialized.
> 
> HPET is initialized early, but if HPET is disabled or not advertised
> then the only other hardware clocksource is PMTIMER which is initialized
> late via fs_initcall. PMTIMER is initialized late due to broken
> Pentium-era chipsets, which are sorted out with PCI quirks. For anything else we can
> initialize it early. Something like the below.

Thanks for sharing the background and the code! It can reduce the
time 'jiffies' spends as the watchdog on client platforms whose HPET
is disabled. But there were still false positive reports with
HPET/PMTIMER as watchdogs, so I still vote for your suggestion of
lifting the check for qualified platforms.

For that, Dave raised the accuracy issue of 'nr_online_nodes', and
we proposed a new patch in https://lore.kernel.org/lkml/20221017132942.1646934-1-feng.tang@intel.com/
However, topology_max_packages() still has an issue in providing the
socket number, so I plan to use 'logical_packages' instead. Do you think
it's the right direction?

> I'm sure I said this more than once, but I'm happy to repeat myself
> forever:
> 
>   Instead of proliferating lousy hacks, can the X86 vendors finally get
>   their act together and provide some architected information whether
>   the TSC is trustworthy or not?
 
Yes, it would save us a lot of trouble. Maybe it is better placed in
CPUID info, so that if there is some bug in HW/BIOS, it may get fixed
with a microcode update.

Thanks,
Feng

> Thanks,
> 
>         tglx
> ---
> 
> --- a/arch/x86/kernel/time.c
> +++ b/arch/x86/kernel/time.c
> @@ -10,6 +10,7 @@
>   *
>   */
>  
> +#include <linux/acpi_pmtmr.h>
>  #include <linux/clocksource.h>
>  #include <linux/clockchips.h>
>  #include <linux/interrupt.h>
> @@ -75,6 +76,14 @@ static void __init setup_default_timer_i
>  void __init hpet_time_init(void)
>  {
>  	if (!hpet_enable()) {
> +		/*
> +		 * Some Pentium-era chipsets have broken PM timers and need
> +		 * PCI quirks to run before init.
> +		 */
> +		if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
> +		    boot_cpu_data.x86 != 5)
> +			init_acpi_pm_clocksource();
> +
>  		if (!pit_timer_init())
>  			return;
>  	}
> --- a/drivers/clocksource/acpi_pm.c
> +++ b/drivers/clocksource/acpi_pm.c
> @@ -30,6 +30,7 @@
>   * in arch/i386/kernel/acpi/boot.c
>   */
>  u32 pmtmr_ioport __read_mostly;
> +static bool pmtmr_initialized __initdata;
>  
>  static inline u32 read_pmtmr(void)
>  {
> @@ -142,7 +143,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SE
>   * Some boards have the PMTMR running way too fast. We check
>   * the PMTMR rate against PIT channel 2 to catch these cases.
>   */
> -static int verify_pmtmr_rate(void)
> +static int __init verify_pmtmr_rate(void)
>  {
>  	u64 value1, value2;
>  	unsigned long count, delta;
> @@ -172,14 +173,18 @@ static int verify_pmtmr_rate(void)
>  /* Number of reads we try to get two different values */
>  #define ACPI_PM_READ_CHECKS 10000
>  
> -static int __init init_acpi_pm_clocksource(void)
> +int __init init_acpi_pm_clocksource(void)
>  {
>  	u64 value1, value2;
>  	unsigned int i, j = 0;
> +	int ret;
>  
>  	if (!pmtmr_ioport)
>  		return -ENODEV;
>  
> +	if (pmtmr_initialized)
> +		return 0;
> +
>  	/* "verify" this timing source: */
>  	for (j = 0; j < ACPI_PM_MONOTONICITY_CHECKS; j++) {
>  		udelay(100 * j);
> @@ -210,10 +215,11 @@ static int __init init_acpi_pm_clocksour
>  		return -ENODEV;
>  	}
>  
> -	return clocksource_register_hz(&clocksource_acpi_pm,
> -						PMTMR_TICKS_PER_SEC);
> +	ret = clocksource_register_hz(&clocksource_acpi_pm, PMTMR_TICKS_PER_SEC);
> +	if (!ret)
> +		pmtmr_initialized = true;
> +	return ret;
>  }
> -
>  /* We use fs_initcall because we want the PCI fixups to have run
>   * but we still need to load before device_initcall
>   */
> --- a/include/linux/acpi_pmtmr.h
> +++ b/include/linux/acpi_pmtmr.h
> @@ -13,6 +13,8 @@
>  /* Overrun value */
>  #define ACPI_PM_OVRRUN	(1<<24)
>  
> +extern int __init init_acpi_pm_clocksource(void);
> +
>  #ifdef CONFIG_X86_PM_TIMER
>  
>  extern u32 acpi_pm_read_verified(void);
> 
>