Message-ID: <YzZDLBKbDTbNr45b@feng-clx>
Date:   Fri, 30 Sep 2022 09:15:24 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     Xiongfeng Wang <wangxiongfeng2@...wei.com>
CC:     Yu Liao <liaoyu15@...wei.com>, Zhang Rui <rui.zhang@...el.com>,
        "Thomas Gleixner" <tglx@...utronix.de>,
        Bjorn Helgaas <helgaas@...nel.org>,
        "Ingo Molnar" <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Kai-Heng Feng <kai.heng.feng@...onical.com>,
        <len.brown@...el.com>, "Xie XiuQi" <xiexiuqi@...wei.com>
Subject: Re: [PATCH] x86/PCI: Convert force_disable_hpet() to standard quirk

On Fri, Sep 30, 2022 at 09:05:24AM +0800, Xiongfeng Wang wrote:
> 
> 
> On 2022/9/30 8:38, Feng Tang wrote:
> > On Thu, Sep 29, 2022 at 11:52:28PM +0800, Yu Liao wrote:
> >> On 2020/12/2 15:28, Zhang Rui wrote:
> >>> On Mon, 2020-11-30 at 20:21 +0100, Thomas Gleixner wrote:
> >>>> Feng,
> >>>>
> >>>> On Fri, Nov 27 2020 at 14:11, Feng Tang wrote:
> >>>>> On Fri, Nov 27, 2020 at 12:27:34AM +0100, Thomas Gleixner wrote:
> >>>>>> On Thu, Nov 26 2020 at 09:24, Feng Tang wrote:
> >>>>>> Yes, that can happen. But OTOH, we should start to think about
> >>>>>> the
> >>>>>> requirements for using the TSC watchdog.
> >>>
> >>> My original proposal was to stop using jiffies and refined-jiffies as the
> >>> clocksource watchdog, because they are not reliable and it's better to
> >>> use a clocksource that has a hardware counter as watchdog, like the patch
> >>> below, which I didn't send upstream.
> >>>
> >>> From cf9ce0ecab8851a3745edcad92e072022af3dbd9 Mon Sep 17 00:00:00 2001
> >>> From: Zhang Rui <rui.zhang@...el.com>
> >>> Date: Fri, 19 Jun 2020 22:03:23 +0800
> >>> Subject: [RFC PATCH] time/clocksource: do not use refined-jiffies as watchdog
> >>>
> >>> On IA platforms, if HPET is disabled, either via x86 early-quirks or
> >>> via the kernel command line, refined-jiffies will be used as the clocksource
> >>> watchdog in the early boot phase, before the acpi_pm timer is registered.
> >>>
> >>> This is not a problem if jiffies are accurate.
> >>> But in some cases, for example when the serial console is enabled, it may
> >>> frequently take several milliseconds to write to the console with irqs
> >>> disabled. Thus many ticks may become longer than they should be.
> >>>
> >>> Using refined-jiffies as watchdog in this case breaks the system because
> >>> a) duration calculated by refined-jiffies watchdog is always consistent
> >>>    with the watchdog timeout issued using add_timer(), say, around 500ms.
> >>> b) duration calculated by the running clocksource, usually TSC on IA
> >>>    platforms, reflects the real time cost, which may be much larger.
> >>> This results in the running clocksource being disabled erroneously.
> >>>
> >>> This is reproduced on ICL because HPET is disabled in x86 early-quirks,
> >>> and also reproduced on a KBL and a WHL platform when HPET is disabled
> >>> via command line.
> >>>
> >>> BTW, commit fd329f276eca
> >>> ("x86/mtrr: Skip cache flushes on CPUs with cache self-snooping") is
> >>> another example that refined-jiffies causes the same problem when ticks
> >>> become slow for some other reason.
> >>
> >> Hi, Zhang Rui, we have hit the same problem you describe above. I have
> >> tested the following modification, and it solves the problem. Do you plan
> >> to push it upstream?
> > 
> > Hi Liao Yu,
> > 
> > Could you provide more details? Like, what ARCH is the platform (x86
> > or others), client or server, and if server, how many sockets (2S/4S/8S)?
> > 
> > The error kernel log will also be helpful.
> 
> Hi, Feng Tang,
> 
> It's an x86 server. lscpu prints the following information:
> 
> Architecture:                    x86_64
> CPU op-mode(s):                  32-bit, 64-bit
> Byte Order:                      Little Endian
> Address sizes:                   46 bits physical, 48 bits virtual
> CPU(s):                          224
> On-line CPU(s) list:             0-223
> Thread(s) per core:              2
> Core(s) per socket:              28
> Socket(s):                       4
> NUMA node(s):                    4
> Vendor ID:                       GenuineIntel
> CPU family:                      6
> Model:                           85
> Model name:                      Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> Stepping:                        4
> CPU MHz:                         3199.379
> CPU max MHz:                     3800.0000
> CPU min MHz:                     1000.0000
> BogoMIPS:                        5000.00
> Virtualization:                  VT-x
> L1d cache:                       3.5 MiB
> L1i cache:                       3.5 MiB
> L2 cache:                        112 MiB
> L3 cache:                        154 MiB
> NUMA node0 CPU(s):               0-27,112-139
> NUMA node1 CPU(s):               28-55,140-167
> NUMA node2 CPU(s):               56-83,168-195
> NUMA node3 CPU(s):               84-111,196-223
> 
> Part of the kernel log is as follows.
> 
> [    1.144402] smp: Brought up 4 nodes, 224 CPUs
> [    1.144402] smpboot: Max logical packages: 4
> [    1.144402] smpboot: Total of 224 processors activated (1121097.93 BogoMIPS)
> [    1.520003] clocksource: timekeeping watchdog on CPU2: Marking clocksource 'tsc-early' as unstable because the skew is too large:
> [    1.520010] clocksource:                       'refined-jiffies' wd_now: fffb7210 wd_last: fffb7018 mask: ffffffff
> [    1.520013] clocksource:                       'tsc-early' cs_now: 6606717afddd0 cs_last: 66065eff88ad4 mask: ffffffffffffffff
> [    1.520015] tsc: Marking TSC unstable due to clocksource watchdog
> [    5.164635] node 0 initialised, 98233092 pages in 4013ms
> [    5.209294] node 3 initialised, 98923232 pages in 4057ms
> [    5.220001] node 2 initialised, 99054870 pages in 4068ms
> [    5.222282] node 1 initialised, 99054870 pages in 4070ms
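
The wd_now/wd_last and cs_now/cs_last values in the log above can be turned
into elapsed intervals with a small user-space sketch. This is not the kernel's
watchdog code; masked_delta() and delta_to_ns() are hypothetical helper names,
and the two frequencies used in the note below (HZ=1000 for refined-jiffies,
2.5 GHz for the TSC, taken from the nominal CPU speed in the lscpu output) are
assumptions, since the log states neither.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Counter units elapsed between two reads. The mask wraps the result
 * to the counter width (32 bits for refined-jiffies, 64 bits for the
 * TSC in the log above).
 */
static uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/*
 * Convert a counter delta to nanoseconds for a counter running at
 * freq_hz. The whole-second part and the remainder are scaled
 * separately so the multiplication stays within 64 bits for
 * frequencies of a few GHz.
 */
static uint64_t delta_to_ns(uint64_t delta, uint64_t freq_hz)
{
	assert(freq_hz > 0);
	return (delta / freq_hz) * 1000000000ULL +
	       (delta % freq_hz) * 1000000000ULL / freq_hz;
}
```

Calling masked_delta(0xfffb7210, 0xfffb7018, 0xffffffff) gives 504 ticks, and
masked_delta(0x6606717afddd0, 0x66065eff88ad4, 0xffffffffffffffff) gives
0x127b752fc cycles. If the assumed frequencies hold, that is roughly 0.5 s of
jiffies time against roughly 2 s of TSC time, which would match the slow-tick
scenario described in Zhang Rui's commit message.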

Thanks Xiongfeng for the info.

Could you try the patch below? It is kind of an extension of

b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC on qualified platorms")

and I have run limited tests with it on some 4-socket Haswell and
Cascade Lake AP x86 servers.


Thanks,
Feng
---
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index cafacb2e58cc..b4ea79cb1d1a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1217,7 +1217,7 @@ static void __init check_system_tsc_reliable(void)
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    nr_online_nodes <= 2)
+	    nr_online_nodes <= 8)
 		tsc_disable_clocksource_watchdog();
 }