Message-ID: <YzbBOuX37XpUiP4y@feng-clx>
Date:   Fri, 30 Sep 2022 18:13:14 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     Yu Liao <liaoyu15@...wei.com>
CC:     Xiongfeng Wang <wangxiongfeng2@...wei.com>,
        Zhang Rui <rui.zhang@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Bjorn Helgaas <helgaas@...nel.org>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
        "Bjorn Helgaas" <bhelgaas@...gle.com>,
        Kai-Heng Feng <kai.heng.feng@...onical.com>,
        <len.brown@...el.com>, Xie XiuQi <xiexiuqi@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [PATCH] x86/PCI: Convert force_disable_hpet() to standard quirk

On Fri, Sep 30, 2022 at 05:45:29PM +0800, Yu Liao wrote:
[...]
> >>>>
> >>>> Hi, Zhang Rui, we have hit the same problem you mentioned above. I have
> >>>> tested the following modification, and it solves the problem. Do you plan
> >>>> to push it upstream?
> >>>
> >>> Hi Liao Yu,
> >>>
> >>> Could you provide more details? Like, what ARCH is the platform (x86
> >>> or others), client or server, and if server, how many sockets (2S/4S/8S)?
> >>>
> >>> The error kernel log will also be helpful.
> >>
> >> Hi, Feng Tang,
> >>
> >> It's an x86 server. lscpu prints the following information:
> >>
> >> Architecture:                    x86_64
> >> CPU op-mode(s):                  32-bit, 64-bit
> >> Byte Order:                      Little Endian
> >> Address sizes:                   46 bits physical, 48 bits virtual
> >> CPU(s):                          224
> >> On-line CPU(s) list:             0-223
> >> Thread(s) per core:              2
> >> Core(s) per socket:              28
> >> Socket(s):                       4
> >> NUMA node(s):                    4
> >> Vendor ID:                       GenuineIntel
> >> CPU family:                      6
> >> Model:                           85
> >> Model name:                      Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> >> Stepping:                        4
> >> CPU MHz:                         3199.379
> >> CPU max MHz:                     3800.0000
> >> CPU min MHz:                     1000.0000
> >> BogoMIPS:                        5000.00
> >> Virtualization:                  VT-x
> >> L1d cache:                       3.5 MiB
> >> L1i cache:                       3.5 MiB
> >> L2 cache:                        112 MiB
> >> L3 cache:                        154 MiB
> >> NUMA node0 CPU(s):               0-27,112-139
> >> NUMA node1 CPU(s):               28-55,140-167
> >> NUMA node2 CPU(s):               56-83,168-195
> >> NUMA node3 CPU(s):               84-111,196-223
> >>
> >> Part of the kernel log is as follows.
> >>
> >> [    1.144402] smp: Brought up 4 nodes, 224 CPUs
> >> [    1.144402] smpboot: Max logical packages: 4
> >> [    1.144402] smpboot: Total of 224 processors activated (1121097.93 BogoMIPS)
> >> [    1.520003] clocksource: timekeeping watchdog on CPU2: Marking clocksource 'tsc-early' as unstable because the skew is too large:
> >> [    1.520010] clocksource:                       'refined-jiffies' wd_now: fffb7210 wd_last: fffb7018 mask: ffffffff
> >> [    1.520013] clocksource:                       'tsc-early' cs_now: 6606717afddd0 cs_last: 66065eff88ad4 mask: ffffffffffffffff
> >> [    1.520015] tsc: Marking TSC unstable due to clocksource watchdog
> >> [    5.164635] node 0 initialised, 98233092 pages in 4013ms
> >> [    5.209294] node 3 initialised, 98923232 pages in 4057ms
> >> [    5.220001] node 2 initialised, 99054870 pages in 4068ms
> >> [    5.222282] node 1 initialised, 99054870 pages in 4070ms
> > 
> > Thanks Xiaofeng for the info.
> > 
> > Could you try the patch below? It is an extension of
> > 
> > b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC on qualified platorms") 
> > 
> > which I have tested to a limited extent on some 4-socket Haswell and
> > Cascade Lake AP x86 servers.
> > 
> > 
> > Thanks,
> > Feng
> > ---
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index cafacb2e58cc..b4ea79cb1d1a 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1217,7 +1217,7 @@ static void __init check_system_tsc_reliable(void)
> >  	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> >  	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> >  	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> > -	    nr_online_nodes <= 2)
> > +	    nr_online_nodes <= 8)
> >  		tsc_disable_clocksource_watchdog();
> >  }
> >  
> > 
> Hi Feng,
> 
> I tested this patch on the previous server and it fixes the issue.
 
Thanks for testing. Please do let us know if any TSC problem shows up
after long-running or stress tests.
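
For anyone decoding a watchdog report like the one quoted above: the
wd_now/wd_last and cs_now/cs_last pairs are raw counter reads, and the
elapsed counts are the masked differences. The following is a minimal C
sketch of that arithmetic, not the kernel's code. masked_delta() and
cycles_to_ns() are hypothetical helper names, and the 2500000 kHz figure
used in the note after it is an assumption taken from the nominal 2.50GHz
in the model name, since the calibrated TSC frequency is not in the quoted
log.

```c
#include <stdint.h>

/*
 * Wrap-safe counter delta: subtract, then mask to the counter width.
 * (Hypothetical helper for illustration only.)
 */
static uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/*
 * Convert a cycle count to nanoseconds for a counter running at khz.
 * (Hypothetical helper; the frequency must come from elsewhere, it is
 * not part of the watchdog report.)
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t khz)
{
	return cycles * 1000000ULL / khz;
}
```

With the quoted values, masked_delta(0xfffb7210, 0xfffb7018, 0xffffffff)
gives 0x1f8 (504 refined-jiffies ticks), and
masked_delta(0x6606717afddd0, 0x66065eff88ad4, ~0ULL) gives 0x127b752fc
cycles, roughly 1.98 s under the assumed 2.5 GHz. Converting the 504 ticks
to time needs the kernel's HZ, which the log does not show.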

I plan to send the patch for merging.
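
As a rough illustration of what the patch changes, the gate in
check_system_tsc_reliable() can be modeled in userspace as below. This is
only a sketch: struct tsc_features and tsc_watchdog_can_be_skipped() are
hypothetical names standing in for the boot_cpu_has() checks and
nr_online_nodes, and the node limit is passed as a parameter so that the
old (2) and proposed (8) values can be compared side by side.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three CPU feature bits the kernel tests. */
struct tsc_features {
	bool constant_tsc;	/* models X86_FEATURE_CONSTANT_TSC */
	bool nonstop_tsc;	/* models X86_FEATURE_NONSTOP_TSC */
	bool tsc_adjust;	/* models X86_FEATURE_TSC_ADJUST */
};

/*
 * Returns true when the modeled condition for disabling the TSC
 * clocksource watchdog holds: all three features present and the
 * number of online NUMA nodes within max_nodes.
 */
static bool tsc_watchdog_can_be_skipped(const struct tsc_features *f,
					 unsigned int online_nodes,
					 unsigned int max_nodes)
{
	return f->constant_tsc && f->nonstop_tsc && f->tsc_adjust &&
	       online_nodes <= max_nodes;
}
```

For the 4-node server reported in this thread, the function returns false
with a limit of 2 and true with a limit of 8, provided all three feature
flags are set (the quoted lscpu output does not list the flags, so that
part is assumed).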

Thanks,
Feng

> Thanks,
> Yu
> 
> 
