Date:   Tue, 11 Oct 2022 09:09:12 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     Dave Hansen <dave.hansen@...el.com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H . Peter Anvin" <hpa@...or.com>, <x86@...nel.org>,
        <linux-kernel@...r.kernel.org>, <rui.zhang@...el.com>,
        <tim.c.chen@...el.com>, Xiongfeng Wang <wangxiongfeng2@...wei.com>,
        Yu Liao <liaoyu15@...wei.com>
Subject: Re: [PATCH] x86/tsc: Extend the watchdog check exemption to 4S/8S
 machine

On Mon, Oct 10, 2022 at 07:23:10AM -0700, Dave Hansen wrote:
> On 10/9/22 18:23, Feng Tang wrote:
> >>> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> >>> index cafacb2e58cc..b4ea79cb1d1a 100644
> >>> --- a/arch/x86/kernel/tsc.c
> >>> +++ b/arch/x86/kernel/tsc.c
> >>> @@ -1217,7 +1217,7 @@ static void __init check_system_tsc_reliable(void)
> >>>  	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> >>>  	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> >>>  	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> >>> -	    nr_online_nodes <= 2)
> >>> +	    nr_online_nodes <= 8)
> >> So you're saying all 8 socket systems since Broadwell (?) are TSC
> >> sync'ed ?
> > No, I didn't mean that. I haven't got chance to any 8 sockets
> > machine, and I got a report last month that on one 8S machine,
> > the TSC was judged 'unstable' by HPET as watchdog.
> 
> That's not a great check.  Think about numa=fake=4U, for instance.  Or a
> single-socket system with persistent memory and high bandwidth memory.
> 
> Basically 'nr_online_nodes' is a software construct.  It's going to be
> really hard to infer anything from it about what the _hardware_ is.

You are right! Getting the socket count was indeed a problem when
I worked on commit b50db7095fe0, and it comes down to initialization
order: this tsc check needs to be done in tsc_init(), while
node_states[] only gets initialized later, in smp_init().

For the cases you mentioned above, I dug out some old logs that show
the init order:

  numa=fake=4 on a SKL desktop
  ================
  [    0.000066] [tsc_early_init()]: nr_online_nodes = 1
  [    0.000068] [tsc_early_init()]: nr_cpu_nodes = 0
  [    0.000070] [tsc_early_init()]: nr_mem_nodes = 0
  [    0.104015] [tsc_init()]: nr_online_nodes = 4
  [    0.104019] [tsc_init()]: nr_cpu_nodes = 0
  [    0.104022] [tsc_init()]: nr_mem_nodes = 4
  [    0.124778] smp: Brought up 4 nodes, 4 CPUs
  [    0.760915] [init_tsc_clocksource()]: nr_online_nodes = 4
  [    0.760919] [init_tsc_clocksource()]: nr_cpu_nodes = 4
  [    0.760922] [init_tsc_clocksource()]: nr_mem_nodes = 4
  
  QEMU with 2 CPU-DRAM nodes + 2 Persistent memory nodes 
  ========================================================
  [    0.066651] [tsc_early_init()]: nr_online_nodes = 1
  [    0.067494] [tsc_early_init()]: nr_cpu_nodes = 0
  [    0.068288] [tsc_early_init()]: nr_mem_nodes = 0
  [    0.677694] [tsc_init()]: nr_online_nodes = 4
  [    0.678862] [tsc_init()]: nr_cpu_nodes = 0
  [    0.679962] [tsc_init()]: nr_mem_nodes = 4
  [    1.139240] [init_tsc_clocksource()]: nr_online_nodes = 4
  [    1.140576] [init_tsc_clocksource()]: nr_cpu_nodes = 2
  [    1.141823] [init_tsc_clocksource()]: nr_mem_nodes = 4
  [    1.660100] [kernel_init()]: nr_online_nodes = 4
  [    1.661234] [kernel_init()]: nr_cpu_nodes = 2
  [    1.662300] [kernel_init()]: nr_mem_nodes = 4

'nr_online_nodes' was chosen in the hope that, in the worst case,
the patch is just a nop and won't wrongly lift the check.
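
To make the difference between the three counters concrete, here is a
small userspace sketch (not kernel code). It models the QEMU setup from
the log above, 2 CPU+DRAM nodes plus 2 memory-only nodes, and counts
online, CPU and memory nodes separately. The struct, the helper and the
table names are made up for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a NUMA node; not a kernel structure. */
struct node_model {
	bool online;
	bool has_cpu;
	bool has_memory;
};

struct node_counts {
	int online;
	int cpu;
	int mem;
};

/* Count online nodes, and among them those with CPUs and with memory. */
static struct node_counts count_nodes(const struct node_model *nodes, size_t n)
{
	struct node_counts c = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (!nodes[i].online)
			continue;
		c.online++;
		if (nodes[i].has_cpu)
			c.cpu++;
		if (nodes[i].has_memory)
			c.mem++;
	}
	return c;
}

/* Mirrors the QEMU case: 2 CPU+DRAM nodes, 2 persistent-memory-only nodes. */
static const struct node_model qemu_like_nodes[4] = {
	{ true, true,  true },
	{ true, true,  true },
	{ true, false, true },
	{ true, false, true },
};
```

For this table the sketch gives 4 online nodes, 2 CPU nodes and 4 memory
nodes, so a "<= 2" test on the online count fails (the exemption is
simply not applied), while the same test on the CPU node count passes.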

One possible solution is to leverage the early SRAT table init, which
is called before tsc_init() and can provide CPU node info. I will try
this approach.
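
As a rough idea of what that could look like, below is a userspace
sketch, not the actual patch and not the kernel's SRAT parser. It
assumes a simplified table of processor affinity entries (the struct
layout and all names are hypothetical) and counts the distinct
proximity domains that have at least one enabled CPU. Memory-only
nodes never show up in such entries, so they would not inflate the
count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for an SRAT processor affinity entry.
 * The real ACPI structure has more fields; this is a model only.
 */
struct cpu_affinity_entry {
	uint32_t apic_id;
	uint32_t proximity_domain;	/* assumed < 64 in this sketch */
	bool enabled;
};

/* Count distinct proximity domains that contain at least one enabled CPU. */
static int count_cpu_nodes(const struct cpu_affinity_entry *e, size_t n)
{
	uint64_t seen = 0;	/* one bit per proximity domain already counted */
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t bit;

		/* Skip disabled entries and domains outside the sketch's range. */
		if (!e[i].enabled || e[i].proximity_domain >= 64)
			continue;
		bit = UINT64_C(1) << e[i].proximity_domain;
		if (!(seen & bit)) {
			seen |= bit;
			count++;
		}
	}
	return count;
}

/* Example table: 4 CPUs on 2 domains, plus one disabled entry on domain 2. */
static const struct cpu_affinity_entry example_srat[] = {
	{ 0, 0, true },
	{ 1, 0, true },
	{ 2, 1, true },
	{ 3, 1, true },
	{ 4, 2, false },
};
```

With the example table, count_cpu_nodes() reports 2 CPU nodes, since the
only entry on domain 2 is disabled.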

Thanks,
Feng


