[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZIwMstkB7CG3pDYu@feng-clx>
Date: Fri, 16 Jun 2023 15:18:10 +0800
From: Feng Tang <feng.tang@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H . Peter Anvin" <hpa@...or.com>,
David Woodhouse <dwmw@...zon.co.uk>,
"Paul E . McKenney" <paulmck@...nel.org>, <x86@...nel.org>,
<linux-kernel@...r.kernel.org>, <rui.zhang@...el.com>,
<tim.c.chen@...el.com>
Subject: Re: [Patch v2 2/2] x86/tsc: use logical_packages as a better
estimation of socket numbers
On Thu, Jun 15, 2023 at 11:20:21AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 13, 2023 at 01:25:23PM +0800, Feng Tang wrote:
> > Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> > on qualified platorms") was introduced to solve problem that
> > sometimes TSC clocksource is wrongly judged as unstable by watchdog
> > like 'jiffies', HPET, etc.
> >
> > In it, the hardware socket number is a key factor for judging whether
> > to disable the watchdog for TSC, and 'nr_online_nodes' was chosen as
> > an estimation due to it is needed in early boot phase before
> > registering 'tsc-early' clocksource, where all none-boot CPUs are not
> > brought up yet.
> >
> > In recent patch review, Dave and Rui pointed out there are many cases
> > in which 'nr_online_nodes' is not accurate, like:
> >
> > * numa emulation (numa=fake=4 etc.)
> > * numa=off
> > * platforms with CPU-less HBM nodes, CPU-less Optane memory nodes.
> > * SNC (sub-numa cluster) mode enabled
> > * 'nr_cpus=' and 'maxcpus=' kernel cmdline parameter setup
> >
> > Peter Zijlstra suggested 'logical_packages', and it seems to be the
> > best option we have as it covers all the cases above except the
> > last one. But it is only usable after smp_init() and all CPUs are
> > brought up, while 'tsc-early' is initialized before that.
> >
> > One solution is to skip the watchdog for 'tsc-early' clocksource,
> > and move the check after smp_init(), while before 'tsc' clocksource
> > is registered, where 'logical_packages' could be used.
> >
> > For 'nr_cpus' and 'maxcpus' cmdline case, the global flag
> > 'package_count_unreliable' will be set to true and the watchdog
> > will be kept as is.
>
> So I have at least two machines where I boot with 'possible_cpus=#'
> because the BIOS MADT is reporting a stupid number of CPUs that aren't
> actually there.
>
> So I think I'm lucky and side-stepped this nonsense, but if someone were
> to use nr_cpus= for this same purpose, they get screwed over and get the
> watchdog. Sad day for them I suppose.
Thanks for sharing the info! Now I know one reason why those cmdline
parameters were created.
> I realize there might not be a perfect solution,
Yes. Rui is working on a MADT based parsing which may take a while
before being stable, given all kinds of fancy firmware out there.
> but I'm also struggling
> to see the value of the whole package_count_unreliable thing.
'possible_cpus' and 'nr_cpus' parameters are a little different from
"maxcpus' that they limit the possible cpus during boot, and after
boot user has no ways to online other CPUs beyond that limit.
But for 'maxcpus' case, user can online a small number of CPU during
boot and onlined all CPUs later on, which has a possible under
estimation issue for package number, while the above 2 don't have.
So how about only keeping the 'package_count_unreliable' check for
'maxcpus' case?
Thanks,
Feng
Powered by blists - more mailing lists