Message-ID: <019e583c-7bcb-c234-200c-fcdb6c49fbb0@oracle.com>
Date: Thu, 31 Jan 2019 11:50:44 +0800
From: Zhenzhong Duan <zhenzhong.duan@...cle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Waiman Long <longman@...hat.com>,
Srinivas Eeda <srinivas.eeda@...cle.com>
Subject: Re: [PATCH] acpi_pm: Reduce PMTMR counter read contention
On 2019/1/30 16:06, Thomas Gleixner wrote:
> On Tue, 22 Jan 2019, Zhenzhong Duan wrote:
>
>> On a large system with many CPUs, using PMTMR as the clock source can
>> have a significant impact on the overall system performance because
>> of the following reasons:
>> 1) There is a single PMTMR counter shared by all the CPUs.
>> 2) PMTMR counter reading is a very slow operation.
>>
>> Using PMTMR as the default clock source may happen when, for example,
>> the TSC clock calibration exceeds the allowable tolerance and HPET
>> disabled by nohpet on kernel command line. Sometimes the performance
>
> The question is why would anyone disable HPET on a larger machine when the
> TSC is wreckaged?
There may be broken hardware where the TSC is wreckaged.
On our instances (X8-8/X7-8), the TSC isn't wreckaged. Sometimes we are
lucky and get through the bootup stage, and then TSC ends up as the final
default clocksource. See the log:
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[ 13.963224] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[ 19.903175] clocksource: Switched to clocksource refined-jiffies
[ 20.190467] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 20.201634] clocksource: Switched to clocksource acpi_pm
[ 39.082577] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2113ba2fe3c, max_idle_ns: 440795266816 ns
[ 39.138781] clocksource: Switched to clocksource tsc
When we are unlucky, the log looks like this:
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[ 19.905741] clocksource: Switched to clocksource refined-jiffies
[ 20.181521] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 44.273786] watchdog: BUG: soft lockup - CPU#48 stuck for 23s! [swapper/48:0]
[ 44.279992] watchdog: BUG: soft lockup - CPU#49 stuck for 23s! [migration/49:307]
So we panicked while acpi_pm was initializing and had temporarily been
chosen as the default clocksource. It panicked just because we added the
nohpet parameter.
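For reference, the acpi_pm lines in the logs above show mask 0xffffff, i.e.
a 24-bit counter. Below is a rough sketch of the wraparound arithmetic for
such a counter. It assumes the usual ACPI PM timer frequency of 3.579545 MHz,
which is not stated in this thread, and the function names are made up for
illustration; this is not the kernel's code.

```python
# Assumption: the ACPI PM timer ticks at 3.579545 MHz (not stated in this thread).
# The 24-bit width is taken from the "mask: 0xffffff" field in the log.
PMTMR_HZ = 3_579_545
PMTMR_MASK = 0xFFFFFF


def pmtmr_wrap_seconds(mask: int = PMTMR_MASK, hz: int = PMTMR_HZ) -> float:
    """Seconds until a free-running counter with this mask wraps around."""
    return (mask + 1) / hz


def pmtmr_delta(earlier: int, later: int, mask: int = PMTMR_MASK) -> int:
    """Ticks elapsed between two raw reads, tolerating a single wraparound."""
    return (later - earlier) & mask


if __name__ == "__main__":
    print(f"wrap period: {pmtmr_wrap_seconds():.3f} s")
    # Two hypothetical raw reads that straddle a wrap.
    print(f"delta across wrap: {pmtmr_delta(0xFFFFF0, 0x10)} ticks")
```

Under that assumption the counter wraps roughly every 4.7 seconds, and the
max_idle_ns of 2085701024 ns reported in the log stays below that period.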
>
> I'm not against the change per se, but I really want to understand why we
> need all the complexity for something which should never be used in a real
> world deployment.
Hmm, "never be used" is a strong statement. Customers may happen to use
nohpet (as a sanity test?) and report a bug to us. Sometimes they do report
a bug that reproduces with their custom config. The BIOS may also have
HPET disabled.
Thanks
Zhenzhong