[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.20.1612090953570.3794@nanos>
Date: Fri, 9 Dec 2016 10:59:51 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: Roland Scheidegger <rscheidegger_lists@...peed.ch>
cc: sroland@...are.com, Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC
On Fri, 9 Dec 2016, Roland Scheidegger wrote:
>
> I saw some system lockups though:
> When doing a cold boot, this kernel never managed to boot up. The last
> message seen is:
> x86: Booting SMP configuration:
> .... node #0, CPUs: #1
Weird. That really would be interesting to figure out what goes wrong
there. What bothers me is that we don't see something like this:
> [ 0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1:
> -577423766270
Can you please apply the debug patch below and provide the output ?
> [ 0.094492] x86: Booting SMP configuration:
> [ 0.094534] .... node #0, CPUs: #1
> [ 0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1:
> -577423766270
What on earth is this BIOS doing? That's a couple of minutes back in time.
And the difference between the max and min adjust value is 2050932 cycles.
> Without the patches on cold boot it just was as expected:
> [ 0.093700] x86: Booting SMP configuration:
> [ 0.093737] .... node #0, CPUs: #1
> [ 0.174304] TSC synchronization [CPU#0 -> CPU#1]:
> [ 0.174375] Measured 1837188 cycles TSC warp between CPUs, turning
> off TSC clock.
Not surprising given the above numbers.
Thanks,
tglx
8<-------------------------
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -95,6 +95,8 @@ bool tsc_store_and_check_tsc_adjust(void
cur->nextcheck = jiffies + HZ;
cur->warned = false;
+ pr_info("TSC ADJUST: CPU%u: %lld\n", cpu, bootval);
+
/*
* Check whether this CPU is the first in a package to come up. In
* this case do not check the boot value against another package
@@ -282,17 +284,22 @@ void check_tsc_sync_source(int cpu)
else
atomic_set(&test_runs, 3);
retry:
+ pr_info("TSC source sync %d -> %d runs %d\n", smp_processor_id(),
+ cpu, atomic_read(&test_runs));
/*
* Wait for the target to start or to skip the test:
*/
while (atomic_read(&start_count) != cpus - 1) {
if (atomic_read(&skip_test) > 0) {
+ pr_info("TSC source sync skipped\n");
atomic_set(&skip_test, 0);
return;
}
cpu_relax();
}
+ pr_info("TSC source sync starting\n");
+
/*
* Trigger the target to continue into the measurement too:
*/
@@ -300,6 +307,7 @@ void check_tsc_sync_source(int cpu)
check_tsc_warp(loop_timeout(cpu));
+ pr_info("TSC source sync wait for stop\n");
while (atomic_read(&stop_count) != cpus-1)
cpu_relax();
@@ -311,7 +319,7 @@ void check_tsc_sync_source(int cpu)
if (!nr_warps) {
atomic_set(&test_runs, 0);
- pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
+ pr_info("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
smp_processor_id(), cpu);
} else if (atomic_dec_and_test(&test_runs) || random_warps) {
@@ -367,6 +375,7 @@ void check_tsc_sync_target(void)
* successful skip the test.
*/
if (tsc_store_and_check_tsc_adjust()) {
+ pr_info("TSC target sync skip\n");
atomic_inc(&skip_test);
return;
}
@@ -376,6 +385,7 @@ void check_tsc_sync_target(void)
* Register this CPU's participation and wait for the
* source CPU to start the measurement:
*/
+ pr_info("TSC target sync starting\n");
atomic_inc(&start_count);
while (atomic_read(&start_count) != cpus)
cpu_relax();
@@ -392,6 +402,7 @@ void check_tsc_sync_target(void)
*/
atomic_inc(&stop_count);
+ pr_info("TSC target sync stop\n");
/*
* Wait for the source CPU to print stuff:
*/
@@ -403,6 +414,8 @@ void check_tsc_sync_target(void)
*/
atomic_set(&stop_count, 0);
+ pr_info("TSC target sync remaining runs %d\n", atomic_read(&test_runs));
+
/*
* Check the number of remaining test runs. If not zero, the test
* failed and a retry with adjusted TSC is possible. If zero the
Powered by blists - more mailing lists