lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 9 Dec 2016 18:33:22 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Roland Scheidegger <rscheidegger_lists@...peed.ch>
cc:     Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
        x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC

On Fri, 9 Dec 2016, Roland Scheidegger wrote:
> Am 09.12.2016 um 10:59 schrieb Thomas Gleixner:
> > On Fri, 9 Dec 2016, Roland Scheidegger wrote:
> >>
> >> I saw some system lockups though:
> >> When doing a cold boot, this kernel never managed to boot up. The last
> >> message seen is:
> >> x86: Booting SMP configuration:
> >> .... node  #0, CPUs:        #1
> > 
> > Weird. That really would be interesting to figure out what goes wrong
> > there. What bothers me is that we don't see something like this:
> > 
> >> [    0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1:
> >> -577423766270
> > 
> > Can you please apply the debug patch below and provide the output ?
> Ok, this is the output (minus some typos maybe...):
> 
> x86: Booting SMP configuration:
> .... node #0, CPUs:     #1
> TSC ADJUST: CPU1: -2806491604
> TSC source sync 0 -> 1 runs 3
> TSC ADJUST differs: Reference CPU0: -2805503200 CPU1: -2806491604
> TSC ADJUST synchronize: Reference CPU0: -2805503200 CPU1: -2806491604
> TSC target sync skip
> TSC source sync skipped
> 
> And that's it.

That's a cold boot (power on), right?

That looks like I expected. As we know that the CPUs are in the same
package simply resynchronizing the TSC ADJUST MSR is enough. And the sync
test is skipped as we have resynced TSC ADJUST already to be the same
value.

What's unexpected is that the thing stops working :(

I'm pretty sure the boot cpu hangs in that endless loop waiting for CPU1 to
set the online bit, but for whatever reason this does not happen.

Can you add the patch below to gather more information? There is a hunk in
there with an '#if 0' which sets the TSC ADJUST to 0 on boot, which you can
turn on as second step.

This BIOS seems to be rather strange ...

Thanks.

	tglx
----
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -95,7 +95,7 @@ bool tsc_store_and_check_tsc_adjust(void
 	cur->nextcheck = jiffies + HZ;
 	cur->warned = false;
 
-	pr_info("TSC ADJUST: CPU%u: %lld\n", cpu,  bootval);
+	pr_info("TSC ADJUST: CPU%u: %lld %lld\n", cpu,  bootval, rdtsc());
 
 	/*
 	 * Check whether this CPU is the first in a package to come up. In
@@ -114,8 +114,15 @@ bool tsc_store_and_check_tsc_adjust(void
 		 * the sync mechanism. If that fails we still can yell
 		 * about boot values not being consistent.
 		 */
-		cur->adjusted = bootval;
 		pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu,  bootval);
+#if 0
+		if (bootval) {
+			pr_warn("TSC ADJUST: CPU%u: Setting to 0\n", cpu);
+			bootval = 0;
+			wrmsrl(MSR_IA32_TSC_ADJUST, bootval);
+		}
+#endif
+		cur->adjusted = bootval;
 		return false;
 	}
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -245,7 +245,9 @@ static void notrace start_secondary(void
 	 * from seeing a half valid vector space.
 	 */
 	lock_vector_lock();
+	pr_info("Vector locked\n");
 	setup_vector_irq(smp_processor_id());
+	pr_info("Vector setup done\n");
 	set_cpu_online(smp_processor_id(), true);
 	unlock_vector_lock();
 	cpu_set_state_online(smp_processor_id());
@@ -257,8 +259,10 @@ static void notrace start_secondary(void
 	/* to prevent fake stack check failure in clock setup */
 	boot_init_stack_canary();
 
+	pr_info("Clock setup\n");
 	x86_cpuinit.setup_percpu_clockev();
 
+	pr_info("Enter idle startup\n");
 	wmb();
 	cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
 }
@@ -1133,7 +1137,7 @@ int native_cpu_up(unsigned int cpu, stru
 		cpu_relax();
 		touch_nmi_watchdog();
 	}
-
+	pr_info("Target CPU is online\n");
 	return 0;
 }
 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ