linux-kernel - [PATCH 2/2] cpu/hotplug: Unfreeze sibling CPU first on resume from S3

Message-Id: <20190129102329.27610-3-jan@schnhrr.de>
Date:   Tue, 29 Jan 2019 11:23:29 +0100
From:   Jan H. Schönherr <jan@...nhrr.de>
To:     Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org
Cc:     Jan H. Schönherr <jan@...nhrr.de>,
        Paul Menzel <pmenzel@...gen.mpg.de>,
        Thomas Lendacky <Thomas.Lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: [PATCH 2/2] cpu/hotplug: Unfreeze sibling CPU first on resume from S3

At least one system declares the TSC unstable after resume from S3,
because the TSC is occasionally observed going backwards by up to
roughly 500 cycles when bringing secondary CPUs back online.

The system in question is an AMD Ryzen Threadripper 2950X, microcode
0x800820b, on an ASRock Fatal1ty X399 Professional Gaming, BIOS P3.30.

This unexplained behavior goes away as soon as the sibling CPU of the
boot CPU is brought back up. Hence, add a hack to restore the sibling
CPU before all others on unfreeze. This keeps the TSC stable.

Signed-off-by: Jan H. Schönherr <jan@...nhrr.de>
---
 kernel/cpu.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)
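
Note for readers: the reordering done by this patch can be illustrated
outside the kernel. The following is only a userspace sketch under
stated assumptions: a 64-bit mask stands in for frozen_cpus, and the
name resume_order() and its parameters are hypothetical and do not
exist in kernel/cpu.c. It shows the intended order (sibling of the boot
CPU first, if it was frozen, then the remaining frozen CPUs in
ascending order), not the actual hotplug machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace sketch only: a 64-bit mask stands in for the kernel's
 * frozen_cpus cpumask. resume_order() is a hypothetical helper.
 *
 * Writes the bring-up order into order[] and returns the number of
 * entries written. The sibling of the boot CPU goes first, but only
 * if it is actually part of the frozen set.
 */
static size_t resume_order(uint64_t frozen, int primary_sibling,
			   int *order, size_t cap)
{
	size_t n = 0;
	int cpu;

	/* Mirrors the cpumask_test_cpu() check before the main loop. */
	if (primary_sibling >= 0 && primary_sibling < 64 &&
	    (frozen & (UINT64_C(1) << primary_sibling))) {
		assert(n < cap);
		order[n++] = primary_sibling;
		/* Mirrors cpumask_clear_cpu(): do not bring it up twice. */
		frozen &= ~(UINT64_C(1) << primary_sibling);
	}

	/* Mirrors for_each_cpu(cpu, frozen_cpus): ascending CPU order. */
	for (cpu = 0; cpu < 64; cpu++) {
		if (frozen & (UINT64_C(1) << cpu)) {
			assert(n < cap);
			order[n++] = cpu;
		}
	}
	return n;
}
```

With a made-up topology where CPUs 1-7 were frozen and CPU 4 is the
sibling of the boot CPU, the sketch yields the order 4, 1, 2, 3, 5, 6, 7.
If the recorded sibling is not in the frozen set, the order is unchanged
from the pre-patch behavior.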

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 91d5c38eb7e5..7097ee8c1b17 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1193,6 +1193,7 @@ EXPORT_SYMBOL_GPL(cpu_up);
 
 #ifdef CONFIG_PM_SLEEP_SMP
 static cpumask_var_t frozen_cpus;
+static int frozen_primary_sibling;
 
 int freeze_secondary_cpus(int primary)
 {
@@ -1211,6 +1212,8 @@ int freeze_secondary_cpus(int primary)
 	for_each_online_cpu(cpu) {
 		if (cpu == primary)
 			continue;
+		if (cpumask_test_cpu(cpu, topology_sibling_cpumask(primary)))
+			frozen_primary_sibling = cpu;
 		trace_suspend_resume(TPS("CPU_OFF"), cpu, true);
 		error = _cpu_down(cpu, 1, CPUHP_OFFLINE);
 		trace_suspend_resume(TPS("CPU_OFF"), cpu, false);
@@ -1246,9 +1249,23 @@ void __weak arch_enable_nonboot_cpus_end(void)
 {
 }
 
+static void enable_nonboot_cpu(int cpu)
+{
+	int error;
+
+	trace_suspend_resume(TPS("CPU_ON"), cpu, true);
+	error = _cpu_up(cpu, 1, CPUHP_ONLINE);
+	trace_suspend_resume(TPS("CPU_ON"), cpu, false);
+	if (!error) {
+		pr_info("CPU%d is up\n", cpu);
+		return;
+	}
+	pr_warn("Error taking CPU%d up: %d\n", cpu, error);
+}
+
 void enable_nonboot_cpus(void)
 {
-	int cpu, error;
+	int cpu;
 
 	/* Allow everyone to use the CPU hotplug again */
 	cpu_maps_update_begin();
@@ -1260,16 +1277,13 @@ void enable_nonboot_cpus(void)
 
 	arch_enable_nonboot_cpus_begin();
 
-	for_each_cpu(cpu, frozen_cpus) {
-		trace_suspend_resume(TPS("CPU_ON"), cpu, true);
-		error = _cpu_up(cpu, 1, CPUHP_ONLINE);
-		trace_suspend_resume(TPS("CPU_ON"), cpu, false);
-		if (!error) {
-			pr_info("CPU%d is up\n", cpu);
-			continue;
-		}
-		pr_warn("Error taking CPU%d up: %d\n", cpu, error);
+	cpu = frozen_primary_sibling;
+	if (cpumask_test_cpu(cpu, frozen_cpus)) {
+		enable_nonboot_cpu(cpu);
+		cpumask_clear_cpu(cpu, frozen_cpus);
 	}
+	for_each_cpu(cpu, frozen_cpus)
+		enable_nonboot_cpu(cpu);
 
 	arch_enable_nonboot_cpus_end();
 
-- 
2.19.2