lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20211223210343.1116-1-dongli.zhang@oracle.com>
Date:   Thu, 23 Dec 2021 13:03:43 -0800
From:   Dongli Zhang <dongli.zhang@...cle.com>
To:     x86@...nel.org
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, peterz@...radead.org,
        rafael.j.wysocki@...el.com, vkuznets@...hat.com,
        alison.schofield@...el.com, tim.c.chen@...ux.intel.com,
        boris.ostrovsky@...cle.com, joe.jin@...cle.com,
        longpeng2@...wei.com, bigeasy@...utronix.de,
        linux-kernel@...r.kernel.org
Subject: [PATCH RESEND 1/1] x86/smpboot: check cpu_initialized_mask first after returning from schedule()

To online a new CPU, the master CPU signals the secondary and waits for at
most 10 seconds until cpu_initialized_mask is set by the secondary CPU.
There is a case that the master CPU fails the online when it takes 10+
seconds for schedule() to return (although the cpu_initialized_mask is
already set by the secondary CPU).

1. The master CPU signals the secondary CPU in do_boot_cpu().

2. As the cpu_initialized_mask is still not set, the master CPU reschedules
via schedule().

3. Suppose it takes 10+ seconds until schedule() returns due to performance
issue. The secondary CPU sets the cpu_initialized_mask during those 10+
seconds.

4. Once the schedule() at the master CPU returns, although the
cpu_initialized_mask is set, the time_before(jiffies, timeout) fails. As a
result, the master CPU regards this as failure.

[   51.983296] smpboot: do_boot_cpu failed(-1) to wakeup CPU#4

5. Since the secondary CPU is stuck at state CPU_UP_PREPARE, any further
online operation will fail by cpu_check_up_prepare(), unless the below
patch set is applied.

https://lore.kernel.org/all/20211206152034.2150770-1-bigeasy@linutronix.de/

This issue is resolved by always first checking whether the secondary CPU
has set cpu_initialized_mask.

Cc: Longpeng(Mike) <longpeng2@...wei.com>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Joe Jin <joe.jin@...cle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@...cle.com>
---
To resend by Cc linux-kernel@...r.kernel.org as well.

 arch/x86/kernel/smpboot.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 617012f4619f..faad4fcf67eb 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1145,7 +1145,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
 		 */
 		boot_error = -1;
 		timeout = jiffies + 10*HZ;
-		while (time_before(jiffies, timeout)) {
+		while (true) {
 			if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
 				/*
 				 * Tell AP to proceed with initialization
@@ -1154,6 +1154,10 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
 				boot_error = 0;
 				break;
 			}
+
+			if (time_after_eq(jiffies, timeout))
+				break;
+
 			schedule();
 		}
 	}
-- 
2.17.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ