lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1308561839-18407-1-git-send-email-santosh.shilimkar@ti.com>
Date:	Mon, 20 Jun 2011 14:53:59 +0530
From:	Santosh Shilimkar <santosh.shilimkar@...com>
To:	<linux-arm-kernel@...ts.infradead.org>
CC:	linux-kernel@...r.kernel.org, linux-omap@...r.kernel.org,
	Santosh Shilimkar <santosh.shilimkar@...com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Russell King <rmk+kernel@....linux.org.uk>
Subject: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

The current ARM CPU hotplug code suffers from couple of race conditions
in CPU online path with scheduler.
The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be marked
active as part of cpu_notify() by the CPU which brought it up before
enabling interrupts.

So we end up in with couple of race conditions,
1) Interrupts are enabled even before CPU is marked as active.
2) Newly plugged CPU is marked as active but it is not marked online yet.

When an interrupt happens before the cpu_active bit is set, the scheduler
can't schedule the woken thread which is bound to that newly onlined cpu
and and selects a fallback runqueue.

Secondly marking CPU active before it is online also not desirable behaviour.

Fix this race conditions.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@...com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Russell King <rmk+kernel@....linux.org.uk>
---
On v3.0 kernel I started seeing lock-up and random crashes when
CPU online/offline was attempted aggressively. With git bisect I could
reach to Peter's commit e4a52bcb9a18142d79e231b6733cabdbf2e67c1f[sched:
Remove rq->lock from the first half of ttwu()] which was also reported
by Marc Zyngier. But even after using the follow up fix from Peter
d6aa8f85f163[sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW],
I was still seeing issues with hotplug path. So as a experiment I just
pushed down the interrupt enabling on newly plugged CPU after it's marked
as online. This made things better and much stable but occasionally I was
still seeing lock-up.

With above as background I looked at arch/x86/ code and got convinced
myself that the experimental hack could be the right fix. While doing
this I came across a commit from Thomas fd8a7de177b [x86: cpu-hotplug:
Prevent softirq wakeup on wrong CPU] which fixed the race 2) on x86
architecture. 

In this patch I have folded possible fixes for both race conditions for
ARM hotplug code as mentioned in change log. Hopefully I am not
introducing any new race with this patch and hence the RFC. 

 arch/arm/kernel/smp.c |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 344e52b..84373a9 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -302,13 +302,6 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
 	platform_secondary_init(cpu);
 
 	/*
-	 * Enable local interrupts.
-	 */
-	notify_cpu_starting(cpu);
-	local_irq_enable();
-	local_fiq_enable();
-
-	/*
 	 * Setup the percpu timer for this CPU.
 	 */
 	percpu_timer_setup();
@@ -322,6 +315,17 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
 	 */
 	set_cpu_online(cpu, true);
 
+
+	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
+		cpu_relax();
+
+	/*
+	 * Enable local interrupts.
+	 */
+	notify_cpu_starting(cpu);
+	local_irq_enable();
+	local_fiq_enable();
+
 	/*
 	 * OK, it's off to the idle thread for us
 	 */
-- 
1.6.0.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ