Message-ID: <20151007084110.GX2881@worktop.programming.kicks-ass.net>
Date: Wed, 7 Oct 2015 10:41:10 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: heiko.carstens@...ibm.com
Cc: linux-kernel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
Oleg Nesterov <oleg@...hat.com>,
Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...hat.com>
Subject: [RFC][PATCH] sched: Start stopper early

Hi,

So Heiko reported some 'interesting' fail where stop_two_cpus() got
stuck in multi_cpu_stop() with one cpu waiting for another that never
shows up.

It _looks_ like the 'other' cpu isn't running and the current best
theory is that we race on cpu-up and get the stop_two_cpus() call in
before the stopper task is running.

This _is_ possible because, due to the ONLINE notifier order, we set
'online && active' _before_ we do the smpboot_unpark thing.

The below test patch manually starts the stopper task early.

It boots and hotplugs a cpu on my test box so it's not insta-broken.
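
To make the suspected ordering problem concrete, here is a small userspace
model of it. This is only a sketch of the theory above, not kernel code:
every name in it (model_cpu, model_step, model_has_race_window and the two
order tables) is hypothetical, and it assumes that a cpu becomes a valid
stop_two_cpus() target as soon as it is 'online && active'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one cpu during bring-up; not a kernel structure. */
struct model_cpu {
	bool online;		/* models set_cpu_online() having run */
	bool active;		/* models set_cpu_active() having run */
	bool stopper_running;	/* models the stopper task being unparked */
};

/* The three bring-up events whose relative order matters here. */
enum model_step {
	STEP_SET_ONLINE,
	STEP_SET_ACTIVE,
	STEP_UNPARK_STOPPER,
};

static void model_apply(struct model_cpu *c, enum model_step step)
{
	switch (step) {
	case STEP_SET_ONLINE:
		c->online = true;
		break;
	case STEP_SET_ACTIVE:
		c->active = true;
		break;
	case STEP_UNPARK_STOPPER:
		c->stopper_running = true;
		break;
	}
}

/*
 * Walk the steps in order and report whether any intermediate state
 * shows the cpu as 'online && active' while its stopper is still
 * parked. In that state a stop_two_cpus() caller could queue work
 * that nobody runs.
 */
static bool model_has_race_window(const enum model_step *order, size_t n)
{
	struct model_cpu c = { false, false, false };
	size_t i;

	assert(order != NULL);

	for (i = 0; i < n; i++) {
		model_apply(&c, order[i]);
		if (c.online && c.active && !c.stopper_running)
			return true;
	}
	return false;
}

/* Assumed current order: stopper unparked only after online && active. */
static const enum model_step model_order_before_patch[] = {
	STEP_SET_ONLINE, STEP_SET_ACTIVE, STEP_UNPARK_STOPPER,
};

/* Order with the test patch: unpark first, then mark the cpu active. */
static const enum model_step model_order_with_patch[] = {
	STEP_SET_ONLINE, STEP_UNPARK_STOPPER, STEP_SET_ACTIVE,
};
```

Calling model_has_race_window() on model_order_before_patch finds a window,
while model_order_with_patch does not, which is the whole point of moving
the unpark ahead of set_cpu_active().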
---
 kernel/sched/core.c   | 7 ++++++-
 kernel/stop_machine.c | 5 +++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1764a0f..9a56ef7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5542,14 +5542,19 @@ static void set_cpu_rq_start_time(void)
 	rq->age_stamp = sched_clock_cpu(cpu);
 }
 
+extern void cpu_stopper_unpark(unsigned int cpu);
+
 static int sched_cpu_active(struct notifier_block *nfb,
 				      unsigned long action, void *hcpu)
 {
+	int cpu = (long)hcpu;
+
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_STARTING:
 		set_cpu_rq_start_time();
 		return NOTIFY_OK;
 	case CPU_ONLINE:
+		cpu_stopper_unpark(cpu);
 		/*
 		 * At this point a starting CPU has marked itself as online via
 		 * set_cpu_online(). But it might not yet have marked itself
@@ -5558,7 +5563,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
 		 * Thus, fall-through and help the starting CPU along.
 		 */
 	case CPU_DOWN_FAILED:
-		set_cpu_active((long)hcpu, true);
+		set_cpu_active(cpu, true);
 		return NOTIFY_OK;
 	default:
 		return NOTIFY_DONE;
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 12484e5..c674371 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -496,6 +496,11 @@ static struct smp_hotplug_thread cpu_stop_threads = {
 	.selfparking = true,
 };
 
+void cpu_stopper_unpark(unsigned int cpu)
+{
+	kthread_unpark(per_cpu(cpu_stopper.thread, cpu));
+}
+
 static int __init cpu_stop_init(void)
 {
 	unsigned int cpu;