[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191210090839.GJ2844@hirez.programming.kicks-ass.net>
Date: Tue, 10 Dec 2019 10:08:39 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Tejun Heo <tj@...nel.org>, jiangshanlai@...il.com,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Workqueues splat due to ending up on wrong CPU
On Mon, Dec 09, 2019 at 10:59:08AM -0800, Paul E. McKenney wrote:
> And it survived! ;-)
>
> Peter, could I please have your Signed-off-by? Or take my Tested-by if
> you would prefer to send it up some other way.
How's this?
---
Subject: cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
From: Peter Zijlstra <peterz@...radead.org>
Date: Tue Dec 10 09:34:54 CET 2019
Paul reported a very sporadic, rcutorture induced, workqueue failure.
When the planets align, the workqueue rescuer's self-migrate fails and
then triggers a WARN for running a work on the wrong CPU.
Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
could be ignored! When stopper->enabled is false, stop_machine will
insta complete the work, without actually doing the work. Worse, it
will not WARN about this (we really should fix this).
It turns out there is a small window where a freshly online'ed CPU is
marked 'online' but doesn't yet have the stopper task running:
BP AP
bringup_cpu()
__cpu_up(cpu, idle) --> start_secondary()
...
cpu_startup_entry()
bringup_wait_for_ap()
wait_for_ap_thread() <-- cpuhp_online_idle()
while (1)
do_idle()
... available to run kthreads ...
stop_machine_unpark()
stopper->enable = true;
Close this by moving the stop_machine_unpark() into
cpuhp_online_idle(), such that the stopper thread is ready before we
start the idle loop and schedule.
Reported-by: "Paul E. McKenney" <paulmck@...nel.org>
Debugged-by: Tejun Heo <tj@...nel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
---
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -525,8 +525,7 @@ static int bringup_wait_for_ap(unsigned
if (WARN_ON_ONCE((!cpu_online(cpu))))
return -ECANCELED;
- /* Unpark the stopper thread and the hotplug thread of the target cpu */
- stop_machine_unpark(cpu);
+ /* Unpark the hotplug thread of the target cpu */
kthread_unpark(st->thread);
/*
@@ -1089,8 +1088,8 @@ void notify_cpu_starting(unsigned int cp
/*
* Called from the idle task. Wake up the controlling task which brings the
- * stopper and the hotplug thread of the upcoming CPU up and then delegates
- * the rest of the online bringup to the hotplug thread.
+ * hotplug thread of the upcoming CPU up and then delegates the rest of the
+ * online bringup to the hotplug thread.
*/
void cpuhp_online_idle(enum cpuhp_state state)
{
@@ -1100,6 +1099,12 @@ void cpuhp_online_idle(enum cpuhp_state
if (state != CPUHP_AP_ONLINE_IDLE)
return;
+ /*
+ * Unpart the stopper thread before we start the idle loop (and start
+ * scheduling); this ensures the stopper task is always available.
+ */
+ stop_machine_unpark(smp_processor_id());
+
st->state = CPUHP_AP_ONLINE_IDLE;
complete_ap_thread(st, true);
}
Powered by blists - more mailing lists