From: Suresh Siddha

The MTRR rendezvous sequence, which uses stop_one_cpu_nowait(), can
potentially run in parallel with another system-wide rendezvous started
by stop_machine(). This can lead to a deadlock: the order in which the
works are queued can differ from cpu to cpu, so some cpus will be running
the first rendezvous handler while others are running the second. Each
set then waits for the other set to join its system-wide rendezvous, and
neither makes progress.

The MTRR rendezvous sequence is not implemented using stop_machine()
because it gets called both from process context and from the cpu online
paths (where the cpu has not come online yet, interrupts are disabled,
etc.). stop_machine() works only with online cpus.

For now, take the stop_machine mutex in the MTRR rendezvous sequence when
it is called from an online cpu (here we are in process context and can
sleep while taking the mutex). The MTRR rendezvous that gets triggered
during cpu online doesn't need to take this lock, as stop_machine()
already ensures that there is no cpu hotplug going on in parallel by
doing get_online_cpus().

TBD: Pursue a cleaner solution of extending the stop_machine()
infrastructure to handle the case where the calling cpu is not yet
online, and use this for the MTRR rendezvous sequence.

fixes: https://bugzilla.novell.com/show_bug.cgi?id=672008

(will be forwarded to stable series for inclusion in kernels
v2.6.35-v2.6.39 after some testing in mainline)

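To make the deadlock argument above concrete, here is a minimal userspace
sketch. It is not kernel code: it assumes each cpu runs its queued works
strictly in FIFO order and that a rendezvous handler can only finish once
every cpu has entered the same handler. All names in it (enum work,
rendezvous_deadlocks(), queue_racy(), queue_serialized(), NCPUS) are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPUS 4

/* Hypothetical IDs for the two system-wide rendezvous handlers. */
enum work { MTRR_RENDEZVOUS = 0, STOP_MACHINE = 1, NWORKS = 2 };

/*
 * Model assumption: each cpu runs its queue in FIFO order, and a
 * rendezvous handler spins until all cpus are in that same handler.
 * Returns true if the queues can never make progress.
 */
bool rendezvous_deadlocks(enum work queue[NCPUS][NWORKS])
{
	size_t head[NCPUS] = { 0 };

	for (int round = 0; round < NWORKS; round++) {
		assert(head[0] < NWORKS);
		enum work first = queue[0][head[0]];

		/* Progress requires every cpu to be in the same handler. */
		for (int cpu = 1; cpu < NCPUS; cpu++)
			if (queue[cpu][head[cpu]] != first)
				return true;

		/* Everyone joined: the handler completes on all cpus. */
		for (int cpu = 0; cpu < NCPUS; cpu++)
			head[cpu]++;
	}
	return false;
}

/* Two initiators race, so odd cpus see the works in the opposite order. */
void queue_racy(enum work queue[NCPUS][NWORKS])
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		bool mtrr_first = (cpu % 2 == 0);

		queue[cpu][0] = mtrr_first ? MTRR_RENDEZVOUS : STOP_MACHINE;
		queue[cpu][1] = mtrr_first ? STOP_MACHINE : MTRR_RENDEZVOUS;
	}
}

/*
 * With a common mutex held around queueing, one initiator queues on
 * every cpu before the other starts, so all cpus agree on the order.
 */
void queue_serialized(enum work queue[NCPUS][NWORKS])
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		queue[cpu][0] = MTRR_RENDEZVOUS;
		queue[cpu][1] = STOP_MACHINE;
	}
}
```

In this model, queue_racy() produces queues for which
rendezvous_deadlocks() reports a deadlock, while queue_serialized() does
not. Serializing the two initiators is the effect the patch aims for by
taking stop_cpus_mutex on the online-cpu path.
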
Reported-by: Vadim Kotelnikov
Signed-off-by: Suresh Siddha
Cc: stable@kernel.org # 2.6.35+, backport a week or two after this gets more testing in mainline
---
 arch/x86/kernel/cpu/mtrr/main.c |   16 ++++++++++++++++
 include/linux/stop_machine.h    |    2 ++
 kernel/stop_machine.c           |    2 +-
 3 files changed, 19 insertions(+), 1 deletion(-)

Index: linux-2.6-tip/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-2.6-tip/arch/x86/kernel/cpu/mtrr/main.c
@@ -248,6 +248,18 @@ set_mtrr(unsigned int reg, unsigned long
 	unsigned long flags;
 	int cpu;
 
+#ifdef CONFIG_SMP
+	/*
+	 * If we are not yet online, then there can be no stop_machine() in
+	 * parallel. Stop machine ensures this by using get_online_cpus().
+	 *
+	 * If we are online, then we need to prevent a stop_machine() happening
+	 * in parallel by taking the stop cpus mutex.
+	 */
+	if (cpu_online(raw_smp_processor_id()))
+		mutex_lock(&stop_cpus_mutex);
+#endif
+
 	preempt_disable();
 
 	data.smp_reg = reg;
@@ -330,6 +342,10 @@ set_mtrr(unsigned int reg, unsigned long
 
 	local_irq_restore(flags);
 	preempt_enable();
+#ifdef CONFIG_SMP
+	if (cpu_online(raw_smp_processor_id()))
+		mutex_unlock(&stop_cpus_mutex);
+#endif
 }
 
 /**
Index: linux-2.6-tip/include/linux/stop_machine.h
===================================================================
--- linux-2.6-tip.orig/include/linux/stop_machine.h
+++ linux-2.6-tip/include/linux/stop_machine.h
@@ -27,6 +27,8 @@ struct cpu_stop_work {
 	struct cpu_stop_done	*done;
 };
 
+extern struct mutex stop_cpus_mutex;
+
 int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg);
 void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
 			 struct cpu_stop_work *work_buf);
Index: linux-2.6-tip/kernel/stop_machine.c
===================================================================
--- linux-2.6-tip.orig/kernel/stop_machine.c
+++ linux-2.6-tip/kernel/stop_machine.c
@@ -132,8 +132,8 @@ void stop_one_cpu_nowait(unsigned int cp
 	cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu), work_buf);
 }
 
+DEFINE_MUTEX(stop_cpus_mutex);
 /* static data for stop_cpus */
-static DEFINE_MUTEX(stop_cpus_mutex);
 static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);
 
 int __stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)