Date:	Mon, 19 May 2014 17:49:09 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	Tejun Heo <tj@...nel.org>, akpm@...ux-foundation.org,
	fweisbec@...il.com
CC:	paulmck@...ux.vnet.ibm.com, peterz@...radead.org,
	tglx@...utronix.de, mingo@...nel.org, rusty@...tcorp.com.au,
	hch@...radead.org, mgorman@...e.de, riel@...hat.com, bp@...e.de,
	rostedt@...dmis.org, mgalbraith@...e.de, ego@...ux.vnet.ibm.com,
	oleg@...hat.com, rjw@...ysocki.net, linux-kernel@...r.kernel.org
Subject: [PATCH v5 UPDATEDv2 3/3] CPU hotplug, smp: Flush any pending IPI
 callbacks before CPU offline

On 05/16/2014 01:13 AM, Srivatsa S. Bhat wrote:
> -----------------------------------------------------------------------
> 
> From: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
> [PATCH v5 UPDATED 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
> 
[...]
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 288f7fe..6b3a2f3 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -21,6 +21,7 @@
>  #include <linux/smpboot.h>
>  #include <linux/atomic.h>
>  #include <linux/lglock.h>
> +#include <linux/smp.h>
>  
>  /*
>   * Structure to determine completion condition and record errors.  May
> @@ -224,6 +225,16 @@ static int multi_cpu_stop(void *data)
>  					local_irq_disable();
>  					hard_irq_disable();
>  				}
> +
> +				/*
> +				 * IPIs (from the inactive CPUs) might arrive
> +				 * late due to hardware latencies. So flush
> +				 * out any pending IPI callbacks explicitly,
> +				 * to ensure that the outgoing CPU doesn't go
> +				 * offline with work still pending (during
> +				 * CPU hotplug).
> +				 */
> +				flush_smp_call_function_queue();
>  				break;

I just realized that since flush_smp_call_function_queue() is called outside
of the "if (is_active)" block, it gets executed by all CPUs (even the
inactive ones) unnecessarily. Only the active CPU needs to run it; the
others don't have to.

I must have been carried away by the name of the state (which has _ACTIVE in it)
and assumed that only the active CPU would run that code. So, here is an
updated patch.

-----------------------------------------------------------------------------

From: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
[PATCH v5 UPDATEDv2 3/3] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline

During CPU offline, in the stop-machine loop, we use two separate stages to
disable interrupts, to ensure that the CPU going offline doesn't get any new
IPIs from the other CPUs after it has gone offline.

However, an IPI sent much earlier might arrive late on the target CPU
(possibly _after_ the CPU has gone offline) due to hardware latencies,
and due to this, the smp-call-function callbacks queued on the outgoing
CPU might not get noticed (and hence not executed) at all.

This is somewhat theoretical, but in any case, it makes sense to explicitly
loop through the call_single_queue and flush any pending callbacks before the
CPU goes completely offline. So, flush the queued smp-call-function callbacks
in the MULTI_STOP_DISABLE_IRQ_ACTIVE stage, after disabling interrupts on the
active CPU. That way, we would have handled all the queued callbacks before
going offline, and also, no new IPIs can be sent by the other CPUs to the
outgoing CPU at that point, because they will all be executing the stop-machine
code with interrupts disabled.

Suggested-by: Frederic Weisbecker <fweisbec@...il.com>
Reviewed-by: Tejun Heo <tj@...nel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
---

 include/linux/smp.h   |    2 ++
 kernel/smp.c          |   33 +++++++++++++++++++++++++++++++++
 kernel/stop_machine.c |   11 +++++++++++
 3 files changed, 46 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 633f5ed..7924191 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -52,6 +52,8 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
 
 int smp_call_function_single_async(int cpu, struct call_single_data *csd);
 
+void flush_smp_call_function_queue(void);
+
 #ifdef CONFIG_SMP
 
 #include <linux/preempt.h>
diff --git a/kernel/smp.c b/kernel/smp.c
index 306f818..f361681 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -212,6 +212,39 @@ void generic_smp_call_function_single_interrupt(void)
 	}
 }
 
+/**
+ * flush_smp_call_function_queue - Flush pending smp-call-function callbacks
+ *
+ * Flush any pending smp-call-function callbacks queued on this CPU (including
+ * those for which the source CPU's IPIs might not have been received on this
+ * CPU yet). This is invoked by a CPU about to go offline, to ensure that all
+ * pending IPI functions are run before it goes completely offline.
+ *
+ * Loop through the call_single_queue and run all the queued functions.
+ * Must be called with interrupts disabled.
+ */
+void flush_smp_call_function_queue(void)
+{
+	struct llist_head *head;
+	struct llist_node *entry;
+	struct call_single_data *csd, *csd_next;
+
+	WARN_ON(!irqs_disabled());
+
+	head = &__get_cpu_var(call_single_queue);
+
+	if (likely(llist_empty(head)))
+		return;
+
+	entry = llist_del_all(head);
+	entry = llist_reverse_order(entry);
+
+	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
+		csd->func(csd->info);
+		csd_unlock(csd);
+	}
+}
+
 /*
  * smp_call_function_single - Run a function on a specific CPU
  * @func: The function to run. This must be fast and non-blocking.
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 288f7fe..8581d20 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include <linux/smpboot.h>
 #include <linux/atomic.h>
 #include <linux/lglock.h>
+#include <linux/smp.h>
 
 /*
  * Structure to determine completion condition and record errors.  May
@@ -223,6 +224,16 @@ static int multi_cpu_stop(void *data)
 				if (is_active) {
 					local_irq_disable();
 					hard_irq_disable();
+
+					/*
+					 * IPIs (from the inactive CPUs) might
+					 * arrive late due to hardware latencies.
+					 * So flush out any pending IPI callbacks
+					 * explicitly, to ensure that the outgoing
+					 * CPU doesn't go offline with work still
+					 * pending (during CPU hotplug).
+					 */
+					flush_smp_call_function_queue();
 				}
 				break;
 			case MULTI_STOP_RUN:


