Date:	Fri, 23 May 2014 15:27:49 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Cc:	peterz@...radead.org, tglx@...utronix.de, mingo@...nel.org,
	tj@...nel.org, rusty@...tcorp.com.au, akpm@...ux-foundation.org,
	hch@...radead.org, mgorman@...e.de, riel@...hat.com, bp@...e.de,
	rostedt@...dmis.org, mgalbraith@...e.de, ego@...ux.vnet.ibm.com,
	paulmck@...ux.vnet.ibm.com, oleg@...hat.com, rjw@...ysocki.net,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 3/3] CPU hotplug, smp: Flush any pending IPI callbacks
 before CPU offline

On Fri, May 23, 2014 at 03:42:35PM +0530, Srivatsa S. Bhat wrote:
> During CPU offline, in the stop-machine loop, we use two separate stages to
> disable interrupts, to ensure that the CPU going offline doesn't get any new
> IPIs from the other CPUs after it has gone offline.
> 
> However, an IPI sent much earlier might arrive late on the target CPU
> (possibly _after_ the CPU has gone offline) due to hardware latencies,
> and if this happens, then the smp-call-function callbacks queued on the
> outgoing CPU will not get noticed (and hence not executed) at all.
> 
> This is somewhat theoretical, but in any case, it makes sense to explicitly
> loop through the call_single_queue and flush any pending callbacks before the
> CPU goes completely offline. So, perform this step in the CPU_DYING stage of
> CPU offline. That way, all the queued callbacks are handled before the CPU
> goes offline, and no new IPIs can be sent to the outgoing CPU at that point,
> because the other CPUs will all be executing the stop-machine code with
> interrupts disabled (see the sketch just below).
> 
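
For context, "the stop-machine code" here is multi_cpu_stop() in
kernel/stop_machine.c: every online CPU spins in a small shared state
machine and disables interrupts when it reaches a dedicated stage, so once
all CPUs have acknowledged that stage, none of them can send a fresh IPI.
A heavily simplified sketch of that loop (set-up and error handling elided,
not the exact kernel code):

	static int multi_cpu_stop(void *data)
	{
		struct multi_stop_data *msdata = data;
		enum multi_stop_state curstate = MULTI_STOP_NONE;
		bool is_active;	/* whether this CPU runs msdata->fn(); set-up elided */
		int err = 0;

		/* All online CPUs execute this loop in lock-step. */
		do {
			cpu_relax();
			if (msdata->state != curstate) {
				curstate = msdata->state;
				switch (curstate) {
				case MULTI_STOP_DISABLE_IRQ:
					/* Every CPU turns interrupts off here, so
					 * no new IPIs can be sent past this stage. */
					local_irq_disable();
					hard_irq_disable();
					break;
				case MULTI_STOP_RUN:
					/* Only the "active" CPU runs the real work,
					 * e.g. take_cpu_down() for CPU offline. */
					if (is_active)
						err = msdata->fn(msdata->data);
					break;
				default:
					break;
				}
				ack_state(msdata); /* advance when all CPUs arrive */
			}
		} while (curstate != MULTI_STOP_EXIT);

		return err;
	}
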
> But since the outgoing CPU is already marked offline at this point, we can't
> directly invoke generic_smp_call_function_single_interrupt() from the
> CPU_DYING notifier, because it will trigger the "IPI to offline CPU" warning.
> Hence, separate out its functionality into a new function called
> 'flush_smp_call_function_queue', which skips the "is-cpu-online?" check, and
> call this instead (since we know what we are doing in this path).
> 
> (Aside: 'generic_smp_call_function_single_interrupt' is too long a name already,
> so I didn't want to add an uglier-looking double-underscore-prefixed version.
> 'flush_smp_call_function_queue' is a much more meaningful name).
> 
> Suggested-by: Frederic Weisbecker <fweisbec@...il.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
> ---
> 
>  kernel/smp.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 50 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 306f818..b7a527b 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -29,6 +29,8 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
>  
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
>  
> +static void flush_smp_call_function_queue(void);
> +
>  static int
>  hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
>  {
> @@ -52,6 +54,18 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
>  	case CPU_UP_CANCELED:
>  	case CPU_UP_CANCELED_FROZEN:
>  
> +	case CPU_DYING:
> +	case CPU_DYING_FROZEN:
> +		/*
> +		 * The IPIs for the smp-call-function callbacks queued by other
> +		 * CPUs might arrive late due to hardware latencies. So flush
> +		 * out any pending IPI callbacks explicitly (without waiting for
> +		 * the IPIs to arrive), to ensure that the outgoing CPU doesn't
> +		 * go offline with work still pending.
> +		 */
> +		flush_smp_call_function_queue();
> +		break;
> +
>  	case CPU_DEAD:
>  	case CPU_DEAD_FROZEN:
>  		free_cpumask_var(cfd->cpumask);
> @@ -177,26 +191,56 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
>  	return 0;
>  }
>  
> -/*
> - * Invoked by arch to handle an IPI for call function single. Must be
> - * called from the arch with interrupts disabled.
> +/**
> + * flush_smp_call_function_queue - Flush pending smp-call-function callbacks
> + *
> + * Flush any pending smp-call-function callbacks queued on this CPU. This is
> + * invoked by the generic IPI handler, as well as by a CPU about to go offline,
> + * to ensure that all pending IPI functions are run before it goes completely
> + * offline.
> + *
> + * Loop through the call_single_queue and run all the queued functions.
> + * Must be called with interrupts disabled.
>   */
> -void generic_smp_call_function_single_interrupt(void)
> +static void flush_smp_call_function_queue(void)
>  {
>  	struct llist_node *entry;
>  	struct call_single_data *csd, *csd_next;
> -	static bool warned;
>  
>  	entry = llist_del_all(&__get_cpu_var(call_single_queue));
>  	entry = llist_reverse_order(entry);
>  
> +	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
> +		csd->func(csd->info);
> +		csd_unlock(csd);
> +	}
> +}
> +
> +/**
> + * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
> + *
> + * Invoked by arch to handle an IPI for call function single.
> + * Must be called with interrupts disabled.
> + */
> +void generic_smp_call_function_single_interrupt(void)
> +{
> +	static bool warned;
> +
> +	WARN_ON(!irqs_disabled());
> +
>  	/*
>  	 * Shouldn't receive this interrupt on a cpu that is not yet online.
>  	 */
>  	if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
> +		struct llist_node *entry;
> +		struct call_single_data *csd;
> +
>  		warned = true;
>  		WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
>  
> +		entry = llist_del_all(&__get_cpu_var(call_single_queue));

This deletes all the entries, so the subsequent call to
flush_smp_call_function_queue() will find an empty queue and miss them: in
the warned case they get logged below but are never executed, and
csd_unlock() is never called on them, so synchronous callers would hang.
(See the sketch at the end of this mail for one way to restructure it.)

> +		entry = llist_reverse_order(entry);
> +
>  		/*
>  		 * We don't have to use the _safe() variant here
>  		 * because we are not invoking the IPI handlers yet.
> @@ -206,10 +250,7 @@ void generic_smp_call_function_single_interrupt(void)
>  				csd->func);
>  	}
>  
> -	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
> -		csd->func(csd->info);
> -		csd_unlock(csd);
> -	}
> +	flush_smp_call_function_queue();
>  }
>  
>  /*
> 
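
FWIW, one possible way to restructure this (just a sketch of the idea, not
a tested patch -- the 'warn_cpu_offline' parameter is my name for it): make
flush_smp_call_function_queue() the only place that dequeues the list, and
let it do the offline warning itself before running the callbacks, so
nothing gets deleted twice:

	static void flush_smp_call_function_queue(bool warn_cpu_offline)
	{
		struct llist_node *entry;
		struct call_single_data *csd, *csd_next;
		static bool warned;

		WARN_ON(!irqs_disabled());

		/* Dequeue exactly once; everything below operates on 'entry'. */
		entry = llist_del_all(&__get_cpu_var(call_single_queue));
		entry = llist_reverse_order(entry);

		if (unlikely(warn_cpu_offline && entry && !warned &&
			     !cpu_online(smp_processor_id()))) {
			warned = true;
			WARN(1, "IPI on offline CPU %d\n", smp_processor_id());

			/*
			 * We don't have to use the _safe() variant here
			 * because we are only printing the entries, not
			 * running (and thus unlocking) them yet.
			 */
			llist_for_each_entry(csd, entry, llist)
				pr_warn("IPI callback %pS sent to offline CPU\n",
					csd->func);
		}

		/* Now actually run and unlock every dequeued callback. */
		llist_for_each_entry_safe(csd, csd_next, entry, llist) {
			csd->func(csd->info);
			csd_unlock(csd);
		}
	}

generic_smp_call_function_single_interrupt() would then just call
flush_smp_call_function_queue(true), and the CPU_DYING path would pass
false, since the dying CPU is expected to find itself marked offline there.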