linux-kernel - Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140523132250.GA1768@localhost.localdomain>
Date:	Fri, 23 May 2014 15:22:53 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Cc:	peterz@...radead.org, tglx@...utronix.de, mingo@...nel.org,
	tj@...nel.org, rusty@...tcorp.com.au, akpm@...ux-foundation.org,
	hch@...radead.org, mgorman@...e.de, riel@...hat.com, bp@...e.de,
	rostedt@...dmis.org, mgalbraith@...e.de, ego@...ux.vnet.ibm.com,
	paulmck@...ux.vnet.ibm.com, oleg@...hat.com, rjw@...ysocki.net,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that
 leads to "IPI-to-offline-CPU"

On Fri, May 23, 2014 at 03:42:20PM +0530, Srivatsa S. Bhat wrote:
> During CPU offline, stop-machine is used to take control over all the online
> CPUs (via the per-cpu stopper thread) and then run take_cpu_down() on the CPU
> that is to be taken offline.
> 
> But stop-machine itself has several stages: _PREPARE, _DISABLE_IRQ, _RUN etc.
> The important thing to note here is that the _DISABLE_IRQ stage comes much
> later after starting stop-machine, and hence there is a large window where
> other CPUs can send IPIs to the CPU going offline. As a result, we can
> encounter a scenario as depicted below, which causes IPIs to be sent to the
> CPU going offline, and that CPU notices them *after* it has gone offline,
> triggering the "IPI-to-offline-CPU" warning from the smp-call-function code.
> 
> 
>               CPU 1                                         CPU 2
>           (Online CPU)                               (CPU going offline)
> 
>        Enter _PREPARE stage                          Enter _PREPARE stage
> 
>                                                      Enter _DISABLE_IRQ stage
> 
> 
>                                                    =
>        Got a device interrupt,                     | Didn't notice the IPI
>        and the interrupt handler                   | since interrupts were
>        called smp_call_function()                  | disabled on this CPU.
>        and sent an IPI to CPU 2.                   |
>                                                    =
> 
> 
>        Enter _DISABLE_IRQ stage
> 
> 
>        Enter _RUN stage                              Enter _RUN stage
> 
>                                   =
>        Busy loop with interrupts  |                  Invoke take_cpu_down()
>        disabled.                  |                  and take CPU 2 offline
>                                   =
> 
> 
>        Enter _EXIT stage                             Enter _EXIT stage
> 
>        Re-enable interrupts                          Re-enable interrupts
> 
>                                                      The pending IPI is noted
>                                                      immediately, but alas,
>                                                      the CPU is offline at
>                                                      this point.
> 
> 
> 
> So, as we can observe from this scenario, the IPI was sent when CPU 2 was
> still online, and hence it was perfectly legal. But unfortunately it was
> noted only after CPU 2 went offline, resulting in the warning from the
> IPI handling code. In other words, the fault was not at the sender, but
> at the receiver side - and if we look closely, the real bug is in the
> stop-machine sequence itself.
> 
> The problem here is that the CPU going offline disabled its local interrupts
> (by entering _DISABLE_IRQ phase) *before* the other CPUs. And that's the
> reason why it was not able to respond to the IPI before going offline.
> 
> A simple solution to this problem is to ensure that the CPU going offline
> disables its interrupts only *after* the other CPUs do the same thing.
> To achieve this, split the _DISABLE_IRQ state into 2 parts:
> 
> 1st part: MULTI_STOP_DISABLE_IRQ_INACTIVE, where only the non-active CPUs
> (i.e., the "other" CPUs) disable their interrupts.
> 
> 2nd part: MULTI_STOP_DISABLE_IRQ_ACTIVE, where the active CPU (i.e., the
> CPU going offline) disables its interrupts.
> 
> With this in place, the CPU going offline will always be the last one to
> disable interrupts. After this step, no further IPIs can be sent to the
> outgoing CPU, since all the other CPUs would be executing the stop-machine
> code with interrupts disabled. And by the time stop-machine ends, the CPU
> would have gone offline and disappeared from the cpu_online_mask, and hence
> future invocations of smp_call_function() and friends will automatically
> prune that CPU out. Thus, we can guarantee that no CPU will end up
> *inadvertently* sending IPIs to an offline CPU.
> 
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
> ---
> 
>  kernel/stop_machine.c |   39 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 34 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 01fbae5..288f7fe 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -130,8 +130,10 @@ enum multi_stop_state {
>  	MULTI_STOP_NONE,
>  	/* Awaiting everyone to be scheduled. */
>  	MULTI_STOP_PREPARE,
> -	/* Disable interrupts. */
> -	MULTI_STOP_DISABLE_IRQ,
> +	/* Disable interrupts on CPUs not in ->active_cpus mask. */
> +	MULTI_STOP_DISABLE_IRQ_INACTIVE,
> +	/* Disable interrupts on CPUs in ->active_cpus mask. */
> +	MULTI_STOP_DISABLE_IRQ_ACTIVE,
>  	/* Run the function */
>  	MULTI_STOP_RUN,
>  	/* Exit */
> @@ -189,12 +191,39 @@ static int multi_cpu_stop(void *data)
>  	do {
>  		/* Chill out and ensure we re-read multi_stop_state. */
>  		cpu_relax();
> +
> +		/*
> +		 * We use 2 separate stages to disable interrupts, namely
> +		 * _INACTIVE and _ACTIVE, to ensure that the inactive CPUs
> +		 * disable their interrupts first, followed by the active CPUs.
> +		 *
> +		 * This is done to avoid a race in the CPU offline path, which
> +		 * can lead to receiving IPIs on the outgoing CPU *after* it
> +		 * has gone offline.
> +		 *
> +		 * During CPU offline, we don't want the other CPUs to send
> +		 * IPIs to the active_cpu (the outgoing CPU) *after* it has
> +		 * disabled interrupts (because, then it will notice the IPIs
> +		 * only after it has gone offline). We can prevent this by
> +		 * making the other CPUs disable their interrupts first - that
> +		 * way, they will run the stop-machine code with interrupts
> +		 * disabled, and hence won't send IPIs after that point.
> +		 */
> +
>  		if (msdata->state != curstate) {
>  			curstate = msdata->state;
>  			switch (curstate) {
> -			case MULTI_STOP_DISABLE_IRQ:
> -				local_irq_disable();
> -				hard_irq_disable();
> +			case MULTI_STOP_DISABLE_IRQ_INACTIVE:
> +				if (!is_active) {
> +					local_irq_disable();
> +					hard_irq_disable();
> +				}
> +				break;
> +			case MULTI_STOP_DISABLE_IRQ_ACTIVE:
> +				if (is_active) {
> +					local_irq_disable();
> +					hard_irq_disable();
> +				}

Do we actually need that now that we are flushing the ipi queue on CPU dying?

>  				break;
>  			case MULTI_STOP_RUN:
>  				if (is_active)
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/