linux-kernel - RE: [PATCH 1/2] x86, reboot: revert stop_other

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5C4C569E8A4B9B42A84A977CF070A35B2E4D3186D3@USINDEVS01.corp.hds.com>
Date:	Fri, 27 Apr 2012 12:42:51 -0400
From:	Seiji Aguchi <seiji.aguchi@....com>
To:	Don Zickus <dzickus@...hat.com>, "x86@...nel.org" <x86@...nel.org>,
	"Peter Zijlstra (peterz@...radead.org)" <peterz@...radead.org>
CC:	LKML <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH 1/2] x86, reboot: revert stop_other_cpus to using IRQ
 with NMI fallback

Hi,

I tested this patch by applying to -tip tree on 4cpu machine.

In a case where one cpu panics while external interrupt is disable in other cpus, 
WARN_ON in native_smp_send_reschedule() hits.
The problem is the same as follows which happened in reboot case.

https://lkml.org/lkml/2012/4/5/308

I personally think that reasonable solution is just skipping WARN_ON in reboot and panic case
By adding "PANIC" state in system_state.

This issue has not solved since it was initially reported in February.
A band-aid patch is better than doing nothing.

What do you think?

Seiji

> -----Original Message-----
> From: linux-kernel-owner@...r.kernel.org [mailto:linux-kernel-owner@...r.kernel.org] On Behalf Of Don Zickus
> Sent: Thursday, March 29, 2012 4:24 PM
> To: x86@...nel.org
> Cc: LKML; Peter Zijlstra; Don Zickus
> Subject: [PATCH 1/2] x86, reboot: revert stop_other_cpus to using IRQ with NMI fallback
> 
> For v3.3, I added code to use the NMI to stop other cpus in the panic case.  The idea was to make sure all cpus on the system were
> definitely halted to help serialize the panic path to execute the rest of the code on a single cpu.
> 
> The main problem it was trying to solve was how to stop a cpu that was spinning with its irqs disabled.  A IPI irq would be stuck and
> couldn't get in there, but an NMI could.
> 
> Things were great until we had another conversation about some pstore changes.  Because some of the backend pstore still uses
> spinlocks to protect the device access, things could get ugly if a panic happened and we were stuck spinning on a lock.
> 
> Now with the NMI shutting down cpus, we could assume no other cpus were running and just bust the spin lock and proceed.
> 
> The counter argument was, well if you do that the backend could be in a screwed up state and you might not be able to save anything
> as a result.
> If we could have just given the cpu a little more time to finish things, we could have grabbed the spin lock cleanly and everything
> would have been fine.
> 
> Well, how do give a cpu a 'little more time' in the panic case?  For the most part you can't without spinning on the lock and even in that
> case, how long do you spin for?
> 
> So instead of making it ugly in the pstore code, I had the idea that most spin locks are held with irqs disabled, naturally blocking other
> irqs until they are done.  We just need an irq to sit there and grab the cpu once the lock is released and irqs are re-enabled again.
> 
> I decided to modify stop_other_cpus to go back to the REBOOT_IRQ code and use that IPI irq as the blocking irq.  This code has been
> working for a long time and will give up after one second.  To fix the original problem of what happens after one second if a cpu hasn't
> accepted the IPI, I just added the NMI hammer to clobber the cpu.
> 
> The end result of this more complicated-looking-diff-than-it-really-is, is a patch that mostly reverts
> 
> 3603a25 x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus
> 
> and instead just sticks another if-case after the REBOOT_IRQ and checks to see if online_cpus are still > 1, and if so, clobber the rest of
> the cpus with an NMI.
> 
> For the most part, the NMI piece will never get hit, thus behaving just like
> pre-v3.3 code.  However, in those rare conditions, we have a fallback plan.
> 
> I still wrap the NMI check with the knob 'nonmi_ipi' in case someone still has issues with NMIs in the panic path.
> 
> I also reset the original names of the functions.
> 
> Signed-off-by: Don Zickus <dzickus@...hat.com>
> ---
> 
> There was some discussion on this earlier about how it was causing a scheduler WARN with the original NMI implementation of this
> patch.  However, that is due to the way the cpu is being shutdown without communicating that information to the scheduler.
> As a result during shutdown of the cpus the scheduler may reschedule a process onto a cpu that suddenly went away and spit out a
> WARN().
> 
> That is a separate problem then what I am trying to solve here.  I hacked up patch to solve that problem but it doesn't seem to work
> reliably yet so I haven't posted it yet.
> 
>  arch/x86/kernel/smp.c |  100 ++++++++++++++++++++++--------------------------
>  1 files changed, 46 insertions(+), 54 deletions(-)
> 
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 66c74f4..48d2b7d 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -109,6 +109,9 @@
>   *	about nothing of note with C stepping upwards.
>   */
> 
> +static atomic_t stopping_cpu = ATOMIC_INIT(-1); static bool
> +smp_no_nmi_ipi = false;
> +
>  /*
>   * this function sends a 'reschedule' IPI to another CPU.
>   * it goes straight through and wastes no time serializing @@ -149,8 +152,6 @@ void native_send_call_func_ipi(const struct cpumask
> *mask)
>  	free_cpumask_var(allbutself);
>  }
> 
> -static atomic_t stopping_cpu = ATOMIC_INIT(-1);
> -
>  static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)  {
>  	/* We are registered on stopping cpu too, avoid spurious NMI */ @@ -162,7 +163,19 @@ static int
> smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
>  	return NMI_HANDLED;
>  }
> 
> -static void native_nmi_stop_other_cpus(int wait)
> +/*
> + * this function calls the 'stop' function on all other CPUs in the system.
> + */
> +
> +asmlinkage void smp_reboot_interrupt(void) {
> +	ack_APIC_irq();
> +	irq_enter();
> +	stop_this_cpu(NULL);
> +	irq_exit();
> +}
> +
> +static void native_stop_other_cpus(int wait)
>  {
>  	unsigned long flags;
>  	unsigned long timeout;
> @@ -174,20 +187,25 @@ static void native_nmi_stop_other_cpus(int wait)
>  	 * Use an own vector here because smp_call_function
>  	 * does lots of things not suitable in a panic situation.
>  	 */
> +
> +	/*
> +	 * We start by using the REBOOT_VECTOR irq.
> +	 * The irq is treated as a sync point to allow critical
> +	 * regions of code on other cpus to release their spin locks
> +	 * and re-enable irqs.  Jumping straight to an NMI might
> +	 * accidentally cause deadlocks with further shutdown/panic
> +	 * code.  By syncing, we give the cpus up to one second to
> +	 * finish their work before we force them off with the NMI.
> +	 */
>  	if (num_online_cpus() > 1) {
>  		/* did someone beat us here? */
>  		if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
>  			return;
> 
> -		if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
> -					 NMI_FLAG_FIRST, "smp_stop"))
> -			/* Note: we ignore failures here */
> -			return;
> -
> -		/* sync above data before sending NMI */
> +		/* sync above data before sending IRQ */
>  		wmb();
> 
> -		apic->send_IPI_allbutself(NMI_VECTOR);
> +		apic->send_IPI_allbutself(REBOOT_VECTOR);
> 
>  		/*
>  		 * Don't wait longer than a second if the caller @@ -197,63 +215,37 @@ static void native_nmi_stop_other_cpus(int
> wait)
>  		while (num_online_cpus() > 1 && (wait || timeout--))
>  			udelay(1);
>  	}
> +
> +	/* if the REBOOT_VECTOR didn't work, try with the NMI */
> +	if ((num_online_cpus() > 1) && (!smp_no_nmi_ipi))  {
> +		if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
> +					 NMI_FLAG_FIRST, "smp_stop"))
> +			/* Note: we ignore failures here */
> +			/* Hope the REBOOT_IRQ is good enough */
> +			goto finish;
> 
> -	local_irq_save(flags);
> -	disable_local_APIC();
> -	local_irq_restore(flags);
> -}
> -
> -/*
> - * this function calls the 'stop' function on all other CPUs in the system.
> - */
> -
> -asmlinkage void smp_reboot_interrupt(void) -{
> -	ack_APIC_irq();
> -	irq_enter();
> -	stop_this_cpu(NULL);
> -	irq_exit();
> -}
> -
> -static void native_irq_stop_other_cpus(int wait) -{
> -	unsigned long flags;
> -	unsigned long timeout;
> +		/* sync above data before sending IRQ */
> +		wmb();
> 
> -	if (reboot_force)
> -		return;
> +		pr_emerg("Shutting down cpus with NMI\n");
> 
> -	/*
> -	 * Use an own vector here because smp_call_function
> -	 * does lots of things not suitable in a panic situation.
> -	 * On most systems we could also use an NMI here,
> -	 * but there are a few systems around where NMI
> -	 * is problematic so stay with an non NMI for now
> -	 * (this implies we cannot stop CPUs spinning with irq off
> -	 * currently)
> -	 */
> -	if (num_online_cpus() > 1) {
> -		apic->send_IPI_allbutself(REBOOT_VECTOR);
> +		apic->send_IPI_allbutself(NMI_VECTOR);
> 
>  		/*
> -		 * Don't wait longer than a second if the caller
> +		 * Don't wait longer than a 10 ms if the caller
>  		 * didn't ask us to wait.
>  		 */
> -		timeout = USEC_PER_SEC;
> +		timeout = USEC_PER_MSEC * 10;
>  		while (num_online_cpus() > 1 && (wait || timeout--))
>  			udelay(1);
>  	}
> 
> +finish:
>  	local_irq_save(flags);
>  	disable_local_APIC();
>  	local_irq_restore(flags);
>  }
> 
> -static void native_smp_disable_nmi_ipi(void) -{
> -	smp_ops.stop_other_cpus = native_irq_stop_other_cpus;
> -}
> -
>  /*
>   * Reschedule call back.
>   */
> @@ -287,8 +279,8 @@ void smp_call_function_single_interrupt(struct pt_regs *regs)
> 
>  static int __init nonmi_ipi_setup(char *str)  {
> -        native_smp_disable_nmi_ipi();
> -        return 1;
> +	smp_no_nmi_ipi = true;
> +	return 1;
>  }
> 
>  __setup("nonmi_ipi", nonmi_ipi_setup);
> @@ -298,7 +290,7 @@ struct smp_ops smp_ops = {
>  	.smp_prepare_cpus	= native_smp_prepare_cpus,
>  	.smp_cpus_done		= native_smp_cpus_done,
> 
> -	.stop_other_cpus	= native_nmi_stop_other_cpus,
> +	.stop_other_cpus	= native_stop_other_cpus,
>  	.smp_send_reschedule	= native_smp_send_reschedule,
> 
>  	.cpu_up			= native_cpu_up,
> --
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More
> majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/