Message-ID: <1320314844.2989.6.camel@br98xy6r>
Date: Thu, 03 Nov 2011 11:07:24 +0100
From: Michael Holzheu <holzheu@...ux.vnet.ibm.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: heiko.carstens@...ibm.com, kexec@...ts.infradead.org,
linux-kernel@...r.kernel.org,
"Eric W. Biederman" <ebiederm@...ssion.com>,
schwidefsky@...ibm.com, Vivek Goyal <vgoyal@...hat.com>,
Don Zickus <dzickus@...hat.com>,
"Luck, Tony" <tony.luck@...el.com>, linux-arch@...r.kernel.org
Subject: Re: [PATCH] kdump: Fix crash_kexec - smp_send_stop race in panic
Hello Andrew,
On Mon, 2011-10-31 at 03:39 -0700, Andrew Morton wrote:
> On Mon, 31 Oct 2011 10:57:16 +0100 Michael Holzheu <holzheu@...ux.vnet.ibm.com> wrote:
>
> > > Should this be done earlier in the function? As it stands we'll have
> > > multiple CPUs scribbling on buf[] at the same time and all trying to
> > > print the same thing at the same time, dumping their stacks, etc.
> > > Perhaps it would be better to single-thread all that stuff
> >
> > My first patch took the spinlock at the beginning of panic(). But then
> > Eric asked if it wouldn't be better to get both panic printks, and I
> > agreed.
>
> Hm, why? It will make a big mess.
>
> > > Also... this patch affects all CPU architectures, all configs, etc.
> > > So we're expecting that every architecture's smp_send_stop() is able to
> > > stop a CPU which is spinning in spin_lock(), possibly with local
> > > interrupts disabled. Will this work?
> >
> > At least on s390 it will work. If there are architectures that can't
> > stop disabled CPUs, then this problem already exists without this
> > patch.
> >
> > Example:
> >
> > 1. 1st CPU gets lock X and panics
> > 2. 2nd CPU is disabled and gets lock X
>
> (irq-disabled)
>
> > 3. 1st CPU calls smp_send_stop()
> > -> 2nd CPU loops disabled and can't be stopped
>
> Well OK. Maybe some architectures do have this problem - who would
> notice? If that is the case, we just made the failure cases much more
> common.
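
To make that failure mode concrete, here is a small userspace model
(illustration only, not kernel code; the thread names and the
signal-based "IPI" are inventions of this sketch). Blocking a signal
stands in for disabling interrupts, pthread_kill() stands in for
smp_send_stop(), and a thread spinning on a lock with the signal
blocked can never be stopped:

/* cc -o spin-demo spin-demo.c -lpthread */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static pthread_spinlock_t lock_x;

static void stop_handler(int sig)
{
	(void)sig;
	for (;;)			/* model of the stop-IPI handler: park */
		pause();
}

static void *second_cpu(void *arg)
{
	sigset_t set;

	(void)arg;
	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	pthread_sigmask(SIG_BLOCK, &set, NULL);	/* "local_irq_disable()" */
	pthread_spin_lock(&lock_x);	/* spins forever: 1st CPU holds it */
	return NULL;
}

int main(void)
{
	pthread_t tid;

	signal(SIGUSR1, stop_handler);
	pthread_spin_init(&lock_x, 0);
	pthread_spin_lock(&lock_x);	/* 1st CPU: gets lock X and panics */
	pthread_create(&tid, NULL, second_cpu, NULL);
	sleep(1);			/* let the 2nd CPU start spinning */
	pthread_kill(tid, SIGUSR1);	/* "smp_send_stop()": stays pending */
	sleep(1);
	printf("2nd CPU still spinning; stop signal never delivered\n");
	return 0;
}

The pthread_kill() stays pending and is never acted on, which is
exactly the situation an IPI-based smp_send_stop() faces against a CPU
looping with interrupts off.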
OK, next idea: what if the CPUs wait irq-enabled in panic() until they
get stopped by smp_send_stop()?
See patch below:
---
From: Michael Holzheu <holzheu@...ux.vnet.ibm.com>
Subject: kdump: fix crash_kexec()/smp_send_stop() race in panic
When two CPUs call panic() at the same time, there is a race condition
that can break kdump: the first CPU calls crash_kexec() and the second
CPU calls smp_send_stop() in panic() before crash_kexec() has finished
on the first CPU. The second CPU then stops the first CPU, and kdump
fails:
1st CPU:
  panic() -> crash_kexec() -> mutex_trylock(&kexec_mutex) -> do kdump

2nd CPU:
  panic() -> crash_kexec() -> kexec_mutex already held by 1st CPU
          -> smp_send_stop() -> stop 1st CPU (kdump is stopped)
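
For reference, crash_kexec() is serialized on kexec_mutex via trylock,
so only the first caller proceeds to dump and every later caller falls
through; roughly like this (a trimmed paraphrase of kernel/kexec.c of
this era, not part of this patch):

void crash_kexec(struct pt_regs *regs)
{
	if (mutex_trylock(&kexec_mutex)) {
		if (kexec_crash_image) {
			struct pt_regs fixed_regs;

			crash_setup_regs(&fixed_regs, regs);
			machine_crash_shutdown(&fixed_regs);
			machine_kexec(kexec_crash_image);
		}
		mutex_unlock(&kexec_mutex);
	}
}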
This patch fixes the problem by introducing a spinlock in panic() that
allows only one CPU to run crash_kexec() and the subsequent panic
code.
Signed-off-by: Michael Holzheu <holzheu@...ux.vnet.ibm.com>
---
kernel/panic.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -59,6 +59,7 @@ EXPORT_SYMBOL(panic_blink);
  */
 NORET_TYPE void panic(const char * fmt, ...)
 {
+	static DEFINE_SPINLOCK(panic_lock);
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0;
@@ -68,8 +69,16 @@ NORET_TYPE void panic(const char * fmt,
 	 * It's possible to come here directly from a panic-assertion and
 	 * not have preempt disabled. Some functions called from here want
 	 * preempt to be disabled. No point enabling it later though...
+	 *
+	 * Only one CPU is allowed to execute the panic code from here. For
+	 * multiple parallel invocations of panic, all other CPUs will wait
+	 * until they are stopped by the 1st CPU with smp_send_stop().
 	 */
-	preempt_disable();
+	if (!spin_trylock(&panic_lock)) {
+		local_irq_enable();
+		while (1)
+			cpu_relax();
+	}
 
 	console_verbose();
 	bust_spinlocks(1);
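
For illustration, the gating logic can be modeled in userspace (a
sketch under stated assumptions: a pthread spinlock stands in for
panic_lock, an atomic flag for smp_send_stop(), and sched_yield() for
cpu_relax(); the names panic_sim and stopped are invented for this
sketch):

/* cc -o panic-race panic-race.c -lpthread */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_spinlock_t panic_lock;
static atomic_int stopped;		/* model of smp_send_stop() */

static void *panic_sim(void *arg)
{
	if (pthread_spin_trylock(&panic_lock) != 0) {
		/* Loser: wait "irq-enabled" until the winner stops us. */
		while (!atomic_load(&stopped))
			sched_yield();	/* stand-in for cpu_relax() */
		return NULL;
	}
	printf("CPU %ld wins and runs crash_kexec()\n", (long)arg);
	atomic_store(&stopped, 1);	/* "smp_send_stop()" */
	return NULL;
}

int main(void)
{
	pthread_t tid[4];
	long i;

	pthread_spin_init(&panic_lock, 0);
	for (i = 0; i < 4; i++)
		pthread_create(&tid[i], NULL, panic_sim, (void *)i);
	for (i = 0; i < 4; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

No matter how the threads race, exactly one of them prints the "wins"
line, which is the property panic() needs here.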