Date:	Mon, 29 Feb 2016 11:31:41 +0100
From:	Petr Mladek <pmladek@...e.com>
To:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Cc:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.cz>, Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] printk/nmi: restore printk_func in nmi_panic

On Sat 2016-02-27 11:19:44, Sergey Senozhatsky wrote:
> Hello Petr,
> 
> On (02/26/16 15:57), Petr Mladek wrote:
> > On Fri 2016-02-26 12:37:20, Sergey Senozhatsky wrote:
> > > When the watchdog detects a hardlockup and calls nmi_panic(), `printk_func'
> > > must be restored via a printk_nmi_exit() call, so that panic() will be able
> > > to flush the NMI buffer and show the backtrace and panic message. We should
> > > also explicitly call printk_nmi_flush() in console_flush_on_panic(),
> > > because it may be too late to rely on irq work.
> > > 
> > > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
> > > ---
> > >  include/linux/kernel.h | 6 ++++--
> > >  kernel/printk/printk.c | 1 +
> > >  2 files changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > > index f4fa2b2..3ee33d5 100644
> > > --- a/include/linux/kernel.h
> > > +++ b/include/linux/kernel.h
> > > @@ -469,10 +469,12 @@ do {									\
> > >  	cpu = raw_smp_processor_id();					\
> > >  	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu);	\
> > >  									\
> > > -	if (old_cpu == PANIC_CPU_INVALID)				\
> > > +	if (old_cpu == PANIC_CPU_INVALID) {				\
> > > +		printk_nmi_exit();					\
> > 
> > This might end up in a deadlock that printk_nmi() wanted to avoid.
> 
> aha, I see.
> 
> > I am thinking about a compromise. We should try to get the messages
> > out only when kdump is not enabled.
> 
> Can we zap_locks() if we are on the nmi_panic()->panic()->console_flush_on_panic() path?

That is the problem. zap_locks() is not a solution.

First, it handles only logbuf_lock and console_sem. There are other
locks used by particular consoles that might cause a deadlock.

Second, re-initializing locks is dangerous in its own right. If a lock
is later released by some other CPU that is still running, you might
end up in a deadlock because of a double release. In fact, I think that
zap_locks() actually increases the risk: with more than two CPUs, a
printk() is more likely to be running on another CPU than on the
current one.
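A minimal, single-threaded C sketch of the double-release problem described above. It assumes a simplified ticket lock (the struct ticket_lock type and its helpers are hypothetical, not the kernel's spinlock implementation), chosen only because it makes the failure easy to see: once the lock is re-initialized while another CPU still holds it, both CPUs release it, the "now serving" counter skips a ticket, and the next CPU to arrive waits forever.

```c
#include <stdbool.h>

/* Hypothetical ticket lock: a simplified stand-in for a kernel spinlock. */
struct ticket_lock {
    unsigned int next;   /* next ticket to hand out */
    unsigned int owner;  /* ticket currently allowed into the critical section */
};

/* Re-initialization, analogous to what zap_locks() does to logbuf_lock. */
static void ticket_init(struct ticket_lock *l) { l->next = 0; l->owner = 0; }

/* Take a ticket; the caller may enter once owner reaches that ticket. */
static unsigned int ticket_take(struct ticket_lock *l) { return l->next++; }

/* A real lock would spin here until this returns true. */
static bool ticket_may_enter(const struct ticket_lock *l, unsigned int t)
{
    return l->owner == t;
}

static void ticket_unlock(struct ticket_lock *l) { l->owner++; }

/* Replays one interleaving step by step. Returns true if a CPU arriving
   after the double release can never get its turn. */
static bool zap_then_double_release_deadlocks(void)
{
    struct ticket_lock lock;
    ticket_init(&lock);

    unsigned int t1 = ticket_take(&lock);        /* CPU 1 is in printk() */
    bool cpu1_in = ticket_may_enter(&lock, t1);  /* ticket 0 is served */

    ticket_init(&lock);                          /* panicking CPU 0 zaps the lock */
    unsigned int t0 = ticket_take(&lock);        /* and gets ticket 0 again */
    bool cpu0_in = ticket_may_enter(&lock, t0);

    ticket_unlock(&lock);                        /* CPU 1 releases "its" lock */
    ticket_unlock(&lock);                        /* CPU 0 releases: owner is now 2 */

    unsigned int t2 = ticket_take(&lock);        /* CPU 2 gets ticket 1 */
    /* owner has already moved past ticket 1 and nobody holds the lock
       to advance it, so CPU 2 would spin forever */
    return cpu1_in && cpu0_in && !ticket_may_enter(&lock, t2) && lock.owner > t2;
}
```

For this interleaving, zap_then_double_release_deadlocks() returns true. Real spinlocks fail differently in detail, but the point is the same: the CPU that was interrupted by the zap still believes it owns the lock and will release it.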


Peter Zijlstra suggested using the early console in this case.
I am not sure, but I guess that it does not have any internal locks.
But there is still the other problem with the double release.

I am afraid that the only solution is to make it configurable.
Some people might want to risk the deadlock and try to see the messages
on the console. Others might rather want to get the crashdump for sure,
at the cost that they will need to extract the NMI messages
from the per-CPU buffers.
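A rough C sketch of the configurable compromise, combining it with the earlier idea of printing only when kdump is not enabled. Everything here is hypothetical: the sketch_* names, the buffer size, and the policy enum are illustrations, not existing kernel interfaces, and the real printk_nmi per-CPU buffers handle concurrency that this simplified version ignores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NR_CPUS 4         /* hypothetical CPU count */
#define SKETCH_NMI_BUF_SIZE 256  /* hypothetical per-CPU buffer size */

/* Hypothetical per-CPU buffer that NMI context writes into instead of
   taking the log buffer lock. Simplified: assumes only the owning CPU
   writes its own buffer and nothing reads it concurrently. */
struct sketch_nmi_buf {
    size_t len;
    char data[SKETCH_NMI_BUF_SIZE];
};

static struct sketch_nmi_buf sketch_nmi_bufs[SKETCH_NR_CPUS];

/* Hypothetical knob: what to do with NMI messages when kdump is loaded. */
enum sketch_panic_policy {
    SKETCH_PREFER_CRASHDUMP,  /* leave messages in the buffers; dump is safe */
    SKETCH_PREFER_CONSOLE,    /* accept the deadlock risk to see messages */
};

/* Lock-free append; truncates instead of waiting when the buffer is full. */
static void sketch_nmi_append(int cpu, const char *msg)
{
    struct sketch_nmi_buf *b = &sketch_nmi_bufs[cpu];
    size_t room = SKETCH_NMI_BUF_SIZE - b->len;
    size_t n = strlen(msg);

    if (n > room)
        n = room;
    memcpy(b->data + b->len, msg, n);
    b->len += n;
}

/* Panic-time decision: flush NMI messages to the console (taking locks
   that may already be held) or leave them in place for the crashdump. */
static bool sketch_flush_nmi_to_console(enum sketch_panic_policy policy,
                                        bool kdump_loaded)
{
    if (!kdump_loaded)
        return true;  /* no crashdump to protect, so try to print */
    return policy == SKETCH_PREFER_CONSOLE;
}
```

With kdump loaded and SKETCH_PREFER_CRASHDUMP selected, the messages simply stay in sketch_nmi_bufs, which is where whoever analyzes the crashdump would have to look for them.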


Best Regards,
Petr
