linux-kernel - Re: [PATCH] printk/nmi: restore printk_func in nmi


Message-ID: <20160229103141.GL3305@pathway.suse.cz>
Date:	Mon, 29 Feb 2016 11:31:41 +0100
From:	Petr Mladek <pmladek@...e.com>
To:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Cc:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.cz>, Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] printk/nmi: restore printk_func in nmi_panic

On Sat 2016-02-27 11:19:44, Sergey Senozhatsky wrote:
> Hello Petr,
> 
> On (02/26/16 15:57), Petr Mladek wrote:
> > On Fri 2016-02-26 12:37:20, Sergey Senozhatsky wrote:
> > > When watchdog detects a hardlockup and calls nmi_panic() `printk_func'
> > > must be restored via printk_nmi_exit() call, so panic() will be able
> > > to flush nmi buf and show backtrace and panic message. We also better
> > > explicitly ask nmi to printk_nmi_flush() in console_flush_on_panic(),
> > > because it may be too late to rely on irq work.
> > > 
> > > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
> > > ---
> > >  include/linux/kernel.h | 6 ++++--
> > >  kernel/printk/printk.c | 1 +
> > >  2 files changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > > index f4fa2b2..3ee33d5 100644
> > > --- a/include/linux/kernel.h
> > > +++ b/include/linux/kernel.h
> > > @@ -469,10 +469,12 @@ do {									\
> > >  	cpu = raw_smp_processor_id();					\
> > >  	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu);	\
> > >  									\
> > > -	if (old_cpu == PANIC_CPU_INVALID)				\
> > > +	if (old_cpu == PANIC_CPU_INVALID) {				\
> > > +		printk_nmi_exit();					\
> > 
> > This might end up in a deadlock that printk_nmi() wanted to avoid.
> 
> aha, I see.
> 
> > I think about a compromise. We should try to get the messages
> > out only when kdump is not enabled.
> 
> can we zap_locks() if we are on nmi_panic()->panic()->console_flush_on_panic() path?

That is the problem. zap_locks() is not a solution.

First, it handles only logbuf_lock and console_sem. There are other
locks used by particular consoles that might cause a deadlock.

Second, re-initializing locks is dangerous on its own. If they are
released by some other CPU that is still running, you might end up
in a deadlock because of a double release. In fact, I think that it
actually increases the risk. If there are more than 2 CPUs then
it is more likely that a printk is running on another CPU than
on the current one.
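
To illustrate the double release, here is a minimal userspace sketch. It is
not kernel code: toy_sem and the toy_* helpers are made-up stand-ins for a
binary semaphore such as console_sem, and the two "CPUs" are simply replayed
in order on one thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a binary semaphore such as console_sem (hypothetical). */
struct toy_sem {
	int count;	/* 1 = free, 0 = held; anything else is corruption */
};

static void toy_sem_init(struct toy_sem *s)
{
	s->count = 1;
}

static bool toy_down_trylock(struct toy_sem *s)
{
	if (s->count <= 0)
		return false;
	s->count--;
	return true;
}

static void toy_up(struct toy_sem *s)
{
	s->count++;
}

/*
 * Replays: CPU A takes the lock, the panicking CPU B re-initializes it
 * and takes it as well, then A (still running) releases, then B releases.
 * Returns the final count; a sane binary semaphore would end at 1.
 */
static int toy_replay_zap_race(void)
{
	struct toy_sem sem;
	bool ok;

	toy_sem_init(&sem);
	ok = toy_down_trylock(&sem);	/* CPU A: in the middle of printk */
	assert(ok);
	toy_sem_init(&sem);		/* CPU B: zap, A's ownership is forgotten */
	ok = toy_down_trylock(&sem);	/* CPU B: now believes it owns the lock */
	assert(ok);
	toy_up(&sem);			/* CPU A: release, B's exclusion is gone */
	toy_up(&sem);			/* CPU B: second release of the same lock */
	return sem.count;
}
```

In this toy model the count ends up above 1, so the lock no longer
serializes anything. What a real spinlock or semaphore does after such a
sequence depends on the implementation.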

Peter Zijlstra had an idea of using early console in this case.
I am not sure but I guess that it does not have any internal locks.
But there is still the other problem with the double release.

I am afraid that the only solution is to make it configurable.
Some people might want to risk the deadlock and try to see the messages
on the console. Others might rather want to get the crashdump for sure,
at the cost of having to extract the NMI messages
from the per-CPU buffers.
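
Just to show the shape of such a knob, a rough sketch follows. All names
are hypothetical and nothing like this exists in the patch. It also assumes
that the caller already knows whether a crash kernel is loaded.

```c
#include <stdbool.h>

/* Hypothetical policy knob; not an existing kernel option. */
enum toy_nmi_panic_policy {
	TOY_PREFER_CRASHDUMP,	/* protect kdump; flush only if none is loaded */
	TOY_PREFER_CONSOLE,	/* risk a deadlock to show the messages */
};

/* Decide whether the NMI panic path should try to flush to the console. */
static bool toy_flush_nmi_to_console(enum toy_nmi_panic_policy policy,
				     bool kdump_loaded)
{
	if (policy == TOY_PREFER_CONSOLE)
		return true;
	/* Crashdump preferred: flush only when there is no dump to protect. */
	return !kdump_loaded;
}
```

With TOY_PREFER_CRASHDUMP and a loaded crash kernel, the messages stay in
the per-CPU buffers and have to be extracted from the dump.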

Best Regards,
Petr