linux-kernel - Re: NMI watchdog dump does not print on hard lockup

Message-ID: <20171013111444.GB2795@pathway.suse.cz>
Date:   Fri, 13 Oct 2017 13:14:44 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: NMI watchdog dump does not print on hard lockup

On Thu 2017-10-12 12:16:58, Steven Rostedt wrote:
> static void lock_up_cpu(void *data)
> {
> 	unsigned long flags;
> 	raw_spin_lock_irqsave(&global_trace.start_lock, flags);
> 	raw_spin_lock(&global_trace.start_lock);
> 	raw_spin_unlock(&global_trace.start_lock);
> 	raw_spin_unlock_irqrestore(&global_trace.start_lock, flags);
> }
> 
> [..]
> 
> 	on_each_cpu(lock_up_cpu, NULL, 1);
> 
> This too triggered the warning. But I noticed that the calling function
> didn't hard lockup. (Not all CPUs were hard locked).
> 
> Finally I did:
> 
> 	on_each_cpu(lock_up_cpu, NULL, 0);
> 	lock_up_cpu(tr);
> 
> And boom! It locked up (lockdep was enabled, so I could see it showing
> the deadlock), but then it stopped there. No output. The NMI watchdog
> will only detect hard lockups if there is at least one CPU that is
> still active. This could be an issue on non SMP boxes.
> 
> We need a way to have NMI flush to consoles when a lockup is detected,
> and not depend on an irq_work to do so.


I thought that enabling CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE
could help. panic() flushes the printk_safe buffers, see
printk_safe_flush_on_panic(). But it somehow does not help.
I need to dig more into it.
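Below is a minimal user-space model of the deferred NMI output path. It is plain C, not kernel code. It assumes three things:

- Messages printed from NMI land in a per-CPU buffer.
- An irq_work pushes that buffer to the console, and it can only run once the CPU has interrupts enabled again.
- A panic path flushes the buffers directly.

All names are hypothetical: NR_CPUS, nmi_printk(), run_irq_work(), flush_on_panic() and the buffer layout. flush_on_panic() only stands in for the direct flush that printk_safe_flush_on_panic() is described as doing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS  4
#define BUF_SIZE 256

/* Hypothetical stand-in for a per-CPU printk_safe NMI buffer. */
struct safe_buf {
	char text[BUF_SIZE];
	size_t len;
	bool irq_work_pending;	/* flush queued, waits for IRQs to be enabled */
};

static struct safe_buf nmi_buf[NR_CPUS];
static bool irqs_enabled[NR_CPUS];	/* false: CPU is stuck with IRQs off */
static char console[NR_CPUS * BUF_SIZE];	/* what actually got printed */
static size_t console_len;

/* NMI context: never touch the console, only append and queue a flush. */
static void nmi_printk(int cpu, const char *msg)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	struct safe_buf *b = &nmi_buf[cpu];
	size_t n = strlen(msg);

	if (b->len + n >= BUF_SIZE)
		n = BUF_SIZE - 1 - b->len;	/* drop what does not fit */
	memcpy(b->text + b->len, msg, n);
	b->len += n;
	b->irq_work_pending = true;
}

/* Copy one CPU's buffer to the (modelled) console and reset it. */
static void flush_one(int cpu)
{
	struct safe_buf *b = &nmi_buf[cpu];
	size_t room = sizeof(console) - 1 - console_len;
	size_t n = b->len < room ? b->len : room;

	memcpy(console + console_len, b->text, n);
	console_len += n;
	console[console_len] = '\0';
	b->len = 0;
	b->irq_work_pending = false;
}

/* Normal path: the queued irq_work only runs on a CPU with IRQs enabled. */
static void run_irq_work(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (nmi_buf[cpu].irq_work_pending && irqs_enabled[cpu])
			flush_one(cpu);
}

/* Panic path: flush every CPU's buffer directly, no irq_work involved. */
static void flush_on_panic(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (nmi_buf[cpu].len)
			flush_one(cpu);
}
```

Suppose every CPU has interrupts disabled. Then run_irq_work() flushes nothing, and the message reaches the console only through flush_on_panic(). In this model, that is why a lockup that leaves no CPU free to run the irq_work stays silent unless the machine panics.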

In general, we could either improve the detection of situations where
the entire system is locked up. That would justify the risk of calling
the consoles even from NMI context.
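The first option could look roughly like the following user-space sketch. It is an assumption, not existing kernel code, and works in three steps:

1. Each CPU's NMI checks whether its watchdog hrtimer counter has advanced since the previous NMI. This is loosely modelled on how the hard lockup detector decides that a CPU is stuck.
2. The NMI records the result for that CPU.
3. Direct console output is allowed only when every CPU has been seen stuck, i.e. when no CPU is left to run the irq_work.

The names hrtimer_interrupts, hrtimer_interrupts_saved, hard_locked, is_hardlockup() and nmi_may_call_consoles() are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Bumped by the (modelled) watchdog hrtimer; stalls while IRQs are off. */
static unsigned long hrtimer_interrupts[NR_CPUS];
static unsigned long hrtimer_interrupts_saved[NR_CPUS];
static bool hard_locked[NR_CPUS];

/* Per-CPU NMI check: true if this CPU's timer has not ticked since last time. */
static bool is_hardlockup(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	unsigned long now = hrtimer_interrupts[cpu];

	if (now == hrtimer_interrupts_saved[cpu])
		return true;
	hrtimer_interrupts_saved[cpu] = now;
	return false;
}

/*
 * Hypothetical policy: risk calling the consoles from NMI only when every
 * CPU has been seen hard locked, i.e. nobody can run the deferred flush.
 */
static bool nmi_may_call_consoles(int cpu)
{
	hard_locked[cpu] = is_hardlockup(cpu);
	for (int i = 0; i < NR_CPUS; i++)
		if (!hard_locked[i])
			return false;
	return true;
}
```

The sketch ignores memory ordering between CPUs, which real NMI code could not do. On a uniprocessor box (NR_CPUS == 1) the check reduces to "this CPU is stuck", so it would also cover the non-SMP case mentioned above.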

Or we could accept that the "default" printk is not good for all
situations and allow special "debugging" modes:

	   + Peter's force_early_printk stuff

	   + Allow disabling printk_safe and printk_safe_nmi.
	     There would be a risk of a deadlock caused by printk,
	     but there would also be a chance to see the messages.
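The trade-off between these modes can be sketched as a small decision function. The enum values and nmi_print() are hypothetical and do not correspond to any existing kernel option. The model makes two assumptions:

- The early console path writes to the hardware without taking the lock that the regular console path needs.
- Disabling printk_safe means taking that lock directly from NMI context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical debugging knob; not an existing kernel parameter. */
enum printk_nmi_mode {
	PRINTK_NMI_SAFE,	/* default: buffer now, flush later via irq_work */
	PRINTK_NMI_EARLY,	/* force_early_printk-like: bypass the log buffer */
	PRINTK_NMI_UNSAFE,	/* printk_safe disabled: take the console lock */
};

enum nmi_print_result {
	NMI_PRINT_DEFERRED,	/* message waits in the per-CPU buffer */
	NMI_PRINT_ON_CONSOLE,	/* message reached the console right away */
	NMI_PRINT_DEADLOCK,	/* would spin forever on an already held lock */
};

/*
 * Outcome of one printk() from NMI context.
 * lock_held_here: the code this NMI interrupted already holds the lock
 * the regular console path needs (e.g. it was in the middle of printk).
 */
static enum nmi_print_result nmi_print(enum printk_nmi_mode mode,
				       bool lock_held_here)
{
	switch (mode) {
	case PRINTK_NMI_SAFE:
		return NMI_PRINT_DEFERRED;
	case PRINTK_NMI_EARLY:
		/* assumed lock-free early console write for this sketch */
		return NMI_PRINT_ON_CONSOLE;
	case PRINTK_NMI_UNSAFE:
		return lock_held_here ? NMI_PRINT_DEADLOCK
				      : NMI_PRINT_ON_CONSOLE;
	}
	assert(!"unknown printk NMI mode");
	return NMI_PRINT_DEFERRED;
}
```

The modes behave as follows in this model:

- The default mode always defers.
- The early mode always gets the message out, under the lock-free assumption above.
- The unsafe mode gets the message out immediately unless the NMI hit code that already held the lock, in which case it deadlocks.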


Best Regards,
Petr