Message-ID: <20171013111444.GB2795@pathway.suse.cz>
Date: Fri, 13 Oct 2017 13:14:44 +0200
From: Petr Mladek <pmladek@...e.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: NMI watchdog dump does not print on hard lockup

On Thu 2017-10-12 12:16:58, Steven Rostedt wrote:
> static void lock_up_cpu(void *data)
> {
> 	unsigned long flags;
>
> 	raw_spin_lock_irqsave(&global_trace.start_lock, flags);
> 	raw_spin_lock(&global_trace.start_lock);
> 	raw_spin_unlock(&global_trace.start_lock);
> 	raw_spin_unlock_irqrestore(&global_trace.start_lock, flags);
> }
>
> [..]
>
> on_each_cpu(lock_up_cpu, NULL, 1);
>
> This too triggered the warning. But I noticed that the calling function
> didn't hard lockup. (Not all CPUs were hard locked).
>
> Finally I did:
>
> on_each_cpu(lock_up_cpu, NULL, 0);
> lock_up_cpu(tr);
>
> And boom! It locked up (lockdep was enabled, so I could see it showing
> the deadlock), but then it stopped there. No output. The NMI watchdog
> will only detect hard lockups if there is at least one CPU that is
> still active. This could be an issue on non SMP boxes.
>
> We need a way to have NMI flush to consoles when a lockup is detected,
> and not depend on an irq_work to do so.

I thought that enabling CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE
could help. panic() flushes the printk_safe buffers, see
printk_safe_flush_on_panic(). But it somehow does not help.
I need to dig more into it.
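
To illustrate why a panic-time flush looked promising, here is a rough
user-space model (not kernel code). It assumes that a message printed
in NMI context is only stored in a per-CPU buffer, and that it reaches
the console either when that CPU later runs its queued irq_work or when
a panic path flushes the buffers directly. All model_* names are
hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NR_CPUS 2
#define MODEL_BUF_LEN 256

/* Hypothetical stand-in for one per-CPU printk_safe buffer. */
struct model_cpu {
	char buf[MODEL_BUF_LEN];
	size_t len;
	bool irqs_disabled;	/* CPU is spinning with interrupts off */
	bool flush_pending;	/* models a queued irq_work */
};

static struct model_cpu model_cpus[MODEL_NR_CPUS];
static char model_console[MODEL_NR_CPUS * MODEL_BUF_LEN];

/* NMI context: store the message and only request a deferred flush. */
static void model_printk_nmi(int cpu, const char *msg)
{
	struct model_cpu *c = &model_cpus[cpu];
	size_t n = strlen(msg);

	/* Truncate if the buffer is nearly full; keep room for '\0'. */
	if (n > MODEL_BUF_LEN - 1 - c->len)
		n = MODEL_BUF_LEN - 1 - c->len;
	memcpy(c->buf + c->len, msg, n);
	c->len += n;
	c->buf[c->len] = '\0';
	c->flush_pending = true;
}

/* Move one CPU's buffered text to the model console. */
static void model_flush_cpu(struct model_cpu *c)
{
	strncat(model_console, c->buf,
		sizeof(model_console) - strlen(model_console) - 1);
	c->len = 0;
	c->buf[0] = '\0';
	c->flush_pending = false;
}

/*
 * Deferred flush: a buffer is flushed only if its own CPU can still
 * take interrupts. Returns the number of CPUs that were flushed.
 */
static int model_run_irq_work(void)
{
	int flushed = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_cpu *c = &model_cpus[cpu];

		if (c->flush_pending && !c->irqs_disabled) {
			model_flush_cpu(c);
			flushed++;
		}
	}
	return flushed;
}

/* Panic-style flush: drain every buffer regardless of interrupt state. */
static void model_flush_on_panic(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (model_cpus[cpu].len)
			model_flush_cpu(&model_cpus[cpu]);
	}
}
```

In this model, once every CPU has irqs_disabled set,
model_run_irq_work() flushes nothing and the message stays in the
per-CPU buffer; only model_flush_on_panic() makes it visible. The real
code has more to worry about (locking, console drivers), which this
sketch leaves out on purpose.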

In general, we could improve detection of situations where
the entire system is locked up. That would justify the risk of
calling consoles even in NMI.

Or we could accept that the "default" printk is not good for all
situations and allow more special "debugging" modes:

  + Peter's force_early_printk stuff

  + Allow disabling printk_safe and printk_safe_nmi.
    There will be a risk of a deadlock caused by printk.
    But there will also be a chance to see the messages.

Best Regards,
Petr