Message-ID: <Ymgg6p5TYFOHPbw5@hsj>
Date: Tue, 26 Apr 2022 16:43:45 +0000
From: Huang Shijie <shijie@...amperecomputing.com>
To: Petr Mladek <pmladek@...e.com>
Cc: will@...nel.org, catalin.marinas@....com,
patches@...erecomputing.com, zwang@...erecomputing.com,
darren@...amperecomputing.com, pasha.tatashin@...een.com,
senozhatsky@...omium.org, rostedt@...dmis.org,
john.ogness@...utronix.de, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, Adam Li <adam.li@...erecomputing.com>
Subject: Re: [PATCH v2] arm64: kexec: flush log to console in NMI context
Hi Petr,
On Tue, Apr 26, 2022 at 10:19:02AM +0200, Petr Mladek wrote:
> On Sun 2022-04-24 15:19:52, Huang Shijie wrote:
> > If kdump is configured, nmi_panic() may run to machine_kexec().
> >
> > In NMI context, the defer_console_output() defers the console
> > output by using wake_up_klogd to flush the printk ringbuffer
> > to console.
> >
> > But in the machine_kexec, the system will reset, and there is
> > no chance for the wake_up_klogd to do its job. So we can _not_
> > see any log on the console since the nmi_panic
> > (nmi_panic() will disable the irq).
> >
> > This patch fixes this issue by using console_flush_on_panic()
> > to flush to console.
> >
> > After this patch, we can see all the log since the nmi_panic
> > in the panic console.
>
> This is not a good idea. The crashdump is the best source of
> information about the crashed system. It includes the complete
> log.
Okay.
But sometimes we cannot get the crashdump file, so any log is important
to us.
>
> The system is in unknown state during panic(). Any operation
> might break. Flushing consoles increases the risk that
> the crashdump will not get generated. The crashdump is more
> important. If the crashdump succeeds then the consoles are
> not needed.
>
> Note that printk() does not handle consoles in NMI because it might
> cause deadlock. console_flush_on_panic() tries to avoid deadlock
> caused by console_sem. Also the particular console drivers are
> more careful because oops_in_progress is set at this stage.
> But there is still a risk of deadlock. There might be other
> locks that do not check oops_in_progress. Also a potential
> double unlock might cause deadlock.
Okay, thanks for the detailed explanation.
>
> IMHO, the main motivation for this patch was to flush the per-CPU
> printk buffers (v1). But it is no longer needed. The buffers
> were removed in 5.15-rc1, see the commit 93d102f094be9beab28e
> ("printk: remove safe buffers").
>
> The only reason to call console drivers when crashdump is generated
> is to debug the kexec code path. But I am not sure if
> console_flush_on_panic() would help here. The kexec might fail
> anytime before or after this flush so that the important
> messages will not be visible anyway. John Ogness is going
> to add atomic serial console that might be better for this
> use case.
I hope it will be ready as soon as possible.
Thanks
Huang Shijie