Message-ID: <87lf22eem7.fsf@jogness.linutronix.de>
Date: Fri, 05 Nov 2021 17:50:48 +0106
From: John Ogness <john.ogness@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: Nicholas Piggin <npiggin@...il.com>,
Laurent Dufour <ldufour@...ux.ibm.com>,
linux-kernel@...r.kernel.org
Subject: Re: Removal of printk safe buffers delays NMI context printk

On 2021-11-05, Petr Mladek <pmladek@...e.com> wrote:
> On Fri 2021-11-05 15:03:27, John Ogness wrote:
>> On 2021-11-05, Nicholas Piggin <npiggin@...il.com> wrote:
>>> but we do need that printk flush capability back there and for
>>> nmi_backtrace.
>>
>> Agreed. I had not considered this necessary side-effect when I
>> removed the NMI safe buffers.
>
> Honestly, I do not understand why it stopped working or how
> it worked before.

IIUC, Nick is presenting a problem where a lockup on the other CPUs is
detected. Those CPUs dump their backtraces from NMI context, but in
their locked-up state they cannot process irq_work. So even though the
messages are in the buffer, there is no one printing the buffer.

printk_safe_flush() would dump the NMI safe buffers for all the CPUs
into the printk buffer, then trigger an irq_work on itself (the
non-locked-up CPU).

That local irq_work trigger was critical, because the other CPUs (which
also queued irq_work for themselves) are not able to process it. I did
not consider this case, which is why we still need to trigger irq_work
here. (Or, as the removed comment hinted at, add some printk() call
that either prints directly or triggers the irq_work.)
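
For what it's worth, the flow can be sketched in a tiny userspace
simulation. All names here (cpu_state, nmi_backtrace_sim, safe_flush,
and so on) are invented for illustration; this is not the kernel's
actual printk/irq_work code, just the shape of the problem:

```c
/*
 * Userspace sketch of the flushing pattern described above. Invented
 * names throughout; not the kernel's real printk internals.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS 4
#define BUF_SZ  128

struct cpu_state {
	char nmi_buf[BUF_SZ];   /* per-CPU NMI-safe buffer */
	bool irq_work_pending;  /* CPU queued irq_work for itself */
	bool locked_up;         /* a locked-up CPU never runs its irq_work */
};

static struct cpu_state cpus[NR_CPUS];
static char main_log[NR_CPUS * BUF_SZ]; /* stands in for the printk buffer */
static char console[NR_CPUS * BUF_SZ]; /* what actually got printed */

/* NMI handler on a (possibly locked-up) CPU: record the backtrace in
 * the NMI-safe buffer and queue irq_work on *that* CPU. */
static void nmi_backtrace_sim(int cpu)
{
	snprintf(cpus[cpu].nmi_buf, BUF_SZ, "CPU%d backtrace\n", cpu);
	cpus[cpu].irq_work_pending = true; /* useless if cpu is locked up */
}

/* irq_work processing: a locked-up CPU never gets here, so the work it
 * queued for itself (and thus the printing) never happens. */
static void run_irq_work(int cpu)
{
	if (cpus[cpu].locked_up || !cpus[cpu].irq_work_pending)
		return;
	cpus[cpu].irq_work_pending = false;
	strcpy(console, main_log); /* "print" everything in the buffer */
}

/* The printk_safe_flush() role: a healthy CPU copies every CPU's
 * NMI-safe buffer into the main buffer, then queues irq_work on
 * *itself*. That local trigger is what gets the messages out. */
static void safe_flush(int this_cpu)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		strcat(main_log, cpus[cpu].nmi_buf);
		cpus[cpu].nmi_buf[0] = '\0';
	}
	cpus[this_cpu].irq_work_pending = true; /* the critical local step */
	run_irq_work(this_cpu);
}
```

If CPUs 1 and 2 are marked locked_up and take nmi_backtrace_sim(),
run_irq_work() on them is a no-op and console stays empty; only
safe_flush() on the healthy CPU 0 gets the backtraces printed.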

John Ogness