Message-ID: <87lf22eem7.fsf@jogness.linutronix.de>
Date: Fri, 05 Nov 2021 17:50:48 +0106
From: John Ogness <john.ogness@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: Nicholas Piggin <npiggin@...il.com>,
Laurent Dufour <ldufour@...ux.ibm.com>,
linux-kernel@...r.kernel.org
Subject: Re: Removal of printk safe buffers delays NMI context printk

On 2021-11-05, Petr Mladek <pmladek@...e.com> wrote:
> On Fri 2021-11-05 15:03:27, John Ogness wrote:
>> On 2021-11-05, Nicholas Piggin <npiggin@...il.com> wrote:
>>> but we do need that printk flush capability back there and for
>>> nmi_backtrace.
>>
>> Agreed. I had not considered this necessary side-effect when I
>> removed the NMI safe buffers.
>
> Honestly, I do not understand why it stopped working or how
> it worked before.

IIUC, Nick is presenting a problem where a lockup on the other CPUs is
detected. Those CPUs dump their backtraces from NMI context, but in
their locked-up state they cannot process irq_work. So even though the
messages are in the buffer, there is no one printing the buffer.

printk_safe_flush() would dump the NMI safe buffers for all the CPUs
into the printk buffer, then trigger an irq_work on itself (the
non-locked-up CPU).

That local irq_work trigger was critical, because the other CPUs (which
also queued irq_work for themselves) are not able to process it. I did
not consider this case, which is why we still need to trigger irq_work
here. (Or, as the removed comment hinted at, add some printk() call
that either prints directly or triggers the irq_work.)
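
For what it's worth, the flow can be sketched in a tiny userspace
simulation. All names here (cpu_state, nmi_backtrace_sim, safe_flush,
and so on) are invented for illustration; this is not the kernel's
actual printk/irq_work code, just the shape of the problem:

```c
/*
 * Userspace sketch of the flushing pattern described above. Invented
 * names throughout; not the kernel's real printk internals.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS 4
#define BUF_SZ  128

struct cpu_state {
	char nmi_buf[BUF_SZ];   /* per-CPU NMI-safe buffer */
	bool irq_work_pending;  /* CPU queued irq_work for itself */
	bool locked_up;         /* a locked-up CPU never runs its irq_work */
};

static struct cpu_state cpus[NR_CPUS];
static char main_log[NR_CPUS * BUF_SZ]; /* stands in for the printk buffer */
static char console[NR_CPUS * BUF_SZ]; /* what actually got printed */

/* NMI handler on a (possibly locked-up) CPU: record the backtrace in
 * the NMI-safe buffer and queue irq_work on *that* CPU. */
static void nmi_backtrace_sim(int cpu)
{
	snprintf(cpus[cpu].nmi_buf, BUF_SZ, "CPU%d backtrace\n", cpu);
	cpus[cpu].irq_work_pending = true; /* useless if cpu is locked up */
}

/* irq_work processing: a locked-up CPU never gets here, so the work it
 * queued for itself (and thus the printing) never happens. */
static void run_irq_work(int cpu)
{
	if (cpus[cpu].locked_up || !cpus[cpu].irq_work_pending)
		return;
	cpus[cpu].irq_work_pending = false;
	strcpy(console, main_log); /* "print" everything in the buffer */
}

/* The printk_safe_flush() role: a healthy CPU copies every CPU's
 * NMI-safe buffer into the main buffer, then queues irq_work on
 * *itself*. That local trigger is what gets the messages out. */
static void safe_flush(int this_cpu)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		strcat(main_log, cpus[cpu].nmi_buf);
		cpus[cpu].nmi_buf[0] = '\0';
	}
	cpus[this_cpu].irq_work_pending = true; /* the critical local step */
	run_irq_work(this_cpu);
}
```

If CPUs 1 and 2 are marked locked_up and take nmi_backtrace_sim(),
run_irq_work() on them is a no-op and console stays empty; only
safe_flush() on the healthy CPU 0 gets the backtraces printed.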

John Ogness