linux-kernel - Re: [PATCH 0/4] watchdog: Better handling of concurrent lockups

Message-ID: <ZcIGKU8sxti38Kok@alley>
Date: Tue, 6 Feb 2024 11:12:57 +0100
From: Petr Mladek <pmladek@...e.com>
To: Douglas Anderson <dianders@...omium.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Li Zhe <lizhe.67@...edance.com>, Pingfan Liu <kernelfans@...il.com>,
	John Ogness <john.ogness@...utronix.de>,
	Lecopzer Chen <lecopzer.chen@...iatek.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] watchdog: Better handling of concurrent lockups

Hi,

On Wed 2023-12-20 13:15:33, Douglas Anderson wrote:
> 
> When we get multiple lockups at roughly the same time, the output in
> the kernel logs can be very confusing since the reports about the
> lockups end up interleaved in the logs. There is some code in the
> kernel to try to handle this but it wasn't that complete.
> 
> Li Zhe recently made this a bit better for softlockups (specifically
> for the case where `kernel.softlockup_all_cpu_backtrace` is not set)
> in commit 9d02330abd3e ("softlockup: serialized softlockup's log"),
> but that only handled softlockup reports. Hardlockup reports still had
> similar issues.
> 
> This series also has a small fix to avoid dumping all stacks a second
> time in the case of a panic. This is a bit unrelated to the
> interleaving fixes but it does also improve the clarity of lockup
> reports.

Just for the record: this patchset has finally reached the top of my
queue (after Christmas and sick leave), and it looks good from my POV.

I was slightly worried about using printk_cpu_sync_get_irqsave() in
more places. It has to be used with care to avoid deadlocks.

But the patchset looks good. It takes the lock only around code that
runs on the same CPU, and it always releases the lock before
triggering a backtrace on another CPU.

Idea:

I have just had an idea for making printk_cpu_sync_get_irqsave()
less prone to deadlock on the panic() CPU: on the panic CPU, either
ignore the lock entirely or give up trying to take it after a
timeout.

AFAIK, the lock is currently used only to serialize related
printk() calls. The only risk is that some messages might be
interleaved when it is ignored.

I am not sure if this is a good idea, though. It might create
another risk if the lock is later used to serialize more things,
where a race could cause a real problem.

Best Regards,
Petr