Date: Tue, 6 Feb 2024 11:31:24 -0800
From: Doug Anderson <dianders@...omium.org>
To: John Ogness <john.ogness@...utronix.de>
Cc: Petr Mladek <pmladek@...e.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Li Zhe <lizhe.67@...edance.com>, Pingfan Liu <kernelfans@...il.com>, 
	Lecopzer Chen <lecopzer.chen@...iatek.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] watchdog: Better handling of concurrent lockups

Hi,

On Tue, Feb 6, 2024 at 2:46 AM John Ogness <john.ogness@...utronix.de> wrote:
>
> On 2024-02-06, Petr Mladek <pmladek@...e.com> wrote:
> > I have just got an idea how to make printk_cpu_sync_get_irqsave()
> > less error prone for deadlock on the panic() CPU. The idea is
> > to ignore the lock or give up locking after a timeout on
> > the panic CPU.
>
> This idea is out of scope for this series. But it is something we should
> think about. The issue has always been a possible problem in panic().

One thing to stay aware of is how this interacts with the 10 second
timeout in nmi_trigger_cpumask_backtrace(). Some lockup reports can hit
that timeout twice, since we first trace the locked CPU and then the
rest. Ideally we would rarely hit that timeout, but on arm64 without
pseudo-NMI turned on it's actually pretty easy to hit it when you've
got a hard-locked CPU. That 10 second timeout should probably be
shortened...

-Doug
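
To make the "hit twice" arithmetic above concrete, here is a minimal
userspace sketch of a bounded 1 ms polling wait, called once for the
locked CPU and once for the rest. It is not kernel code: struct sim_cpu,
wait_for_backtraces() and lockup_report_wait_ms() are hypothetical names
made up for illustration, time is simulated rather than slept, and the
model assumes at most 64 CPUs and ignores the reporting CPU itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 10 second timeout discussed above, in 1 ms polling steps. */
#define BACKTRACE_TIMEOUT_MS (10 * 1000)

/* Hypothetical model of one CPU's reaction to a backtrace request. */
struct sim_cpu {
	bool responds;         /* false: hard-locked, never handles the request */
	unsigned response_ms;  /* delay before it clears its pending bit */
};

/*
 * Poll until every CPU selected in 'mask' has responded or the timeout
 * expires. Returns the simulated number of milliseconds spent waiting.
 */
unsigned wait_for_backtraces(const struct sim_cpu *cpus, unsigned ncpus,
			     uint64_t mask)
{
	unsigned elapsed;

	for (elapsed = 0; elapsed < BACKTRACE_TIMEOUT_MS; elapsed++) {
		uint64_t pending = 0;

		for (unsigned i = 0; i < ncpus; i++) {
			if (!(mask & (1ULL << i)))
				continue;
			if (!cpus[i].responds || cpus[i].response_ms > elapsed)
				pending |= 1ULL << i;
		}
		if (!pending)
			break;
		/* A real implementation would delay for 1 ms here. */
	}
	return elapsed;
}

/*
 * Model a lockup report: trace the locked CPU first, then all others.
 * Each pass can independently run into the full timeout.
 */
unsigned lockup_report_wait_ms(const struct sim_cpu *cpus, unsigned ncpus,
			       unsigned locked_cpu)
{
	uint64_t all = (ncpus >= 64) ? ~0ULL : ((1ULL << ncpus) - 1);
	uint64_t locked = 1ULL << locked_cpu;
	unsigned total;

	total = wait_for_backtraces(cpus, ncpus, locked);
	total += wait_for_backtraces(cpus, ncpus, all & ~locked);
	return total;
}
```

In this model, a report for a CPU that cannot take the backtrace
interrupt always costs one full timeout, and a second unresponsive CPU
among "the rest" costs another, so the worst case is two timeouts back
to back. Shortening BACKTRACE_TIMEOUT_MS shrinks both passes.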
