Message-ID: <76a6cf81-fdf6-4583-aa4d-1e6a899bbb58@linux.alibaba.com>
Date: Thu, 25 Jan 2024 17:19:18 +0800
From: Bitao Hu <yaoma@...ux.alibaba.com>
To: Doug Anderson <dianders@...omium.org>
Cc: akpm@...ux-foundation.org, pmladek@...e.com, tglx@...utronix.de,
 maz@...nel.org, liusong@...ux.alibaba.com, linux-kernel@...r.kernel.org,
 yaoma@...ux.alibaba.com
Subject: Re: [PATCH 2/3] watchdog/softlockup: report the most time-consuming
 hardirq



On 2024/1/25 08:19, Doug Anderson wrote:
> 
> I guess maybe (?) the case it wouldn't catch so well would be if you
> had a loop that looked like:
> 
> * CPU from "storming" device takes ~10 ms and finishes.
> * After the "storming" device finishes, we somehow manage to service a
> whole pile of normal interrupts from non-storming devices.
> * After the non-storming interrupts finished, then we went back to
> servicing the slow storming interrupt.
> 
> I haven't dug into the bowels of the Linux IRQ handling well enough to
> know if that would be possible. However, even if it was possible it
> doesn't feel likely to happen. If those "normal" interrupts aren't
> storming then it seems unlikely we'd see more than one of each of them
> between the "storming" interrupts. Thus it feels like the "storming"
> interrupt would still have the highest (or tied for the highest) count
> on the locked up CPU. Maybe you could print the top 4 or 5 (instead of
> the top 3) and you'd be pretty certain to catch even this case?
> 
Hi, I now understand your suggestion. Using interrupt time as the
criterion for an interrupt storm suits scenarios where we don't know
whether an interrupt storm has happened. However, in the context of
this softlockup, we have in fact already determined from the "hardirq
time" that an interrupt storm might have occurred, so there's no need
for a redundant determination. Thus, I agree that using IRQ counts
seems good enough to find "storming" interrupts.
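
To make the count-based idea concrete, here is a minimal user-space
sketch, not code from the patch. It assumes we hold two snapshots of
per-IRQ counts for the locked-up CPU, one taken at the start of the
sampling period and one taken when the softlockup is reported, and it
picks the IRQs whose counts grew the most. The names irq_delta and
top_irqs_by_count are hypothetical, and top_n would be 3 or, as
suggested above, 4 or 5.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: an IRQ number and how often it fired in the period. */
struct irq_delta {
	int irq;
	unsigned int count;
};

/*
 * Fill top[] (capacity top_n) with the IRQs whose counts increased the
 * most between the two snapshots, sorted by descending increase.
 * IRQs that did not fire are skipped. Returns the number of entries filled.
 */
static size_t top_irqs_by_count(const unsigned int *before,
				const unsigned int *after,
				size_t nr_irqs,
				struct irq_delta *top, size_t top_n)
{
	size_t filled = 0;

	assert(top_n > 0);

	for (size_t irq = 0; irq < nr_irqs; irq++) {
		/* Unsigned subtraction stays correct if the counter wrapped once. */
		unsigned int delta = after[irq] - before[irq];
		size_t pos, last;

		if (delta == 0)
			continue;

		/* Find the slot that keeps top[] in descending order. */
		pos = filled;
		while (pos > 0 && top[pos - 1].count < delta)
			pos--;
		if (pos >= top_n)
			continue;	/* smaller than everything already kept */

		/* Shift lower entries down, dropping the last one if full. */
		last = filled < top_n ? filled : top_n - 1;
		for (size_t j = last; j > pos; j--)
			top[j] = top[j - 1];

		top[pos].irq = (int)irq;
		top[pos].count = delta;
		if (filled < top_n)
			filled++;
	}
	return filled;
}
```

In the loop described above, each non-storming IRQ would contribute
roughly one count per iteration, so the storming IRQ should still end
up at or near the head of top[].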

> In any case, I'll wait before doing a more thorough review for now and
> hope the above sounds right to you.
> 
> -Doug
