Message-Id: <20240123121223.22318-1-yaoma@linux.alibaba.com>
Date: Tue, 23 Jan 2024 20:12:20 +0800
From: Bitao Hu <yaoma@...ux.alibaba.com>
To: dianders@...omium.org,
akpm@...ux-foundation.org,
pmladek@...e.com,
tglx@...utronix.de,
maz@...nel.org,
liusong@...ux.alibaba.com
Cc: linux-kernel@...r.kernel.org,
Bitao Hu <yaoma@...ux.alibaba.com>
Subject: [PATCH 0/3] *** Detect interrupt storm in softlockup ***
Hi guys,
I have previously encountered an issue where an NVMe interrupt
storm caused a softlockup, but the call tree did not provide useful
information. This is because the call tree is merely a snapshot and
does not fully reflect the CPU's state over the duration of the
softlockup_thresh period. Consequently, I think that reporting CPU
utilization (system, softirq, hardirq, idle) during a softlockup would
be beneficial for identifying issues related to interrupt storms, as
well as assisting in the analysis of other causes of softlockup.
Furthermore, reporting the most time-consuming hardirqs during a
softlockup could directly pinpoint which interrupt is responsible
for the issue.
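
To illustrate the kind of report described above, here is a minimal
userspace sketch, not the code from this series. It assumes two
snapshots of per-CPU time counters taken one detection window apart,
and derives the share of the window spent in each state plus the
interrupt whose accumulated time grew the most. All names
(cpu_sample, cpu_util_percent, busiest_irq) and the tick units are
hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU states tracked per snapshot (assumption, not kernel API). */
enum cpu_state { ST_SYSTEM, ST_SOFTIRQ, ST_HARDIRQ, ST_IDLE, ST_COUNT };

/* Cumulative time per state, in arbitrary ticks. */
struct cpu_sample {
	uint64_t ticks[ST_COUNT];
};

/*
 * Fill out[] with the percentage of the window spent in each state,
 * computed from the difference between two cumulative snapshots.
 * A zero-length window yields 0 for every state.
 */
void cpu_util_percent(const struct cpu_sample *prev,
		      const struct cpu_sample *cur,
		      uint64_t window_ticks,
		      unsigned int out[ST_COUNT])
{
	assert(prev && cur && out);

	for (int i = 0; i < ST_COUNT; i++) {
		uint64_t delta = cur->ticks[i] - prev->ticks[i];

		out[i] = window_ticks ?
			(unsigned int)(delta * 100 / window_ticks) : 0;
	}
}

/*
 * Return the index of the interrupt whose cumulative time grew the
 * most between two snapshots, or -1 if none of them grew.
 */
int busiest_irq(const uint64_t *prev, const uint64_t *cur, size_t n)
{
	int best = -1;
	uint64_t best_delta = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t delta = cur[i] - prev[i];

		if (delta > best_delta) {
			best_delta = delta;
			best = (int)i;
		}
	}
	return best;
}
```

With a window of 1000 ticks in which the hardirq counter advances by
900, cpu_util_percent() would report 90% hardirq, which is the
signature an interrupt storm is expected to leave, and busiest_irq()
would name the interrupt to look at first.
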

Bitao Hu (3):
  watchdog/softlockup: low-overhead detection of interrupt storm
  watchdog/softlockup: report the most time-consuming hardirq
  watchdog/softlockup: add parameter to control the reporting of
    time-consuming hardirq

 include/linux/irq.h     |   9 ++
 include/linux/irqdesc.h |   2 +
 kernel/irq/irqdesc.c    |   9 +-
 kernel/watchdog.c       | 289 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 308 insertions(+), 1 deletion(-)

--
2.37.1 (Apple Git-137.1)