lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20210715095031.41922-3-mark.rutland@arm.com>
Date:   Thu, 15 Jul 2021 10:50:31 +0100
From:   Mark Rutland <mark.rutland@....com>
To:     linux-kernel@...r.kernel.org, tglx@...utronix.de
Cc:     mark.rutland@....com, maz@...nel.org, paulmck@...nel.org,
        peterz@...radead.org
Subject: [PATCH v3 2/2] irq: detect long-running IRQ handlers

If a hard IRQ handler takes a long time to handle an IRQ, it may cause a
soft lockup or RCU stall, but as this will be detected once the handler
has returned it can be difficult to attribute the delay to the specific
IRQ handler.

It's possible to trace IRQ handlers to diagnose this, but that's not a
great fit for automated testing environments (e.g. fuzzers), where
something like the existing lockup/stall detectors works well.

This patch adds a new stall detector for IRQ handlers, which reports
when handlers took longer than a given timeout value (defaulting to 1
second). This won't detect hung IRQ handlers (which requires an NMI, and
should already be caught by hung task detection on systems with NMIs),
but helps on platforms without NMI or where a periodic watchdog is
undesireable.

Signed-off-by: Mark Rutland <mark.rutland@....com>
Acked-by: Paul E. McKenney <paulmck@...nel.org>
Cc: Marc Zyngier <maz@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>
---
 kernel/irq/internals.h | 35 ++++++++++++++++++++++++++++++++---
 lib/Kconfig.debug      | 15 +++++++++++++++
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 70a4694cc891..191b6a9d30e2 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -6,6 +6,7 @@
  * kernel/irq/. Do not even think about using any information outside
  * of this file for your non core code.
  */
+#include <linux/bug.h>
 #include <linux/irqdesc.h>
 #include <linux/kernel_stat.h>
 #include <linux/pm_runtime.h>
@@ -122,17 +123,45 @@ static inline irqreturn_t __handle_irqaction(unsigned int irq,
 	return res;
 }
 
+#ifdef CONFIG_DETECT_SLOW_IRQ_HANDLER
+static inline irqreturn_t __handle_check_irqaction(unsigned int irq,
+						   struct irqaction *action,
+						   void *dev_id)
+{
+	u64 timeout = CONFIG_IRQ_HANDLER_TIMEOUT_NS;
+	u64 start, end, duration;
+	int res;
+
+	start = local_clock();
+	res = __handle_irqaction(irq, action, dev_id);
+	end = local_clock();
+
+	duration = end - start;
+	WARN(duration > timeout, "IRQ %d handler %ps took %llu ns\n",
+	     irq, action->handler, duration);
+
+	return res;
+}
+#else
+static inline irqreturn_t __handle_check_irqaction(unsigned int irq,
+						   struct irqaction *action,
+						   void *dev_id)
+{
+	return __handle_irqaction(irq, action, dev_id);
+}
+#endif
+
 static inline irqreturn_t handle_irqaction(unsigned int irq,
 					   struct irqaction *action)
 {
-	return __handle_irqaction(irq, action, action->dev_id);
+	return __handle_check_irqaction(irq, action, action->dev_id);
 }
 
 static inline irqreturn_t handle_irqaction_percpu_devid(unsigned int irq,
 							struct irqaction *action)
 {
-	return __handle_irqaction(irq, action,
-				  raw_cpu_ptr(action->percpu_dev_id));
+	return __handle_check_irqaction(irq, action,
+					raw_cpu_ptr(action->percpu_dev_id));
 }
 
 /* Resending of interrupts :*/
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 831212722924..86003bc0572c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1155,6 +1155,21 @@ config WQ_WATCHDOG
 	  state.  This can be configured through kernel parameter
 	  "workqueue.watchdog_thresh" and its sysfs counterpart.
 
+config DETECT_SLOW_IRQ_HANDLER
+	bool "Detect long-running IRQ handlers"
+	help
+	  Say Y here to enable detection of long-running IRQ handlers. When a
+	  (hard) IRQ handler returns after a given timeout value (1s by
+	  default) a warning will be printed with the name of the handler.
+
+	  This can help to identify specific IRQ handlers which are
+	  contributing to stalls.
+
+config IRQ_HANDLER_TIMEOUT_NS
+	int "Timeout for long-running IRQ handlers (in nanoseconds)"
+	depends on DETECT_SLOW_IRQ_HANDLER
+	default 1000000000
+
 config TEST_LOCKUP
 	tristate "Test module to generate lockups"
 	depends on m
-- 
2.11.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ