Message-ID: <20250923104452.2407460-3-crajank@nvidia.com>
Date: Tue, 23 Sep 2025 13:44:52 +0300
From: Ciju Rajan K <crajank@...dia.com>
To: <hdegoede@...hat.com>, <ilpo.jarvinen@...ux.intel.com>,
	<tglx@...utronix.de>, <andriy.shevchenko@...ux.intel.com>,
	<linux-kernel@...r.kernel.org>
CC: <christophe.jaillet@...adoo.fr>, <platform-driver-x86@...r.kernel.org>,
	<vadimp@...dia.com>, Ciju Rajan K <crajank@...dia.com>
Subject: [PATCH platform-next v2 2/2] [PATCH platform-next 2/2] platform/mellanox: mlxreg-hotplug: Add support for handling interrupt storm

With faulty hardware, a broken device can flood the interrupt handler
with false events. For example, a fan or power supply with a damaged
presence pin will continuously generate plugged-in/plugged-out events.
As a result, the interrupt handler consumes a lot of CPU time and keeps
raising udev events to user space.

This patch adds a mechanism that detects a device flooding the
interrupt handler and masks the interrupt for that specific device,
isolating it from the interrupt handling flow. The criterion is: if a
specific interrupt is generated N times within T seconds (here N = 100,
set by MLXREG_HOTPLUG_WM_COUNTER, and T = 3 seconds, set by
MLXREG_HOTPLUG_WM_WINDOW_MS), the device is considered broken and its
interrupt is masked permanently. The user is notified through an error
log message instructing them to replace the broken device.

Reviewed-by: Vadim Pasternak <vadimp@...dia.com>
Signed-off-by: Ciju Rajan K <crajank@...dia.com>
---
 drivers/platform/mellanox/mlxreg-hotplug.c | 32 ++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)
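
The detection criterion described in the commit message (N events within a T-second window) can be modelled outside the kernel. This is a minimal user-space sketch under stated assumptions, not the driver code: struct storm_state, storm_event() and ms_after() are hypothetical names, millisecond timestamps stand in for jiffies, and the thresholds simply mirror MLXREG_HOTPLUG_WM_COUNTER and MLXREG_HOTPLUG_WM_WINDOW_MS. The per-item fields the patch relies on (wmark_cntr, wmark_window, storming_bits) are not defined in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Thresholds mirroring MLXREG_HOTPLUG_WM_COUNTER and MLXREG_HOTPLUG_WM_WINDOW_MS. */
#define STORM_COUNTER   100u
#define STORM_WINDOW_MS 3000u

/* Hypothetical per-signal state, standing in for wmark_cntr/wmark_window. */
struct storm_state {
	unsigned int cntr;      /* events counted in the current window */
	uint32_t window_end_ms; /* deadline of the current window */
};

/* Wrap-safe "a is later than b", the same idea as the kernel's time_after(a, b). */
static bool ms_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * Hypothetical helper: account one event observed at now_ms.
 * Returns true when this is the STORM_COUNTER-th event inside one window,
 * i.e. the source should be masked; the caller is expected to stop
 * calling it for that source afterwards.
 */
static bool storm_event(struct storm_state *s, uint32_t now_ms)
{
	/* The counter never exceeds N - 1 between calls. */
	assert(s->cntr < STORM_COUNTER);

	if (s->cntr >= STORM_COUNTER - 1) {
		if (ms_after(s->window_end_ms, now_ms))
			return true;    /* N events within T: storming */
		s->cntr = 0;            /* window expired: start a new count */
	}

	/* (Re-)arm the window on the first event of a count. */
	if (s->cntr == 0)
		s->window_end_ms = now_ms + STORM_WINDOW_MS;

	s->cntr++;
	return false;
}
```

Feeding 100 events 1 ms apart makes the 100th call return true, while 100 events spaced 100 ms apart only restart the count. The order of the two checks matters: testing the threshold first means that, when an expired window forces a reset, the same event re-arms the deadline. Arming the window only when the counter is zero on entry would leave the expired deadline in place after such a reset, and later bursts would effectively never be detected.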

diff --git a/drivers/platform/mellanox/mlxreg-hotplug.c b/drivers/platform/mellanox/mlxreg-hotplug.c
index d246772aafd6..ae0115ea1fd1 100644
--- a/drivers/platform/mellanox/mlxreg-hotplug.c
+++ b/drivers/platform/mellanox/mlxreg-hotplug.c
@@ -11,6 +11,7 @@
 #include <linux/hwmon-sysfs.h>
 #include <linux/i2c.h>
 #include <linux/interrupt.h>
+#include <linux/jiffies.h>
 #include <linux/module.h>
 #include <linux/platform_data/mlxreg.h>
 #include <linux/platform_device.h>
@@ -30,6 +31,11 @@
 #define MLXREG_HOTPLUG_ATTRS_MAX	128
 #define MLXREG_HOTPLUG_NOT_ASSERT	3
 
+/* Interrupt storm definitions */
+#define MLXREG_HOTPLUG_WM_COUNTER	100
+/* Time window in milliseconds */
+#define MLXREG_HOTPLUG_WM_WINDOW_MS	3000
+
 /**
  * struct mlxreg_hotplug_priv_data - platform private data:
  * @irq: platform device interrupt number;
@@ -366,11 +372,33 @@ mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
 	for_each_set_bit(bit, &asserted, 8) {
 		int pos;
 
+		/* Skip already marked storming bit. */
+		if (item->storming_bits & BIT(bit))
+			continue;
+
 		pos = mlxreg_hotplug_item_label_index_get(item->mask, bit);
 		if (pos < 0)
 			goto out;
 
 		data = item->data + pos;
+
+		/* Interrupt storm handling logic. */
+		if (data->wmark_cntr == 0)
+			data->wmark_window = jiffies +
+				msecs_to_jiffies(MLXREG_HOTPLUG_WM_WINDOW_MS);
+
+		if (data->wmark_cntr >= MLXREG_HOTPLUG_WM_COUNTER - 1) {
+			if (time_after(data->wmark_window, jiffies)) {
+				dev_err(priv->dev,
+					"Storming bit %d (label: %s) - interrupt masked permanently. Replace broken HW.",
+					bit, data->label);
+				/* Mark bit as storming. */
+				item->storming_bits |= BIT(bit);
+				continue;
+			}
+			data->wmark_cntr = 0;
+		}
+		data->wmark_cntr++;
 		if (regval & BIT(bit)) {
 			if (item->inversed)
 				mlxreg_hotplug_device_destroy(priv, data, item->kind);
@@ -390,9 +418,9 @@ mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
 	if (ret)
 		goto out;
 
-	/* Unmask event. */
+	/* Unmask event, exclude storming bits. */
 	ret = regmap_write(priv->regmap, item->reg + MLXREG_HOTPLUG_MASK_OFF,
-			   item->mask);
+			   item->mask & ~item->storming_bits);
 
  out:
 	if (ret)
-- 
2.47.2
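
The last hunk writes item->mask & ~item->storming_bits back to the mask register, so a flagged bit stays cleared while all other events are re-enabled. Judging from the "Unmask event" comment, a set bit in that register enables the corresponding interrupt; that reading is an assumption here. The sketch below isolates the bit arithmetic together with the skip applied inside the for_each_set_bit() loop. unmask_value() and handled_bits() are hypothetical helpers, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: value written back to the mask register.
 * Every bit the item normally enables, minus the bits flagged as storming,
 * so a storming source stays masked after the work handler finishes.
 */
static uint32_t unmask_value(uint32_t mask, uint32_t storming_bits)
{
	return mask & ~storming_bits;
}

/*
 * Hypothetical helper: how many asserted bits (out of nbits) would still be
 * processed, skipping storming bits the way the for_each_set_bit() loop does.
 */
static unsigned int handled_bits(uint32_t asserted, uint32_t storming_bits,
				 unsigned int nbits)
{
	unsigned int n = 0;

	assert(nbits <= 32);    /* keep 1u << bit well defined */
	for (unsigned int bit = 0; bit < nbits; bit++) {
		if (!(asserted & (1u << bit)))
			continue;
		if (storming_bits & (1u << bit))
			continue;       /* already marked storming: ignore */
		n++;
	}
	return n;
}
```

For example, with a mask of 0x0f and bit 2 flagged as storming, the register is written with 0x0b, so only bit 2 remains masked and the other three asserted bits are still handled.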

