lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <34d028ac-f907-1505-a2fc-f455a10cfa5e@linux.intel.com>
Date: Tue, 16 Sep 2025 13:00:43 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Ciju Rajan K <crajank@...dia.com>, Hans de Goede <hdegoede@...hat.com>, 
    Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>, 
    Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
cc: christophe.jaillet@...adoo.fr, platform-driver-x86@...r.kernel.org, 
    vadimp@...dia.com
Subject: Re: [PATCH platform-next 2/2] platform/mellanox: mlxreg-hotplug:
 Add support for handling interrupt storm

+ Thomas, lkml

On Tue, 16 Sep 2025, Ciju Rajan K wrote:

> In case of broken hardware, it is possible that broken device will
> flood interrupt handler with false events. For example, if fan or
> power supply has damaged presence pin, it will cause permanent
> generation of plugged in / plugged out events. As a result, interrupt
> handler will consume a lot of CPU resources and will keep raising
> "UDEV" events to the user space.
> 
> This patch provides a mechanism to detect device causing interrupt
> flooding and mask interrupt for this specific device, to isolate
> from interrupt handling flow. Use the following criteria: if the
> specific interrupt was generated 'N' times during 'T' seconds,
> such device is to be considered as broken and will be closed for
> getting interrupts. User will be notified through the log error
> and will be instructed to replace broken device.
> 
> Reviewed-by: Vadim Pasternak <vadimp@...dia.com>
> Signed-off-by: Ciju Rajan K <crajank@...dia.com>

I've a general question on this approach, probably more directed towards 
Hans, Thomas, or others who might have some insight.

Are drivers expected to build their own workarounds for interrupt storms 
due to broken HW such as this? It sounds something that should be at least 
in part handled by something generic, while the lower-most level masking 
and detection might still need to be done by the driver to handle the HW 
specific aspects, there seems to be a generic aspect in all this.

Is there something generic for this already? If not, should there be 
instead of adding this in full into an end-of-the-food-chain driver?


> ---
>  drivers/platform/mellanox/mlxreg-hotplug.c | 35 ++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/platform/mellanox/mlxreg-hotplug.c b/drivers/platform/mellanox/mlxreg-hotplug.c
> index d246772aafd6..97b84c584d5e 100644
> --- a/drivers/platform/mellanox/mlxreg-hotplug.c
> +++ b/drivers/platform/mellanox/mlxreg-hotplug.c
> @@ -11,6 +11,7 @@
>  #include <linux/hwmon-sysfs.h>
>  #include <linux/i2c.h>
>  #include <linux/interrupt.h>
> +#include <linux/jiffies.h>
>  #include <linux/module.h>
>  #include <linux/platform_data/mlxreg.h>
>  #include <linux/platform_device.h>
> @@ -30,6 +31,11 @@
>  #define MLXREG_HOTPLUG_ATTRS_MAX	128
>  #define MLXREG_HOTPLUG_NOT_ASSERT	3
>  
> +/* Interrupt storm definitions */
> +#define MLXREG_HOTPLUG_WM_COUNTER 100
> +/* Time window in milliseconds */
> +#define MLXREG_HOTPLUG_WM_WINDOW 3000

Define's name should indicate the unit so please postfix it with _MS.

> +
>  /**
>   * struct mlxreg_hotplug_priv_data - platform private data:
>   * @irq: platform device interrupt number;
> @@ -344,7 +350,7 @@ mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
>  			   struct mlxreg_core_item *item)
>  {
>  	struct mlxreg_core_data *data;
> -	unsigned long asserted;
> +	unsigned long asserted, wmark_low_ts_window;
>  	u32 regval, bit;
>  	int ret;
>  
> @@ -366,11 +372,34 @@ mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
>  	for_each_set_bit(bit, &asserted, 8) {
>  		int pos;
>  
> +		/* Skip already marked storming bit. */
> +		if (item->storming_bits & BIT(bit))
> +			continue;
> +
>  		pos = mlxreg_hotplug_item_label_index_get(item->mask, bit);
>  		if (pos < 0)
>  			goto out;
>  		data = item->data + pos;
> +
> +		/* Interrupt storm handling logic. */
> +		if (data->wmark_low_cntr == 0)
> +			data->wmark_low_ts = jiffies;
> +
> +		if (data->wmark_low_cntr == MLXREG_HOTPLUG_WM_COUNTER - 1) {
> +			data->wmark_high_ts = jiffies;

Why does this timestamp have to be saved?

> +			wmark_low_ts_window = data->wmark_low_ts +
> +					      msecs_to_jiffies(MLXREG_HOTPLUG_WM_WINDOW);

Why not just calculate the ending of the window right at the beginning?
I'd call the member e.g. ->wmark_window (or ->wmark_window_end).

> +			if (time_after(data->wmark_high_ts, wmark_low_ts_window)) {
> +				dev_err(priv->dev, "Storming bit %d (label: %s) - interrupt masked permanently. Replace broken HW.",
> +					bit, data->label);
> +				/* Mark bit as storming. */
> +				item->storming_bits |= BIT(bit);

Why not using continue here for consistency with the other skip above?

If you add the continue, you can remove the else and deindent the zero 
assignment by one level:

> +			} else {
> +				data->wmark_low_cntr = 0;
> +			}
> +		}
> +		data->wmark_low_cntr++;
>  		if (regval & BIT(bit)) {
>  			if (item->inversed)
>  				mlxreg_hotplug_device_destroy(priv, data, item->kind);
> @@ -390,9 +419,9 @@ mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
>  	if (ret)
>  		goto out;
>  
> -	/* Unmask event. */
> +	/* Unmask event, exclude storming bits. */
>  	ret = regmap_write(priv->regmap, item->reg + MLXREG_HOTPLUG_MASK_OFF,
> -			   item->mask);
> +			   item->mask & ~item->storming_bits);
>  
>   out:
>  	if (ret)
> 


-- 
 i.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ