linux-kernel - Re: [PATCH] e1000e: Link flap workaround option for false IRP events

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <36ae9886-8696-4f8a-a1e4-b93a9bd47b2f@lunn.ch>
Date: Wed, 26 Feb 2025 23:52:01 +0100
From: Andrew Lunn <andrew@...n.ch>
To: Mark Pearson <mpearson-lenovo@...ebb.ca>
Cc: anthony.l.nguyen@...el.com, przemyslaw.kitszel@...el.com,
	andrew+netdev@...n.ch, davem@...emloft.net, edumazet@...gle.com,
	kuba@...nel.org, pabeni@...hat.com,
	intel-wired-lan@...ts.osuosl.org, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] e1000e: Link flap workaround option for false IRP events

On Wed, Feb 26, 2025 at 02:44:12PM -0500, Mark Pearson wrote:
> Issue is seen on some Lenovo desktop workstations where there
> is a false IRP event which triggers a link flap.
> Condition is rare and only seen on networks where link speed
> may differ along the path between nodes (e.g 10M/100M)
> 
> Intel are not able to determine root cause but provided a
> workaround that does fix the issue. Tested extensively at Lenovo.
> 
> Adding a module option to enable this workaround for users
> who are impacted by this issue.

Why is a module option needed? Does the workaround itself introduce
issues? Please describe those issues?

In general, module options are not liked. So please include in the
commit message why a module option is the only option.
 
> Signed-off-by: Mark Pearson <mpearson-lenovo@...ebb.ca>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 286155efcedf..06774fb4b2dd 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -37,6 +37,10 @@ static int debug = -1;
>  module_param(debug, int, 0);
>  MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
>  
> +static int false_irp_workaround;
> +module_param(false_irp_workaround, int, 0);
> +MODULE_PARM_DESC(false_irp_workaround, "Enable workaround for rare false IRP event causing link flap");
> +
>  static const struct e1000_info *e1000_info_tbl[] = {
>  	[board_82571]		= &e1000_82571_info,
>  	[board_82572]		= &e1000_82572_info,
> @@ -1757,6 +1761,21 @@ static irqreturn_t e1000_intr_msi(int __always_unused irq, void *data)
>  	/* read ICR disables interrupts using IAM */
>  	if (icr & E1000_ICR_LSC) {
>  		hw->mac.get_link_status = true;
> +
> +		/*
> +		 * False IRP workaround
> +		 * Issue seen on Lenovo P5 and P7 workstations where if there
> +		 * are different link speeds in the network a false IRP event
> +		 * is received, leading to a link flap.
> +		 * Intel unable to determine root cause. This read prevents
> +		 * the issue occurring
> +		 */
> +		if (false_irp_workaround) {
> +			u16 phy_data;
> +
> +			e1e_rphy(hw, PHY_REG(772, 26), &phy_data);

Please add some #define for these magic numbers, so we have some idea
what PHY register you are actually reading. That in itself might help
explain how the workaround actually works.

    Andrew

---
pw-bot: cr