netdev - Re: [Intel-wired-lan] [PATCH] e1000e: fix link fluctuations problem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4bd85100-0f3d-4e38-973c-e6938f304dde@molgen.mpg.de>
Date: Fri, 3 May 2024 07:33:58 +0200
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Ricky Wu <en-wei.wu@...onical.com>
Cc: Jesse Brandeburg <jesse.brandeburg@...el.com>, netdev@...r.kernel.org,
 rickywu0421@...il.com, linux-kernel@...r.kernel.org, edumazet@...gle.com,
 intel-wired-lan@...ts.osuosl.org, kuba@...nel.org,
 anthony.l.nguyen@...el.com, pabeni@...hat.com, davem@...emloft.net
Subject: Re: [Intel-wired-lan] [PATCH] e1000e: fix link fluctuations problem

[Fix address jesse.brandeburg@...el.co*m*]


Dear Ricky,


Thank you for your patch.


Am 02.05.24 um 11:12 schrieb Ricky Wu:
> As described in https://bugzilla.kernel.org/show_bug.cgi?id=218642,
> some e1000e NIC reports link up -> link down -> link up when hog-plugging

Do you mean ho*t*-plugging?

> the Ethernet cable.
> 
> The problem is because the unstable behavior of Link Status bit in
> PHY Status Register of some e1000e NIC. When we re-plug the cable,
> the e1000e_phy_has_link_generic() (called after the Link-Status-Changed
> interrupt) has read this bit with 1->0->1 (1=link up, 0=link down)
> and e1000e reports it to net device layer respectively.

Wow. I guess this was “fun” to debug. Could you please document, what 
NICs you saw this, and if it is documented in any datasheet/errata?

> This patch solves the problem by passing polling delays on
> e1000e_phy_has_link_generic() so that it will not get the unstable
> states of Link Status bit.

Does this have any downsides on systems with non-buggy hardware?

> Also, the sleep codes in e1000e_phy_has_link_generic() only take
> effect when error occurs reading the MII register. Moving these codes
> forward to the beginning of the loop so that the polling delays passed
> into this function can take effect on any situation.

Could you please split this hunk into a separate patch?

Should it Fixes: tag be added?

Are there any other  public bug reports and discussions you could reference?

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218642

> Signed-off-by: Ricky Wu <en-wei.wu@...onical.com>
> ---
>   drivers/net/ethernet/intel/e1000e/ich8lan.c |  5 ++++-
>   drivers/net/ethernet/intel/e1000e/phy.c     | 10 ++++++----
>   2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> index f9e94be36e97..c462aa6e6dee 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -1427,8 +1427,11 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
>   	/* First we want to see if the MII Status Register reports
>   	 * link.  If so, then we want to get the current speed/duplex
>   	 * of the PHY.
> +	 * Some PHYs have link fluctuations with the instability of
> +	 * Link Status bit (BMSR_LSTATUS) in MII Status Register.
> +	 * Increase the iteration times and delay solves the problem.

Increas*ing*?

>   	 */
> -	ret_val = e1000e_phy_has_link_generic(hw, 1, 0, &link);
> +	ret_val = e1000e_phy_has_link_generic(hw, COPPER_LINK_UP_LIMIT, 100000, &link);

Could you please document how 100000 was chosen?

>   	if (ret_val)
>   		goto out;
>   
> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
> index 93544f1cc2a5..ef056363d721 100644
> --- a/drivers/net/ethernet/intel/e1000e/phy.c
> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
> @@ -1776,7 +1776,13 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
>   	u16 i, phy_status;
>   
>   	*success = false;
> +
>   	for (i = 0; i < iterations; i++) {
> +		if (usec_interval >= 1000)
> +			msleep(usec_interval / 1000);
> +		else
> +			udelay(usec_interval);
> +
>   		/* Some PHYs require the MII_BMSR register to be read
>   		 * twice due to the link bit being sticky.  No harm doing
>   		 * it across the board.
> @@ -1799,10 +1805,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
>   			*success = true;
>   			break;
>   		}
> -		if (usec_interval >= 1000)
> -			msleep(usec_interval / 1000);
> -		else
> -			udelay(usec_interval);
>   	}
>   
>   	return ret_val;