linux-kernel - Re: [RFC] net: phy: read link status twice when phy_check_link

Message-ID: <a0b26e4b-e288-cf44-049a-7d0b7f5696eb@gmail.com>
Date:   Mon, 29 Jul 2019 22:57:35 +0200
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     liuyonglong <liuyonglong@...wei.com>, andrew@...n.ch,
        davem@...emloft.net
Cc:     netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        linuxarm@...wei.com, salil.mehta@...wei.com,
        yisen.zhuang@...wei.com, shiju.jose@...wei.com
Subject: Re: [RFC] net: phy: read link status twice when
 phy_check_link_status()

On 29.07.2019 05:59, liuyonglong wrote:
> 
> 
> On 2019/7/27 2:14, Heiner Kallweit wrote:
>> On 26.07.2019 11:53, Yonglong Liu wrote:
>>> According to the Marvell and Realtek PHY datasheets, the copper
>>> link status should be read twice; otherwise a fake link-up status
>>> may be returned, causing an up->down->up sequence the first time
>>> the link comes up. This happens more often with Realtek PHYs.
>>>
>> This is not correct, there is no fake link up status.
>> Read the comment in genphy_update_link: only link-down events
>> are latched. This means that if the first read returns link up,
>> there is no need for a second read. And in polling mode we don't
>> do a second read because we also want to detect short link drops.
>>
>> It would be helpful if you could describe your actual problem
>> and whether you use polling or interrupt mode.
>>
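The latching behaviour described above can be modelled in a few lines. This is a user-space sketch, not kernel code: struct ex_phy and the ex_* helpers are hypothetical names invented for illustration, and the model assumes only what the reply states, namely that link-down events are latched until the status bit is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a latched-low link status bit. */
struct ex_phy {
	bool live_link;    /* current physical link state */
	bool latched_down; /* a link drop happened since the last read */
};

/* Change the physical link state; a drop is remembered until read. */
static inline void ex_set_link(struct ex_phy *p, bool up)
{
	assert(p != NULL);
	if (!up)
		p->latched_down = true;
	p->live_link = up;
}

/* Read the status bit: reports "down" if a drop was latched, then
 * clears the latch so that the next read reflects the live state.
 */
static inline bool ex_read_link_bit(struct ex_phy *p)
{
	bool bit;

	assert(p != NULL);
	bit = p->live_link && !p->latched_down;
	p->latched_down = false;
	return bit;
}
```

With the link up and no drop since the last read, the first ex_read_link_bit() call already returns true, so a second read adds nothing. After a brief drop, the first read returns false and the second returns true, which is why reading twice in polling mode would hide short link drops.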
> 
> [   44.498633] hns3 0000:bd:00.1 eth5: net open
> [   44.504273] hns3 0000:bd:00.1: reg=0x1, data=0x79ad -> called from phy_start_aneg
> [   44.532348] hns3 0000:bd:00.1: reg=0x1, data=0x798d -> called from phy_state_machine,update link.

This should not happen. The PHY indicates link up without auto-negotiation
having finished.
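The register values in the log can be decoded by hand. The following is a minimal user-space sketch: the two masks carry the same values as BMSR_LSTATUS and BMSR_ANEGCOMPLETE in the kernel's include/uapi/linux/mii.h, but the EX_* and ex_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Masks for MII register 1 (BMSR), redefined under local names so the
 * sketch builds outside the kernel tree.
 */
#define EX_BMSR_LSTATUS      0x0004u /* bit 2: link status */
#define EX_BMSR_ANEGCOMPLETE 0x0020u /* bit 5: auto-negotiation complete */

static inline bool ex_link_up(uint16_t bmsr)
{
	return (bmsr & EX_BMSR_LSTATUS) != 0;
}

static inline bool ex_aneg_done(uint16_t bmsr)
{
	return (bmsr & EX_BMSR_ANEGCOMPLETE) != 0;
}

/* True for the unexpected state: link up while aneg is still pending. */
static inline bool ex_link_without_aneg(uint16_t bmsr)
{
	return ex_link_up(bmsr) && !ex_aneg_done(bmsr);
}

/* Print one decoded line per BMSR sample. */
static inline void ex_dump(const uint16_t *samples, size_t n)
{
	assert(samples != NULL);
	for (size_t i = 0; i < n; i++)
		printf("0x%04x: link=%d aneg_complete=%d\n",
		       (unsigned int)samples[i],
		       ex_link_up(samples[i]), ex_aneg_done(samples[i]));
}
```

Applied to the values in the log, 0x79ad decodes to link up with auto-negotiation complete, 0x798d to link up with auto-negotiation not complete, 0x7989 to link down with auto-negotiation not complete, and 0x79a9 to auto-negotiation complete with the link bit still clear.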

> 
> According to the datasheet:
> reg 1.5 = 0 here, meaning copper auto-negotiation is not complete
> reg 1.2 = 1 here, meaning the link is up
> 
> So when we read link up, auto-negotiation has not completed yet,
> and the speed is therefore invalid.
> 
> I don't know why this happens; maybe this state is left over from the BIOS?
> Or should we do something else during PHY initialization to fix it?
> I am also confused about why reading twice fixes it.
> 
I suppose that basically any delay would do.

> [   44.554063] hns3 0000:bd:00.1: invalid speed (-1)
> [   44.560412] hns3 0000:bd:00.1 eth5: failed to adjust link.
> [   45.194870] hns3 0000:bd:00.1 eth5: link up
> [   45.574095] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
> [   46.150051] hns3 0000:bd:00.1 eth5: link down
> [   46.598074] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989
> [   47.622075] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79a9
> [   48.646077] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
> [   48.934050] hns3 0000:bd:00.1 eth5: link up
> [   49.702140] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad
> 
>>> Adding a fake status read solves this problem.
>>>
>>> I also see that the fake read in polling mode was deleted from
>>> genphy_update_link(), so I don't know whether my solution is
>>> correct.
>>>

Can you test whether the following fixes the issue for you?
It would also be interesting to know which exact PHY models you
tested, and whether you built the respective PHY drivers or rely
on the genphy driver. Best use the second patch to get the
needed info. It may make sense anyway to add the call to
phy_attached_info() to the hns3 driver.


diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 6b5cb87f3..fbecfe210 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1807,7 +1807,8 @@ int genphy_read_status(struct phy_device *phydev)
 
 	linkmode_zero(phydev->lp_advertising);
 
-	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
+	if (phydev->autoneg == AUTONEG_ENABLE &&
+	    (phydev->autoneg_complete || phydev->link)) {
 		if (phydev->is_gigabit_capable) {
 			lpagb = phy_read(phydev, MII_STAT1000);
 			if (lpagb < 0)
-- 
2.22.0
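
To see which case the first patch changes, the old and new conditions can be compared side by side. This is a sketch under assumptions: struct ex_state and both helper functions are hypothetical stand-ins for the phydev fields used in the diff, and autoneg_enabled stands for phydev->autoneg == AUTONEG_ENABLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the phydev fields in the diff. */
struct ex_state {
	bool autoneg_enabled;  /* phydev->autoneg == AUTONEG_ENABLE */
	bool autoneg_complete; /* phydev->autoneg_complete */
	bool link;             /* phydev->link */
};

/* Condition before the patch. */
static inline bool ex_old_cond(const struct ex_state *s)
{
	assert(s != NULL);
	return s->autoneg_enabled && s->autoneg_complete;
}

/* Condition after the patch: a reported link is also sufficient. */
static inline bool ex_new_cond(const struct ex_state *s)
{
	assert(s != NULL);
	return s->autoneg_enabled && (s->autoneg_complete || s->link);
}
```

The two conditions differ only when auto-negotiation is enabled, the link is reported up, and autoneg_complete is not set, which is the 0x798d state from the log.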


diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
index abb1b4385..dc4dfd460 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
@@ -231,6 +231,8 @@ int hclge_mac_connect_phy(struct hnae3_handle *handle)
 	linkmode_clear_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
 			   phydev->advertising);
 
+	phy_attached_info(phydev);
+
 	return 0;
 }
 
-- 
2.22.0




>>> Or would providing a phydev->drv->read_status function for the PHY I
>>> use be more acceptable?
>>>
>>> Signed-off-by: Yonglong Liu <liuyonglong@...wei.com>
>>> ---
>>>  drivers/net/phy/phy.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>>> index ef7aa73..0c03edc 100644
>>> --- a/drivers/net/phy/phy.c
>>> +++ b/drivers/net/phy/phy.c
>>> @@ -1,4 +1,7 @@
>>>  // SPDX-License-Identifier: GPL-2.0+
>>> +	err = phy_read_status(phydev);
>>> +	if (err)
>>> +		return err;
>>
>> This seems to be completely wrong at that place.
>>
> 
> Sorry, this can be ignored.
> 
>>>  /* Framework for configuring and reading PHY devices
>>>   * Based on code in sungem_phy.c and gianfar_phy.c
>>>   *
>>> @@ -525,6 +528,11 @@ static int phy_check_link_status(struct phy_device *phydev)
>>>  
>>>  	WARN_ON(!mutex_is_locked(&phydev->lock));
>>>  
>>> +	/* Do a fake read */
>>> +	err = phy_read(phydev, MII_BMSR);
>>> +	if (err < 0)
>>> +		return err;
>>> +
>>>  	err = phy_read_status(phydev);
>>>  	if (err)
>>>  		return err;
>>>
>>
>>
>> .
>>
> 
>