netdev - RE: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume back

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <VI1PR04MB68006AF55FA866B1CEBD0D04E6759@VI1PR04MB6800.eurprd04.prod.outlook.com>
Date:   Wed, 7 Apr 2021 01:43:16 +0000
From:   Joakim Zhang <qiangqing.zhang@....com>
To:     Heiner Kallweit <hkallweit1@...il.com>,
        "christian.melki@...ata.com" <christian.melki@...ata.com>,
        "andrew@...n.ch" <andrew@...n.ch>,
        "linux@...linux.org.uk" <linux@...linux.org.uk>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        Russell King - ARM Linux <linux@...linux.org.uk>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        dl-linux-imx <linux-imx@....com>,
        Florian Fainelli <f.fainelli@...il.com>
Subject: RE: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume
 back


Hi Heiner,

> -----Original Message-----
> From: Heiner Kallweit <hkallweit1@...il.com>
> Sent: 2021年4月7日 2:22
> To: Joakim Zhang <qiangqing.zhang@....com>; christian.melki@...ata.com;
> andrew@...n.ch; linux@...linux.org.uk; davem@...emloft.net;
> kuba@...nel.org; Russell King - ARM Linux <linux@...linux.org.uk>
> Cc: netdev@...r.kernel.org; linux-kernel@...r.kernel.org; dl-linux-imx
> <linux-imx@....com>; Florian Fainelli <f.fainelli@...il.com>
> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume
> back
> 
> On 06.04.2021 13:42, Heiner Kallweit wrote:
> > On 06.04.2021 12:07, Joakim Zhang wrote:
> >>
> >>> -----Original Message-----
> >>> From: Heiner Kallweit <hkallweit1@...il.com>
> >>> Sent: 2021年4月6日 14:29
> >>> To: Joakim Zhang <qiangqing.zhang@....com>;
> >>> christian.melki@...ata.com; andrew@...n.ch; linux@...linux.org.uk;
> >>> davem@...emloft.net; kuba@...nel.org
> >>> Cc: netdev@...r.kernel.org; linux-kernel@...r.kernel.org;
> >>> dl-linux-imx <linux-imx@....com>
> >>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO
> >>> bus resume back
> >>>
> >>> On 06.04.2021 04:07, Joakim Zhang wrote:
> >>>>
> >>>> Hi Heiner,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Heiner Kallweit <hkallweit1@...il.com>
> >>>>> Sent: 2021年4月5日 20:10
> >>>>> To: christian.melki@...ata.com; Joakim Zhang
> >>>>> <qiangqing.zhang@....com>; andrew@...n.ch; linux@...linux.org.uk;
> >>>>> davem@...emloft.net; kuba@...nel.org
> >>>>> Cc: netdev@...r.kernel.org; linux-kernel@...r.kernel.org;
> >>>>> dl-linux-imx <linux-imx@....com>
> >>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO
> >>>>> bus resume back
> >>>>>
> >>>>> On 05.04.2021 10:43, Christian Melki wrote:
> >>>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote:
> >>>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote:
> >>>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote:
> >>>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram
> may
> >>>>>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus
> >>>>>>>>> resume, it will soft reset PHY if PHY driver implements soft_reset
> callback.
> >>>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset
> >>>>>>>>> callback to genphy_soft_reset for KSZ8081") adds soft_reset for
> KSZ8081.
> >>>>>>>>> After these two patches, I found i.MX6UL 14x14 EVK which
> >>>>>>>>> connected to KSZ8081RNB doesn't work any more when system
> >>> resume
> >>>>>>>>> back, MAC
> >>>>> driver is fec_main.c.
> >>>>>>>>>
> >>>>>>>>> It's obvious that initializing PHY hardware when MDIO bus
> >>>>>>>>> resume back would introduce some regression when PHY
> >>>>>>>>> implements soft_reset. When I
> >>>>>>>>
> >>>>>>>> Why is this obvious? Please elaborate on why a soft reset
> >>>>>>>> should break something.
> >>>>>>>>
> >>>>>>>>> am debugging, I found PHY works fine if MAC doesn't support
> >>>>>>>>> suspend/resume or phy_stop()/phy_start() doesn't been called
> >>>>>>>>> during suspend/resume. This let me realize, PHY state machine
> >>>>>>>>> phy_state_machine() could do something breaks the PHY.
> >>>>>>>>>
> >>>>>>>>> As we known, MAC resume first and then MDIO bus resume when
> >>>>>>>>> system resume back from suspend. When MAC resume, usually it
> >>>>>>>>> will invoke
> >>>>>>>>> phy_start() where to change PHY state to PHY_UP, then trigger
> >>>>>>>>> the
> >>>>>>>>> stat> machine to run now. In phy_state_machine(), it will
> >>>>>>>>> start/config auto-nego, then change PHY state to PHY_NOLINK,
> >>>>>>>>> what to next is periodically check PHY link status. When MDIO
> >>>>>>>>> bus resume, it will initialize PHY hardware, including
> >>>>>>>>> soft_reset, what would soft_reset affect seems various from
> different PHYs.
> >>>>>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then perform a
> >>>>>>>>> soft reset,
> >>>>> it will never complete auto-nego.
> >>>>>>>>
> >>>>>>>> Why? That would need to be checked in detail. Maybe chip errata
> >>>>>>>> documentation provides a hint.
> >>>>>>>>
> >>>>>>>
> >>>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN:
> >>>>>>>
> >>>>>>> If software reset (Register 0.15) is used to exit power-down
> >>>>>>> mode (Register 0.11 = 1), two software reset writes (Register
> >>>>>>> 0.15 = 1) are required. The first write clears power-down mode;
> >>>>>>> the second write resets the chip and re-latches the pin strapping pin
> values.
> >>>>>>>
> >>>>>>> Maybe this causes the issue you see and genphy_soft_reset()
> >>>>>>> isn't appropriate for this PHY. Please re-test with the KSZ8081
> >>>>>>> soft reset following the spec comment.
> >>>>>>>
> >>>>>>
> >>>>>> Interesting. Never expected that behavior.
> >>>>>> Thanks for catching it. Skimmed through the datasheets/erratas.
> >>>>>> This is what I found (micrel.c):
> >>>>>>
> >>>>>> 10/100:
> >>>>>> 8001 - Unaffected?
> >>>>>> 8021/8031 - Double reset after PDOWN.
> >>>>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if
> >>>>>> reset solves the issue since errata says no error after reset but
> >>>>>> is also claiming that only toggling PDOWN (may) or power will help.
> >>>>>> 8051 - Double reset after PDOWN.
> >>>>>> 8061 - Double reset after PDOWN.
> >>>>>> 8081 - Double reset after PDOWN.
> >>>>>> 8091 - Double reset after PDOWN.
> >>>>>>
> >>>>>> 10/100/1000:
> >>>>>> Nothing in gigabit afaics.
> >>>>>>
> >>>>>> Switches:
> >>>>>> 8862 - Not affected?
> >>>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists.
> >>>>>> 8864 - Not affected?
> >>>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists.
> >>>>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure on
> >>>>>> adjacent links. Do not use.
> >>>>>>
> >>>>>> This certainly explains a lot.
> >>>>>>
> >>>>>>>>>
> >>>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus resume
> >>>>>>>>> back, it should be reasonable after PHY hardware
> >>>>>>>>> re-initialized. Also give state machine a chance to start/config
> auto-nego again.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> If the MAC driver calls phy_stop() on suspend, then
> >>>>>>>> phydev->suspended is true and mdio_bus_phy_may_suspend()
> >>>>>>>> phydev->returns
> >>>>>>>> false. As a consequence
> >>>>>>>> phydev->suspended_by_mdio_bus is false and
> >>>>>>>> phydev->mdio_bus_phy_resume()
> >>>>>>>> skips the PHY hw initialization.
> >>>>>>>> Please also note that mdio_bus_phy_suspend() calls
> >>>>>>>> phy_stop_machine() that sets the state to PHY_UP.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Forgot that MDIO bus suspend is done before MAC driver suspend.
> >>>>>>> Therefore disregard this part for now.
> >>>>>>>
> >>>>>>>> Having said that the current argumentation isn't convincing.
> >>>>>>>> I'm not aware of such issues on other systems, therefore it's
> >>>>>>>> likely that something is system-dependent.
> >>>>>>>>
> >>>>>>>> Please check the exact call sequence on your system, maybe it
> >>>>>>>> provides a hint.
> >>>>>>>>
> >>>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@....com>
> >>>>>>>>> ---
> >>>>>>>>>  drivers/net/phy/phy_device.c | 7 +++++++
> >>>>>>>>>  1 file changed, 7 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/net/phy/phy_device.c
> >>>>>>>>> b/drivers/net/phy/phy_device.c index
> >>>>>>>>> cc38e326405a..312a6f662481
> >>>>>>>>> 100644
> >>>>>>>>> --- a/drivers/net/phy/phy_device.c
> >>>>>>>>> +++ b/drivers/net/phy/phy_device.c
> >>>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int
> >>>>> mdio_bus_phy_resume(struct device *dev)
> >>>>>>>>>  	ret = phy_resume(phydev);
> >>>>>>>>>  	if (ret < 0)
> >>>>>>>>>  		return ret;
> >>>>>>>>> +
> >>>>>>>>> +	/* PHY state could be changed to PHY_NOLINK from MAC
> >>>>>>>>> +controller
> >>>>> resume
> >>>>>>>>> +	 * rounte with phy_start(), here change to PHY_UP after
> >>>>> re-initializing
> >>>>>>>>> +	 * PHY hardware, let PHY state machine to start/config
> >>>>>>>>> +auto-nego
> >>>>> again.
> >>>>>>>>> +	 */
> >>>>>>>>> +	phydev->state = PHY_UP;
> >>>>>>>>> +
> >>>>>>>>>  no_resume:
> >>>>>>>>>  	if (phydev->attached_dev && phydev->adjust_link)
> >>>>>>>>>  		phy_start_machine(phydev);
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> This is a quick draft of the modified soft reset for KSZ8081.
> >>>>> Some tests would be appreciated.
> >>>>>
> >>>>
> >>>> I test this patch at my side, unfortunately, it still can't work.
> >>>>
> >>>> I have not found what soft reset would affect in 8081 spec, is
> >>>> there ang common Description for different PHYs?
> >>>>
> >>>
> >>> You can check the clause 22 spec: 802.3 22.2.4.1.1
> >>>
> >>> Apart from that you can check BMCR value and which modes your PHY
> >>> advertises after a soft reset. If the PHY indeed wouldn't restart
> >>> aneg after a soft reset, then it would be the only one with this
> >>> behavior I know. And I'd wonder why there aren't more such reports.
> >>
> >> Hi Heiner,
> >>
> >> We have reasons to believe it is a weird behavior of KSZ8081. I have
> >> another two PHYs at my side, AR8031 and RTL8211FD, they can work fine if
> soft_reset implemented.
> >>
> >> As commit message described, phy_start() would change PHY state to
> PHY_UP finally to PHY_NOLINK when MAC resume.
> >> After MDIO bus resume back, it will periodically check link status. I did below
> code change to dump all PHY registers.
> >>
> >> --- a/drivers/net/phy/phy.c
> >> +++ b/drivers/net/phy/phy.c
> >> @@ -35,7 +35,7 @@
> >>  #include <net/genetlink.h>
> >>  #include <net/sock.h>
> >>
> >> -#define PHY_STATE_TIME HZ
> >> +#define PHY_STATE_TIME (10 * HZ)
> >>
> >>  #define PHY_STATE_STR(_state)                  \
> >>         case PHY_##_state:                      \
> >> @@ -738,6 +738,28 @@ static int phy_check_link_status(struct phy_device
> *phydev)
> >>         if (err)
> >>                 return err;
> >>
> >> +       printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00));
> >> +       printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01));
> >> +       printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02));
> >> +       printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03));
> >> +       printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04));
> >> +       printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05));
> >> +       printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06));
> >> +       printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07));
> >> +       printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08));
> >> +       printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09));
> >> +       printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10));
> >> +       printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11));
> >> +       printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15));
> >> +       printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16));
> >> +       printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17));
> >> +       printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18));
> >> +       printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b));
> >> +       printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c));
> >> +       printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d));
> >> +       printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e));
> >> +       printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f));
> >> +       printk("\n\n");
> >>         if (phydev->link && phydev->state != PHY_RUNNING) {
> >>                 phy_check_downshift(phydev);
> >>                 phydev->state = PHY_RUNNING;
> >>
> >> After MDIO bus resume back, results as below:
> >>
> >> [  119.545383] offset 0x00 data = 1100 [  119.549237] offset 0x01
> >> data = 7849 [  119.563125] offset 0x02 data = 22 [  119.566810]
> >> offset 0x03 data = 1560 [  119.570597] offset 0x04 data = 85e1 [
> >> 119.588016] offset 0x05 data = 0 [  119.592891] offset 0x06 data = 4
> >> [  119.596452] offset 0x07 data = 2001 [  119.600233] offset 0x08
> >> data = 0 [  119.617864] offset 0x09 data = 0 [  119.625990] offset
> >> 0x10 data = 0 [  119.629576] offset 0x11 data = 0 [  119.642735]
> >> offset 0x15 data = 0 [  119.646332] offset 0x16 data = 202 [
> >> 119.650030] offset 0x17 data = 5c02 [  119.668054] offset 0x18 data =
> >> 801 [  119.672997] offset 0x1b data = 0 [  119.676565] offset 0x1c
> >> data = 0 [  119.680084] offset 0x1d data = 0 [  119.698031] offset
> >> 0x1e data = 20 [  119.706246] offset 0x1f data = 8190 [  119.709984]
> >> [  119.709984] [  120.182120] offset 0x00 data = 100 [  120.185894]
> >> offset 0x01 data = 784d [  120.189681] offset 0x02 data = 22 [
> >> 120.206319] offset 0x03 data = 1560 [  120.210171] offset 0x04 data =
> >> 8061 [  120.225353] offset 0x05 data = 0 [  120.228948] offset 0x06
> >> data = 4 [  120.242936] offset 0x07 data = 2001 [  120.246792] offset
> >> 0x08 data = 0 [  120.250313] offset 0x09 data = 0 [  120.266748]
> >> offset 0x10 data = 0 [  120.270335] offset 0x11 data = 0 [
> >> 120.284167] offset 0x15 data = 0 [  120.287760] offset 0x16 data =
> >> 202 [  120.301031] offset 0x17 data = 3c02 [  120.313209] offset 0x18
> >> data = 801 [  120.316983] offset 0x1b data = 0 [  120.320513] offset
> >> 0x1c data = 1 [  120.336589] offset 0x1d data = 0 [  120.340184]
> >> offset 0x1e data = 115 [  120.355357] offset 0x1f data = 8190 [
> >> 120.359112] [  120.359112] [  129.785396] offset 0x00 data = 1100 [
> >> 129.789252] offset 0x01 data = 7849 [  129.809951] offset 0x02 data =
> >> 22 [  129.815018] offset 0x03 data = 1560 [  129.818845] offset 0x04
> >> data = 85e1 [  129.835808] offset 0x05 data = 0 [  129.839398] offset
> >> 0x06 data = 4 [  129.854514] offset 0x07 data = 2001 [  129.858371]
> >> offset 0x08 data = 0 [  129.873357] offset 0x09 data = 0 [
> >> 129.876954] offset 0x10 data = 0 [  129.880472] offset 0x11 data = 0
> >> [  129.896450] offset 0x15 data = 0 [  129.900038] offset 0x16 data =
> >> 202 [  129.915041] offset 0x17 data = 5c02 [  129.918889] offset 0x18
> >> data = 801 [  129.932735] offset 0x1b data = 0 [  129.946830] offset
> >> 0x1c data = 0 [  129.950424] offset 0x1d data = 0 [  129.964585]
> >> offset 0x1e data = 0 [  129.968192] offset 0x1f data = 8190 [
> >> 129.972938] [  129.972938] [  130.425125] offset 0x00 data = 100 [
> >> 130.428889] offset 0x01 data = 784d [  130.442671] offset 0x02 data =
> >> 22 [  130.446360] offset 0x03 data = 1560 [  130.450142] offset 0x04
> >> data = 8061 [  130.467207] offset 0x05 data = 0 [  130.470789] offset
> >> 0x06 data = 4 [  130.485071] offset 0x07 data = 2001 [  130.488934]
> >> offset 0x08 data = 0 [  130.504320] offset 0x09 data = 0 [
> >> 130.507911] offset 0x10 data = 0 [  130.520950] offset 0x11 data = 0
> >> [  130.532865] offset 0x15 data = 0 [  130.536461] offset 0x16 data =
> >> 202 [  130.540156] offset 0x17 data = 3c02 [  130.557218] offset 0x18
> >> data = 801 [  130.560987] offset 0x1b data = 0 [  130.575158] offset
> >> 0x1c data = 1 [  130.578754] offset 0x1d data = 0 [  130.593543]
> >> offset 0x1e data = 115 [  130.597312] offset 0x1f data = 8190
> >>
> >> We can see that BMCR and BMSR changes from 0x1100,0x7849 to
> 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it.
> >> Above process is auto-nego enabled, link is down, auto-nego is disabled, link
> is up. And auto-nego complete bit is always 0.
> >>
> >> PHY seems become unstable if soft reset after start/config auto-nego. I also
> want to fix it in micrel driver, but failed.
> >>
> >
> > Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg
> > may never complete for different reasons, e.g. no physical link. And
> > even if we use a timeout this may add unwanted delays.
> >
> >> Do you have any other insights that can help me further locate the issue?
> Thanks.
> >>
> >
> > I think current MAC/PHY PM handling isn't perfect. Often we have the
> > following
> > scenario:
> >
> > *suspend*
> > 1. PHY is suspended (mdio_bus_phy_suspend) 2. MAC suspend callback
> > (typically involving phy_stop())
> >
> > *resume*
> > 1. MAC resume callback (typically involving phy_start()) 2. PHY is
> > resumed (mdio_bus_phy_resume), incl. calling phy_init_hw()
> >
> > Calling phy_init_hw() after phy_start() doesn't look right.
> > It seems to work in most cases, but there's a certain risk that
> > phy_init_hw() overwrites something, e.g. the advertised modes.
> > I think we have two valid scenarios:
> >
> > 1. phylib PM callbacks are used, then the MAC driver shouldn't
> >    touch the PHY in its PM callbacks, especially not call
> >    phy_stop/phy_start.
> >
> > 2. MAC PM callbacks take care also of the PHY. Then I think we would
> >    need a flag at the phy_device telling it to make the PHY PM
> >    callbacks a no-op.
> >
> > Andrew, Russell: What do you think?
> >
> >> Best Regards,
> >> Joakim Zhang
> >>
> >> [...]
> >>
> > Heiner
> >
> 
> Based on scenario 1 you can also try the following.
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index 3db882322..cdf9160fc 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3805,7 +3805,6 @@ static int __maybe_unused fec_suspend(struct
> device *dev)
>  	if (netif_running(ndev)) {
>  		if (fep->wol_flag & FEC_WOL_FLAG_ENABLE)
>  			fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON;
> -		phy_stop(ndev->phydev);
>  		napi_disable(&fep->napi);
>  		netif_tx_lock_bh(ndev);
>  		netif_device_detach(ndev);
> @@ -3864,7 +3863,6 @@ static int __maybe_unused fec_resume(struct
> device *dev)
>  		netif_device_attach(ndev);
>  		netif_tx_unlock_bh(ndev);
>  		napi_enable(&fep->napi);
> -		phy_start(ndev->phydev);
>  	}
>  	rtnl_unlock();

As I described in commit message:

"When I am debugging, I found PHY works fine if MAC doesn't support suspend/resume or phy_stop()/phy_start() doesn't been called during suspend/resume. This let me realize, PHY state machine phy_state_machine() could do something breaks the PHY."

So I have tried your above code change before, and it works. But it is a very common phenomenon that MAC suspend/resume invokes phy_stop/start or phylink_stop/start, and it's been around for a long time.

As the scenarios you list, it is indeed unreasonable to soft reset or config PHY after calling phy_start_aneg() in state machine. PHY state should be PHY_UP after calling phy_init_hw(), It seems more reasonable.

Best Regards,
Joakim Zhang
> --
> 2.31.1
>