Message-ID: <ce36eb26-a304-9dd8-3bee-4117929a5546@gmail.com>
Date: Wed, 1 Sep 2021 17:40:20 +0200
From: Heiner Kallweit <hkallweit1@...il.com>
To: Joakim Zhang <qiangqing.zhang@....com>,
Russell King <linux@...linux.org.uk>
Cc: "peppe.cavallaro@...com" <peppe.cavallaro@...com>,
"alexandre.torgue@...s.st.com" <alexandre.torgue@...s.st.com>,
"joabreu@...opsys.com" <joabreu@...opsys.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>,
"mcoquelin.stm32@...il.com" <mcoquelin.stm32@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"andrew@...n.ch" <andrew@...n.ch>,
"f.fainelli@...il.com" <f.fainelli@...il.com>,
dl-linux-imx <linux-imx@....com>
Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume back
with WoL enabled
On 01.09.2021 12:21, Joakim Zhang wrote:
>
> Hi Russell,
>
>> -----Original Message-----
>> From: Russell King <linux@...linux.org.uk>
>> Sent: September 1, 2021 17:14
>> To: Joakim Zhang <qiangqing.zhang@....com>
>> Cc: peppe.cavallaro@...com; alexandre.torgue@...s.st.com;
>> joabreu@...opsys.com; davem@...emloft.net; kuba@...nel.org;
>> mcoquelin.stm32@...il.com; netdev@...r.kernel.org; andrew@...n.ch;
>> f.fainelli@...il.com; hkallweit1@...il.com; dl-linux-imx <linux-imx@....com>
>> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume
>> back with WoL enabled
>>
>> On Wed, Sep 01, 2021 at 05:02:28PM +0800, Joakim Zhang wrote:
>>> We can reproduce this issue with the steps below:
>>> 1) enable WoL on the host
>>> 2) suspend the host system
>>> 3) have a remote client send wakeup packets
>>> We can see that the host system resumes, but the network does not
>>> work; for example, ping fails.
>>>
>>> After a bit of digging, I found that this issue was introduced by
>>> commit 46f69ded988d
>>> ("net: stmmac: Use resolved link config in mac_link_up()"), which uses
>>> the finalised link parameters in mac_link_up() rather than the
>>> parameters in mac_config().
>>>
>>> There are two scenarios for MAC suspend/resume:
>>>
>>> 1) MAC suspend with WoL disabled: stmmac_suspend() calls
>>> phylink_mac_change() to notify the phylink state machine of a change
>>> in MAC state, so the .mac_link_down callback is invoked. It then
>>> calls phylink_stop() to stop the phylink instance. When the MAC
>>> resumes, phylink_start() is first called to start the phylink
>>> instance, then phylink_mac_change() is called, which finally triggers
>>> the phylink state machine to invoke the .mac_config and .mac_link_up
>>> callbacks. All is fine, since the configuration done in these two
>>> callbacks is initialized again.
>>>
>>> 2) MAC suspend with WoL enabled: phylink_mac_change() puts the link
>>> down, but there is no phylink_stop() to stop the phylink instance, so
>>> the link comes up again; that means .mac_config and .mac_link_up are
>>> invoked before the system suspends. After the system resumes, the
>>> driver does DMA initialization and a SW reset, which makes the MAC
>>> lose its hardware settings (i.e. the MAC_Configuration register
>>> (offset 0x0) is reset). Since the link was up before the system
>>> suspended, .mac_link_up is not invoked after resume, so there is no
>>> chance to initialize the configuration in the .mac_link_up callback.
>>> As a result, the MAC no longer works.
>>>
>>> The above description is what I found when debugging this issue. This
>>> patch just reverts the broken patch to work around it, so that at
>>> least the MAC works when the system resumes with WoL enabled.
>>>
>>> I call this a workaround since it does not resolve the issue completely.
>>> I just move speed/duplex/pause etc. into the .mac_config callback;
>>> there are other configurations in the .mac_link_up callback which also
>>> need to be initialized for specific functions to work.
>>
>> NAK. Please read the phylink documentation. speed/duplex/pause is undefined
>> in .mac_config.
>
> Speed/duplex/pause are also fields of "struct phylink_link_state", so they can be referenced in .mac_config; please
> see what stmmac did before:
> https://elixir.bootlin.com/linux/v5.4.143/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L852
>
>
>> I think the problem here is that you're not calling phylink_stop() when WoL is
>> enabled, which means phylink will continue to maintain the state as per the
>> hardware state, and phylib will continue to run its state machine reporting the
>> link state to phylink.
>
> Yes, I also tried the code change below, but then the host would not wake up. phylink_stop() calls
> phy_stop(), and phylib finally calls phy_suspend(), which does not suspend the PHY if it detects that WoL is enabled,
> so for now I don't know why the system can't be woken up with this code change.
>
A follow-up question would be whether the link breaks accidentally on suspend, or whether
something fails on resume. When suspending, does the link break and do the link LEDs go off?
Depending on the LED configuration you may also see whether the link speed is reduced
on suspend.
struct net_device has a member wol_enabled; does it make a difference if you set it?
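
To make the wol_enabled question concrete, here is a small user-space model of the kind of check Joakim describes in phy_suspend(). It assumes, as a simplification, that the PHY is left powered when either the PHY's own WoL options are set or the attached net_device has wol_enabled set. All model_* names and MODEL_EBUSY are hypothetical stand-ins, not kernel symbols, and the real phylib code does more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one struct net_device field of interest. */
struct model_netdev {
	bool wol_enabled;	/* set by the MAC driver when it handles WoL */
};

/* Hypothetical stand-in for the PHY state relevant to suspend. */
struct model_phy {
	unsigned int wolopts;			/* WoL options programmed into the PHY itself */
	struct model_netdev *attached_dev;	/* may be NULL */
	bool suspended;
};

#define MODEL_EBUSY 16	/* placeholder error value for this model only */

/*
 * Power the PHY down unless something still needs it for wake-up.
 * Returns 0 on success (or if already suspended), -MODEL_EBUSY if the
 * PHY must stay up.
 */
int model_phy_suspend(struct model_phy *phy)
{
	assert(phy != NULL);

	if (phy->suspended)
		return 0;

	/* Either the PHY or the MAC (via the netdev flag) may own WoL. */
	if (phy->wolopts ||
	    (phy->attached_dev && phy->attached_dev->wol_enabled))
		return -MODEL_EBUSY;

	phy->suspended = true;
	return 0;
}
```

In this model, a MAC that does WoL itself (so the PHY's wolopts stay 0) only keeps the PHY powered across phy_stop() if wol_enabled is set. Whether that is what prevents the wake-up here is exactly the open question, not something the model proves.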
> @@ -5374,7 +5374,6 @@ int stmmac_suspend(struct device *dev)
> rtnl_lock();
> if (device_may_wakeup(priv->device))
> phylink_speed_down(priv->phylink, false);
> - phylink_stop(priv->phylink);
> rtnl_unlock();
> mutex_lock(&priv->lock);
>
> @@ -5385,6 +5384,10 @@ int stmmac_suspend(struct device *dev)
> }
> mutex_unlock(&priv->lock);
>
> + rtnl_lock();
> + phylink_stop(priv->phylink);
> + rtnl_unlock();
> +
> priv->speed = SPEED_UNKNOWN;
> return 0;
> }
> @@ -5448,6 +5451,12 @@ int stmmac_resume(struct device *dev)
> pinctrl_pm_select_default_state(priv->device);
> if (priv->plat->clk_ptp_ref)
> clk_prepare_enable(priv->plat->clk_ptp_ref);
> +
> + rtnl_lock();
> + /* We may have called phylink_speed_down before */
> + phylink_speed_up(priv->phylink);
> + rtnl_unlock();
> +
> /* reset the phy so that it's ready */
> if (priv->mii && priv->mdio_rst_after_resume)
> stmmac_mdio_reset(priv->mii);
> @@ -5461,13 +5470,9 @@ int stmmac_resume(struct device *dev)
> return ret;
> }
>
> - if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
> - rtnl_lock();
> - phylink_start(priv->phylink);
> - /* We may have called phylink_speed_down before */
> - phylink_speed_up(priv->phylink);
> - rtnl_unlock();
> - }
> + rtnl_lock();
> + phylink_start(priv->phylink);
> + rtnl_unlock();
>
> rtnl_lock();
> mutex_lock(&priv->lock);
>
>
>> phylink_stop() (and therefore phy_stop()) should be called even if WoL is active
>> to shut down this state reporting, as other network drivers do.
>
> OK, you mean that phylink_stop() should also be called even if WoL is active. I will look in this direction, since
> you are the expert.
>
> Thanks.
>
> Best Regards,
> Joakim Zhang
>
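
The failure described in scenario 2 of the quoted patch description, and the effect of calling phylink_stop() on suspend as Russell suggests, can be sketched as a user-space model. This is only an illustration under stated assumptions: the link callbacks fire solely on a link state change, resume resets the MAC configuration register, and the brief link down/up caused by phylink_mac_change() before suspend is left out. Every model_* name and MODEL_CFG_LINK is hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CFG_LINK 0x1u	/* placeholder for the speed/duplex/pause bits */

/* Hypothetical, simplified stand-in for MAC plus phylink state. */
struct model_mac {
	uint32_t mac_config;	/* models MAC_Configuration (offset 0x0) */
	bool phylink_running;	/* started and not yet stopped */
	bool link_up;		/* link state as tracked by the model */
};

/* Models .mac_link_up: program the link-dependent configuration. */
static void model_mac_link_up(struct model_mac *m)
{
	m->mac_config |= MODEL_CFG_LINK;
}

/* Resolve step: the callback fires only when the link state changes. */
static void model_resolve(struct model_mac *m, bool phy_link)
{
	bool want = m->phylink_running && phy_link;

	if (want == m->link_up)
		return;		/* no change, so no callback */
	m->link_up = want;
	if (want)
		model_mac_link_up(m);
}

static void model_start(struct model_mac *m)
{
	m->phylink_running = true;
	model_resolve(m, true);
}

static void model_stop(struct model_mac *m)
{
	m->phylink_running = false;
	model_resolve(m, false);	/* forces the link down */
}

/* Suspend: optionally stop the link state tracking. */
static void model_suspend(struct model_mac *m, bool stop_phylink)
{
	if (stop_phylink)
		model_stop(m);
}

/* Resume: the SW reset clears the register, then state is re-resolved. */
static void model_resume(struct model_mac *m, bool start_phylink)
{
	m->mac_config = 0;
	if (start_phylink)
		model_start(m);
	else
		model_resolve(m, true);	/* link still "up": nothing happens */
}

/* Returns true if the MAC is configured after one suspend/resume cycle. */
bool model_cycle_keeps_config(bool stop_on_suspend)
{
	struct model_mac m = { 0 };

	model_start(&m);
	assert(m.mac_config & MODEL_CFG_LINK);	/* configured before suspend */

	model_suspend(&m, stop_on_suspend);
	model_resume(&m, stop_on_suspend);
	return (m.mac_config & MODEL_CFG_LINK) != 0;
}
```

Calling model_cycle_keeps_config(false) models the WoL-enabled path without phylink_stop(): the register stays cleared because no link-up transition occurs after the reset. Calling it with true models the stop/start pairing, where the link-up callback runs again on resume.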