lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DB8PR04MB679518228AB7B2C5CD47A1B3E6CF9@DB8PR04MB6795.eurprd04.prod.outlook.com>
Date:   Fri, 3 Sep 2021 08:39:23 +0000
From:   Joakim Zhang <qiangqing.zhang@....com>
To:     Russell King <linux@...linux.org.uk>
CC:     Andrew Lunn <andrew@...n.ch>, Vladimir Oltean <olteanv@...il.com>,
        "peppe.cavallaro@...com" <peppe.cavallaro@...com>,
        "alexandre.torgue@...s.st.com" <alexandre.torgue@...s.st.com>,
        "joabreu@...opsys.com" <joabreu@...opsys.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "mcoquelin.stm32@...il.com" <mcoquelin.stm32@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "hkallweit1@...il.com" <hkallweit1@...il.com>,
        dl-linux-imx <linux-imx@....com>
Subject: RE: [PATCH] net: stmmac: fix MAC not working when system resume back
 with WoL enabled


Hi Russell,

> -----Original Message-----
> From: Russell King <linux@...linux.org.uk>
> Sent: 2021年9月3日 16:02
> To: Joakim Zhang <qiangqing.zhang@....com>
> Cc: Andrew Lunn <andrew@...n.ch>; Vladimir Oltean <olteanv@...il.com>;
> peppe.cavallaro@...com; alexandre.torgue@...s.st.com;
> joabreu@...opsys.com; davem@...emloft.net; kuba@...nel.org;
> mcoquelin.stm32@...il.com; netdev@...r.kernel.org; f.fainelli@...il.com;
> hkallweit1@...il.com; dl-linux-imx <linux-imx@....com>
> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume
> back with WoL enabled
> 
> On Fri, Sep 03, 2021 at 06:51:09AM +0000, Joakim Zhang wrote:
> >
> > > -----Original Message-----
> > > From: Andrew Lunn <andrew@...n.ch>
> > > Sent: 2021年9月2日 20:24
> > > To: Joakim Zhang <qiangqing.zhang@....com>
> > > Cc: Russell King <linux@...linux.org.uk>; Vladimir Oltean
> > > <olteanv@...il.com>; peppe.cavallaro@...com;
> > > alexandre.torgue@...s.st.com; joabreu@...opsys.com;
> > > davem@...emloft.net; kuba@...nel.org; mcoquelin.stm32@...il.com;
> > > netdev@...r.kernel.org; f.fainelli@...il.com; hkallweit1@...il.com;
> > > dl-linux-imx <linux-imx@....com>
> > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system
> > > resume back with WoL enabled
> > >
> > > > Emm, @andrew@...n.ch, Andrew is much familiar with FEC, and PHY
> > > > maintainers, Could you please help put insights here if possible?
> > >
> > > All the boards i have either have an Ethernet Switch connected to
> > > the MAC, or a Micrel PHY. None are setup for WoL, since it is not
> > > used in the use case of these boards.
> > >
> > > I think you need to scatter some printk() in various places to
> > > confirm what is going on. Where is the WoL implemented: MAC or PHY,
> > > what is suspended or not, etc.
> >
> > Thanks Andrew, Russell,
> >
> > I confirmed FEC is MAC-based WoL, and PHY is active when system
> suspended if MAC-based WoL is active.
> > I scatter printk() in both phy_device.c and realtek.c phy driver to debug this
> for both WoL active and inactive case.
> >
> > When MAC-based WoL is active, phy_suspend() is the last point to actually
> suspend the PHY, you can see that,
> > 	phy_ethtool_get_wol(phydev, &wol);
> > 	if (wol.wolopts || (netdev && netdev->wol_enabled))
> > 		return -EBUSY;
> >
> > Here, netdev is true and netdev->wol_enabled is ture
> > (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled =
> > !!wol.wolopts;) So that phydev->suspend() would not be called, PHY is active
> after system suspended. PHY can receive packets and pass to MAC, MAC is
> responsible for detecting magic packets then generate a wakeup interrupt. So
> all is fine for FEC, and the behavior is clear.
> 
> What happens on resume with FEC?

Since we call phy_stop() in fec_suspend(), the link is down, but the PHY is active, after receiving
magic packets, the system resume back; In fec_resume(), after restart/init FEC, we call phy_start()
to let link up, then all is going well.
 
> > For STMMAC, when MAC-based WoL is active, according to the current
> > implementation, only call phylink_mac_change()=false, PHY would be
> > active, so PHY can receive packets then pass to MAC, MAC ignore packets
> except magic packets. System can be waked up successfully.
> >
> > The issue is that phylink_mac_change()=false only notify a phylink of
> > a change in MAC state, as we analyzed before, PHY would link up again
> > before system suspended, which lead to .mac_link_up can't be called when
> system resume back. Unfortunately, all MAC configurations are in
> stmmac_mac_link_up(), as a result, MAC has not been initialized correctly
> when system resume back, so that it can't work any longer.
> 
> Oh, I thought your problem was that the system didn't wake up.
> 
> In any case, remove the calls to phylink_mac_change() from the suspend and
> resume functions, they are completely _incorrect_.

Ok, I will do that.

> > Intend to fix this obvious breakage, I did some work:
> > Removing phylink_mac_change() (Russell said it's for MLO_AN_INBAND,
> > but we have a MLO_AN_PHY) from suspend/resume path, then adding
> > phylink_stop() in suspend, phylink_start() in resume() also for WoL active path.
> I found remote magic packets can't wake up the system, I firstly suspect PHY
> may be suspended. After further debug, I confirm that PHY is active, and
> stmmac_pmt() is correctly configured.
> 
> As I've said a few times now, if the MAC is doing the wakeup, you need the PHY
> to MAC link to be up, so you should _not_ call
> phylink_stop() and phylink_start() from the suspend/resume functions because
> they will take the link down.

Yes, I recall you said this before. Is it the requirement for phylink?
For FEC, we call phy_stop() when suspend, and phy_start() when resume, with MAC is doing
the wakeup.

> Maybe I should provide phylink_suspend()/phylink_resume() which look at the
> netdev state just like phylib does, and conditionally call
> phylink_stop() and phylink_start() so driver authors don't have to consider this.
> 
> Something like:
> 
> /**
>  *...
>  * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan  */
> void phylink_suspend(struct phylink *phylink, bool mac_wol) {
> 	ASSERT_RTNL();
> 
> 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))
> 		phylink_stop(phylink);
> }
> 
> /**
>  *...
>  * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan
>  *
>  * @mac_wol must have the same value as passed previously to
>  * phylink_suspend().
>  */
> void phylink_resume(struct phylink *phylink, bool mac_wol) {
> 	ASSERT_RTNL();
> 
> 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))
> 		phylink_start(phylink);
> }

That's great!!! MAC driver authors don't need to distinguish the different cases.

> > The conclusion is that, as long as we call phylink_stop() for WoL
> > active in suspend(), then system can't be waked up any longer, and the
> > PHY situation is active. This let me recall what Russell mentioned in this
> thread, if we need bring MAC link up with phylink framework to let MAC can
> see traffic from PHY when MAC-based WoL is active?
> >
> > Now, I don't know where I can further dig into this issue, if you have any
> advice please share with me , thanks in advance.
> 
> So my question now is: as the MAC needs to be alive while the system is
> suspended, that implies that it has been configured to receive packets. When
> the system resumes, why exactly doesn't the MAC continue to work? Does the
> MAC get reset after the system comes out of resume and lose all of its
> configuration?

Yes, as I described in commit message, when STMMAC resume back, either WoL is active or not,
it reset the hardware then reconfig the MAC.
stmmac_resume()->stmmac_hw_setup()->stmmac_init_dma_engine()...

> Reading what stmmac_resume() does, it seems that may well be the case, or if
> not, the actions of stmmac_resume() ends up reprogramming a great deal of
> the device setup. If this is the case, then yes, we need phylink to be triggered
> to reconfigure the link - which we could do in
> phylink_resume() if mac_wol was active.
> 
> While reading stmmac_resume(), I have to question the placement of this code
> block:
> 
>         if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
>                 rtnl_lock();
>                 phylink_start(priv->phylink);
>                 /* We may have called phylink_speed_down before */
>                 phylink_speed_up(priv->phylink);
>                 rtnl_unlock();
>         }
> 
> in the sequence there - phylink_start() should be called when you're ready for
> the link to come up - in other words, when you're ready to start seeing packets
> arrive at the MAC's interface. However, the code following is clearing and
> resetting up queues, restoring receive modes, setting up the hardware, and
> restoring the vlan filtering.
> Surely all that should happen before calling phylink_start(), much like it already
> does in stmmac_open() ?

There is a story here, SNPS EQOS IP need PHY provides RXC clock for MAC's receive
logic, so we need phylink_start() to bring PHY link up, that make PHY resume back,
PHY could stop RXC clock when in suspended state. This is the reason why calling phylink_start()
before re-config MAC.

Best Regards,
Joakim Zhang

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ