lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 3 Sep 2021 11:04:57 +0000
From:   Joakim Zhang <qiangqing.zhang@....com>
To:     Russell King <linux@...linux.org.uk>
CC:     Andrew Lunn <andrew@...n.ch>, Vladimir Oltean <olteanv@...il.com>,
        "peppe.cavallaro@...com" <peppe.cavallaro@...com>,
        "alexandre.torgue@...s.st.com" <alexandre.torgue@...s.st.com>,
        "joabreu@...opsys.com" <joabreu@...opsys.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "mcoquelin.stm32@...il.com" <mcoquelin.stm32@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "hkallweit1@...il.com" <hkallweit1@...il.com>,
        dl-linux-imx <linux-imx@....com>
Subject: RE: [PATCH] net: stmmac: fix MAC not working when system resume back
 with WoL enabled


Hi Russell,

[...]
> > > > > -----Original Message-----
> > > > > From: Andrew Lunn <andrew@...n.ch>
> > > > > Sent: 2021年9月2日 20:24
> > > > > To: Joakim Zhang <qiangqing.zhang@....com>
> > > > > Cc: Russell King <linux@...linux.org.uk>; Vladimir Oltean
> > > > > <olteanv@...il.com>; peppe.cavallaro@...com;
> > > > > alexandre.torgue@...s.st.com; joabreu@...opsys.com;
> > > > > davem@...emloft.net; kuba@...nel.org;
> mcoquelin.stm32@...il.com;
> > > > > netdev@...r.kernel.org; f.fainelli@...il.com;
> > > > > hkallweit1@...il.com; dl-linux-imx <linux-imx@....com>
> > > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when
> > > > > system resume back with WoL enabled
> > > > >
> > > > > > Emm, @andrew@...n.ch, Andrew is much familiar with FEC, and
> > > > > > PHY maintainers, Could you please help put insights here if possible?
> > > > >
> > > > > All the boards i have either have an Ethernet Switch connected
> > > > > to the MAC, or a Micrel PHY. None are setup for WoL, since it is
> > > > > not used in the use case of these boards.
> > > > >
> > > > > I think you need to scatter some printk() in various places to
> > > > > confirm what is going on. Where is the WoL implemented: MAC or
> > > > > PHY, what is suspended or not, etc.
> > > >
> > > > Thanks Andrew, Russell,
> > > >
> > > > I confirmed FEC is MAC-based WoL, and PHY is active when system
> > > suspended if MAC-based WoL is active.
> > > > I scatter printk() in both phy_device.c and realtek.c phy driver
> > > > to debug this
> > > for both WoL active and inactive case.
> > > >
> > > > When MAC-based WoL is active, phy_suspend() is the last point to
> > > > actually
> > > suspend the PHY, you can see that,
> > > > 	phy_ethtool_get_wol(phydev, &wol);
> > > > 	if (wol.wolopts || (netdev && netdev->wol_enabled))
> > > > 		return -EBUSY;
> > > >
> > > > Here, netdev is true and netdev->wol_enabled is ture
> > > > (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled =
> > > > !!wol.wolopts;) So that phydev->suspend() would not be called, PHY
> > > > is active
> > > after system suspended. PHY can receive packets and pass to MAC, MAC
> > > is responsible for detecting magic packets then generate a wakeup
> > > interrupt. So all is fine for FEC, and the behavior is clear.
> > >
> > > What happens on resume with FEC?
> >
> > Since we call phy_stop() in fec_suspend(), the link is down, but the
> > PHY is active, after receiving magic packets, the system resume back;
> > In fec_resume(), after restart/init FEC, we call phy_start() to let link up, then
> all is going well.
> 
> ... but the link never went down! So I don't understand the last point.

Sorry, what the meaning of "the link never went down"? How do you define the link is down?
May be I have not get your original point correctly.

At my side, with MAC-based WoL is active, FEC calls phy_stop() in fec_suspend(), then fec_enet_adjust_link()
is called, further fec_stop() is called, FEC only keep necessary receive logic active to service WoL. This is not
the link went down? At least I see the log " fec 30be0000.ethernet eth0: Link is Down".

> > > > For STMMAC, when MAC-based WoL is active, according to the current
> > > > implementation, only call phylink_mac_change()=false, PHY would be
> > > > active, so PHY can receive packets then pass to MAC, MAC ignore
> > > > packets
> > > except magic packets. System can be waked up successfully.
> > > >
> > > > The issue is that phylink_mac_change()=false only notify a phylink
> > > > of a change in MAC state, as we analyzed before, PHY would link up
> > > > again before system suspended, which lead to .mac_link_up can't be
> > > > called when
> > > system resume back. Unfortunately, all MAC configurations are in
> > > stmmac_mac_link_up(), as a result, MAC has not been initialized
> > > correctly when system resume back, so that it can't work any longer.
> > >
> > > Oh, I thought your problem was that the system didn't wake up.
> > >
> > > In any case, remove the calls to phylink_mac_change() from the
> > > suspend and resume functions, they are completely _incorrect_.
> >
> > Ok, I will do that.
> >
> > > > Intend to fix this obvious breakage, I did some work:
> > > > Removing phylink_mac_change() (Russell said it's for
> > > > MLO_AN_INBAND, but we have a MLO_AN_PHY) from suspend/resume
> path,
> > > > then adding
> > > > phylink_stop() in suspend, phylink_start() in resume() also for WoL active
> path.
> > > I found remote magic packets can't wake up the system, I firstly
> > > suspect PHY may be suspended. After further debug, I confirm that
> > > PHY is active, and
> > > stmmac_pmt() is correctly configured.
> > >
> > > As I've said a few times now, if the MAC is doing the wakeup, you
> > > need the PHY to MAC link to be up, so you should _not_ call
> > > phylink_stop() and phylink_start() from the suspend/resume functions
> > > because they will take the link down.
> >
> > Yes, I recall you said this before. Is it the requirement for phylink?
> > For FEC, we call phy_stop() when suspend, and phy_start() when resume,
> > with MAC is doing the wakeup.
> 
> phylink_stop() will synchronously force the phy/mac link down if it wasn't
> already down, and it'll do this by calling the mac_link_down() method.
> 
> phylink_start() will remove the force-down that phylink_stop() places, and will
> re-resolve the link. You will get a "major reconfiguration"
> event, and if the link is up, a mac_link_up() call.
> 
> These are primarily designed to be called from the .ndo_open and .ndo_stop
> calls (as their kerneldoc mentions) but have found their way into
> suspend/resume methods.
> 
> If a MAC is being suspended - as in powered down - you definitely want to bring
> it down to a safe state where link events are not going to affect the MAC.
> Calling phylink_stop() will do that. On resume, you want to reconfigure and
> allow the MAC to receive link events, and calling phylink_start() will do that.
> 
> However, if the MAC is not being suspended because you want WoL to work,
> then you need the PHY/MAC link _not_ to be brought down so the MAC can
> receive packets and examine them to see if it should wake up.
> Phylink does not really cater for that case - it wasn't on my radar as I don't have
> any modern systems that support suspend/resume, much less MAC based
> WoL.

Great thanks, for your detailed explanation. It means phylink has not prepared for WoL yet.
I am happy to support testing since you lack of the environment.

> > > Maybe I should provide phylink_suspend()/phylink_resume() which look
> > > at the netdev state just like phylib does, and conditionally call
> > > phylink_stop() and phylink_start() so driver authors don't have to consider
> this.
> > >
> > > Something like:
> > >
> > > /**
> > >  *...
> > >  * @mac_wol: true if the MAC needs to receive packets for
> > > Wake-on-Lan  */ void phylink_suspend(struct phylink *phylink, bool
> mac_wol) {
> > > 	ASSERT_RTNL();
> > >
> > > 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))
> > > 		phylink_stop(phylink);
> > > }
> > >
> > > /**
> > >  *...
> > >  * @mac_wol: true if the MAC needs to receive packets for
> > > Wake-on-Lan
> > >  *
> > >  * @mac_wol must have the same value as passed previously to
> > >  * phylink_suspend().
> > >  */
> > > void phylink_resume(struct phylink *phylink, bool mac_wol) {
> > > 	ASSERT_RTNL();
> > >
> > > 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))
> > > 		phylink_start(phylink);
> > > }
> >
> > That's great!!! MAC driver authors don't need to distinguish the different
> cases.
> >
> > > > The conclusion is that, as long as we call phylink_stop() for WoL
> > > > active in suspend(), then system can't be waked up any longer, and
> > > > the PHY situation is active. This let me recall what Russell
> > > > mentioned in this
> > > thread, if we need bring MAC link up with phylink framework to let
> > > MAC can see traffic from PHY when MAC-based WoL is active?
> > > >
> > > > Now, I don't know where I can further dig into this issue, if you
> > > > have any
> > > advice please share with me , thanks in advance.
> > >
> > > So my question now is: as the MAC needs to be alive while the system
> > > is suspended, that implies that it has been configured to receive
> > > packets. When the system resumes, why exactly doesn't the MAC
> > > continue to work? Does the MAC get reset after the system comes out
> > > of resume and lose all of its configuration?
> >
> > Yes, as I described in commit message, when STMMAC resume back, either
> > WoL is active or not, it reset the hardware then reconfig the MAC.
> > stmmac_resume()->stmmac_hw_setup()->stmmac_init_dma_engine()...
> 
> Okay, so that means I need to make phylink_resume() trigger a major config if
> we are using WoL.
> 
> There's a few more questions:
> 1. Since the state at this point will be that netdev and phylink believe
>    the link to still be up, should phylink_suspend() force a
>    netif_carrier_off() event to stop the netdev transmit watchdog - I
>    think it ought to, even though the link will actually remain up.

Agree.

> 2. Should we call mac_link_down() prior to the major reconfig - I think
>    we should to keep the mac_link_down()/mac_link_up() calls balanced
>    (as we do already guarantee.) Will that do any harm for stmmac if we
>    were to call mac_link_down() as a result of a call to
>    phylink_resume() ?

For STMMAC, I think it's safe, since we calling phylink_resume() before re-config MAC.
But we design this for common usage, other MAC drivers may call this at the end of resume
path, but I think it also safe, like we unplug then plug the cable. However, it will print the LINK DOWN
then LINK UP log, which is very strange when system resume back. Is there any better solution?

> > > Reading what stmmac_resume() does, it seems that may well be the
> > > case, or if not, the actions of stmmac_resume() ends up
> > > reprogramming a great deal of the device setup. If this is the case,
> > > then yes, we need phylink to be triggered to reconfigure the link -
> > > which we could do in
> > > phylink_resume() if mac_wol was active.
> > >
> > > While reading stmmac_resume(), I have to question the placement of
> > > this code
> > > block:
> > >
> > >         if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
> > >                 rtnl_lock();
> > >                 phylink_start(priv->phylink);
> > >                 /* We may have called phylink_speed_down before */
> > >                 phylink_speed_up(priv->phylink);
> > >                 rtnl_unlock();
> > >         }
> > >
> > > in the sequence there - phylink_start() should be called when you're
> > > ready for the link to come up - in other words, when you're ready to
> > > start seeing packets arrive at the MAC's interface. However, the
> > > code following is clearing and resetting up queues, restoring
> > > receive modes, setting up the hardware, and restoring the vlan filtering.
> > > Surely all that should happen before calling phylink_start(), much
> > > like it already does in stmmac_open() ?
> >
> > There is a story here, SNPS EQOS IP need PHY provides RXC clock for
> > MAC's receive logic, so we need phylink_start() to bring PHY link up,
> > that make PHY resume back, PHY could stop RXC clock when in suspended
> > state. This is the reason why calling phylink_start() before re-config MAC.
> 
> Why is it different from the .ndo_stop/.ndo_open case, where the PHY may
> have been suspended by the actions of .ndo_stop?

It's a good question. PHY will suspend by the actions od .ndo_stop, but it will resume
before we config MAC. Please see stmmac_open(), stmmac_init_phy()->phylink_of_phy_connect()
-> phy_attach_direct(): phy_resume(phydev), where PHY will be resume backed.

Best Regards,
Joakim Zhang

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ