[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210903120127.GW22278@shell.armlinux.org.uk>
Date: Fri, 3 Sep 2021 13:01:27 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Joakim Zhang <qiangqing.zhang@....com>
Cc: Andrew Lunn <andrew@...n.ch>, Vladimir Oltean <olteanv@...il.com>,
"peppe.cavallaro@...com" <peppe.cavallaro@...com>,
"alexandre.torgue@...s.st.com" <alexandre.torgue@...s.st.com>,
"joabreu@...opsys.com" <joabreu@...opsys.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>,
"mcoquelin.stm32@...il.com" <mcoquelin.stm32@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"f.fainelli@...il.com" <f.fainelli@...il.com>,
"hkallweit1@...il.com" <hkallweit1@...il.com>,
dl-linux-imx <linux-imx@....com>
Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume back
with WoL enabled
On Fri, Sep 03, 2021 at 11:04:57AM +0000, Joakim Zhang wrote:
>
> Hi Russell,
>
> [...]
> > > > > > -----Original Message-----
> > > > > > From: Andrew Lunn <andrew@...n.ch>
> > > > > > Sent: 2021年9月2日 20:24
> > > > > > To: Joakim Zhang <qiangqing.zhang@....com>
> > > > > > Cc: Russell King <linux@...linux.org.uk>; Vladimir Oltean
> > > > > > <olteanv@...il.com>; peppe.cavallaro@...com;
> > > > > > alexandre.torgue@...s.st.com; joabreu@...opsys.com;
> > > > > > davem@...emloft.net; kuba@...nel.org;
> > mcoquelin.stm32@...il.com;
> > > > > > netdev@...r.kernel.org; f.fainelli@...il.com;
> > > > > > hkallweit1@...il.com; dl-linux-imx <linux-imx@....com>
> > > > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when
> > > > > > system resume back with WoL enabled
> > > > > >
> > > > > > > Emm, @andrew@...n.ch, Andrew is much familiar with FEC, and
> > > > > > > PHY maintainers, Could you please help put insights here if possible?
> > > > > >
> > > > > > All the boards i have either have an Ethernet Switch connected
> > > > > > to the MAC, or a Micrel PHY. None are setup for WoL, since it is
> > > > > > not used in the use case of these boards.
> > > > > >
> > > > > > I think you need to scatter some printk() in various places to
> > > > > > confirm what is going on. Where is the WoL implemented: MAC or
> > > > > > PHY, what is suspended or not, etc.
> > > > >
> > > > > Thanks Andrew, Russell,
> > > > >
> > > > > I confirmed FEC is MAC-based WoL, and PHY is active when system
> > > > suspended if MAC-based WoL is active.
> > > > > I scatter printk() in both phy_device.c and realtek.c phy driver
> > > > > to debug this
> > > > for both WoL active and inactive case.
> > > > >
> > > > > When MAC-based WoL is active, phy_suspend() is the last point to
> > > > > actually
> > > > suspend the PHY, you can see that,
> > > > > phy_ethtool_get_wol(phydev, &wol);
> > > > > if (wol.wolopts || (netdev && netdev->wol_enabled))
> > > > > return -EBUSY;
> > > > >
> > > > > Here, netdev is true and netdev->wol_enabled is ture
> > > > > (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled =
> > > > > !!wol.wolopts;) So that phydev->suspend() would not be called, PHY
> > > > > is active
> > > > after system suspended. PHY can receive packets and pass to MAC, MAC
> > > > is responsible for detecting magic packets then generate a wakeup
> > > > interrupt. So all is fine for FEC, and the behavior is clear.
> > > >
> > > > What happens on resume with FEC?
> > >
> > > Since we call phy_stop() in fec_suspend(), the link is down, but the
> > > PHY is active, after receiving magic packets, the system resume back;
> > > In fec_resume(), after restart/init FEC, we call phy_start() to let link up, then
> > all is going well.
> >
> > ... but the link never went down! So I don't understand the last point.
>
> Sorry, what the meaning of "the link never went down"? How do you define
> the link is down? May be I have not get your original point correctly.
If the link goes down, connectivity between the MAC and the outside
world is lost - whether that be the link between the MAC and PHY, or
PHY and the outside world. That's my definition of "link down".
However, if the MAC is still alive and receiving packets, even for
Wake-on-Lan purposes, from the outside world then the link can not be
down, it must be operational and therefore it must be in the "up"
state.
I'm talking about the physical state of the link - "up" meaning capable
of passing packets to and from the MAC, "down" meaning incapable of
passing packets.
> At my side, with MAC-based WoL is active, FEC calls phy_stop() in
> fec_suspend(), then fec_enet_adjust_link() is called, further
> fec_stop() is called, FEC only keep necessary receive logic active
> to service WoL. This is not the link went down? At least I see the
> log " fec 30be0000.ethernet eth0: Link is Down".
It looks like calling phy_stop() will force a link-down event to be
reported from phylib. As I say above though, really, this doesn't
affect the physical state of the link, because the link has to be
up for the WoL packets to be received by the MAC.
What I don't like about that is that we're saying that the link is
down, whereas the physical link is actually still up. This is going
to make network drivers implementation of mac_link_down() rather
yucky, especially ones that force the physical link down at the MAC
end when operating in PHY mode and they see a call to mac_link_down()
(which they do to stop packet reception.) There's no way for them to
know whether mac_link_down() is a result of a real physical link down
event, or whether this is a "soft" link down event as you're describing
at a suspend.
Then there's the issue that some network drivers _must_ see a
mac_link_down() call to force the link down prior to reconfiguring
the link (since settings are not allowed to be changed while the
physical link is up.) So we start to destroy the guarantee that
mac_link_down() and mac_link_up() will be properly ordered.
Damn it.
> > There's a few more questions:
> > 1. Since the state at this point will be that netdev and phylink believe
> > the link to still be up, should phylink_suspend() force a
> > netif_carrier_off() event to stop the netdev transmit watchdog - I
> > think it ought to, even though the link will actually remain up.
>
> Agree.
Note that phylib in the FEC case will do this just before calling
the adjust_link function already.
With phylink, we can make that silent, and I think it should be silent
because, as I describe above, the physical link isn't actually going
down.
> > 2. Should we call mac_link_down() prior to the major reconfig - I think
> > we should to keep the mac_link_down()/mac_link_up() calls balanced
> > (as we do already guarantee.) Will that do any harm for stmmac if we
> > were to call mac_link_down() as a result of a call to
> > phylink_resume() ?
>
> For STMMAC, I think it's safe, since we calling phylink_resume() before re-config MAC.
> But we design this for common usage, other MAC drivers may call this at the end of resume
> path, but I think it also safe, like we unplug then plug the cable. However, it will print the LINK DOWN
> then LINK UP log, which is very strange when system resume back. Is there any better solution?
I think at this point, we should just print the state of the link at
resume, which should basically be only a "link down" if the link is
now physically down at resume.
One thing worries me though - what happens if the link parameters
change while the system is suspended. The MAC won't be updated with
the new link parameters. What happens on resume?
> > Why is it different from the .ndo_stop/.ndo_open case, where the PHY may
> > have been suspended by the actions of .ndo_stop?
>
> It's a good question. PHY will suspend by the actions od .ndo_stop, but it will resume
> before we config MAC. Please see stmmac_open(), stmmac_init_phy()->phylink_of_phy_connect()
> -> phy_attach_direct(): phy_resume(phydev), where PHY will be resume backed.
Ah, I forgot FEC does the PHY connect/disconnect at open/stop. Thanks.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Powered by blists - more mailing lists