lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 6 Sep 2021 02:21:46 +0000
From:   Joakim Zhang <qiangqing.zhang@....com>
To:     Russell King <linux@...linux.org.uk>
CC:     Andrew Lunn <andrew@...n.ch>, Vladimir Oltean <olteanv@...il.com>,
        "peppe.cavallaro@...com" <peppe.cavallaro@...com>,
        "alexandre.torgue@...s.st.com" <alexandre.torgue@...s.st.com>,
        "joabreu@...opsys.com" <joabreu@...opsys.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "mcoquelin.stm32@...il.com" <mcoquelin.stm32@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "hkallweit1@...il.com" <hkallweit1@...il.com>,
        dl-linux-imx <linux-imx@....com>
Subject: RE: [PATCH] net: stmmac: fix MAC not working when system resume back
 with WoL enabled


Hi Russell,

> -----Original Message-----
> From: Russell King <linux@...linux.org.uk>
> Sent: 2021年9月3日 20:01
> To: Joakim Zhang <qiangqing.zhang@....com>
> Cc: Andrew Lunn <andrew@...n.ch>; Vladimir Oltean <olteanv@...il.com>;
> peppe.cavallaro@...com; alexandre.torgue@...s.st.com;
> joabreu@...opsys.com; davem@...emloft.net; kuba@...nel.org;
> mcoquelin.stm32@...il.com; netdev@...r.kernel.org; f.fainelli@...il.com;
> hkallweit1@...il.com; dl-linux-imx <linux-imx@....com>
> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume
> back with WoL enabled
> 
> On Fri, Sep 03, 2021 at 11:04:57AM +0000, Joakim Zhang wrote:
> >
> > Hi Russell,
> >
> > [...]
> > > > > > > -----Original Message-----
> > > > > > > From: Andrew Lunn <andrew@...n.ch>
> > > > > > > Sent: 2021年9月2日 20:24
> > > > > > > To: Joakim Zhang <qiangqing.zhang@....com>
> > > > > > > Cc: Russell King <linux@...linux.org.uk>; Vladimir Oltean
> > > > > > > <olteanv@...il.com>; peppe.cavallaro@...com;
> > > > > > > alexandre.torgue@...s.st.com; joabreu@...opsys.com;
> > > > > > > davem@...emloft.net; kuba@...nel.org;
> > > mcoquelin.stm32@...il.com;
> > > > > > > netdev@...r.kernel.org; f.fainelli@...il.com;
> > > > > > > hkallweit1@...il.com; dl-linux-imx <linux-imx@....com>
> > > > > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when
> > > > > > > system resume back with WoL enabled
> > > > > > >
> > > > > > > > Emm, @andrew@...n.ch, Andrew is much familiar with FEC,
> > > > > > > > and PHY maintainers, Could you please help put insights here if
> possible?
> > > > > > >
> > > > > > > All the boards i have either have an Ethernet Switch
> > > > > > > connected to the MAC, or a Micrel PHY. None are setup for
> > > > > > > WoL, since it is not used in the use case of these boards.
> > > > > > >
> > > > > > > I think you need to scatter some printk() in various places
> > > > > > > to confirm what is going on. Where is the WoL implemented:
> > > > > > > MAC or PHY, what is suspended or not, etc.
> > > > > >
> > > > > > Thanks Andrew, Russell,
> > > > > >
> > > > > > I confirmed FEC is MAC-based WoL, and PHY is active when
> > > > > > system
> > > > > suspended if MAC-based WoL is active.
> > > > > > I scatter printk() in both phy_device.c and realtek.c phy
> > > > > > driver to debug this
> > > > > for both WoL active and inactive case.
> > > > > >
> > > > > > When MAC-based WoL is active, phy_suspend() is the last point
> > > > > > to actually
> > > > > suspend the PHY, you can see that,
> > > > > > 	phy_ethtool_get_wol(phydev, &wol);
> > > > > > 	if (wol.wolopts || (netdev && netdev->wol_enabled))
> > > > > > 		return -EBUSY;
> > > > > >
> > > > > > Here, netdev is true and netdev->wol_enabled is ture
> > > > > > (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled =
> > > > > > !!wol.wolopts;) So that phydev->suspend() would not be called,
> > > > > > PHY is active
> > > > > after system suspended. PHY can receive packets and pass to MAC,
> > > > > MAC is responsible for detecting magic packets then generate a
> > > > > wakeup interrupt. So all is fine for FEC, and the behavior is clear.
> > > > >
> > > > > What happens on resume with FEC?
> > > >
> > > > Since we call phy_stop() in fec_suspend(), the link is down, but
> > > > the PHY is active, after receiving magic packets, the system
> > > > resume back; In fec_resume(), after restart/init FEC, we call
> > > > phy_start() to let link up, then
> > > all is going well.
> > >
> > > ... but the link never went down! So I don't understand the last point.
> >
> > Sorry, what the meaning of "the link never went down"? How do you
> > define the link is down? May be I have not get your original point correctly.
> 
> If the link goes down, connectivity between the MAC and the outside world is
> lost - whether that be the link between the MAC and PHY, or PHY and the
> outside world. That's my definition of "link down".
> 
> However, if the MAC is still alive and receiving packets, even for Wake-on-Lan
> purposes, from the outside world then the link can not be down, it must be
> operational and therefore it must be in the "up"
> state.
> 
> I'm talking about the physical state of the link - "up" meaning capable of
> passing packets to and from the MAC, "down" meaning incapable of passing
> packets.

Yes, actually what you described is physical state, what I mean is a soft state.

> > At my side, with MAC-based WoL is active, FEC calls phy_stop() in
> > fec_suspend(), then fec_enet_adjust_link() is called, further
> > fec_stop() is called, FEC only keep necessary receive logic active to
> > service WoL. This is not the link went down? At least I see the log "
> > fec 30be0000.ethernet eth0: Link is Down".
> 
> It looks like calling phy_stop() will force a link-down event to be reported from
> phylib. As I say above though, really, this doesn't affect the physical state of the
> link, because the link has to be up for the WoL packets to be received by the
> MAC.

Yes, both MAC and PHY are active for MAC-based WoL.

> What I don't like about that is that we're saying that the link is down, whereas
> the physical link is actually still up. This is going to make network drivers
> implementation of mac_link_down() rather yucky, especially ones that force the
> physical link down at the MAC end when operating in PHY mode and they see a
> call to mac_link_down() (which they do to stop packet reception.) There's no
> way for them to know whether mac_link_down() is a result of a real physical
> link down event, or whether this is a "soft" link down event as you're describing
> at a suspend.

As I think, if WoL is active then this is a soft link down events; Instead, if WoL is
inactive this is a physical link down events.

> Then there's the issue that some network drivers _must_ see a
> mac_link_down() call to force the link down prior to reconfiguring the link (since
> settings are not allowed to be changed while the physical link is up.) So we start
> to destroy the guarantee that
> mac_link_down() and mac_link_up() will be properly ordered.
> 
> Damn it.
> 
> > > There's a few more questions:
> > > 1. Since the state at this point will be that netdev and phylink believe
> > >    the link to still be up, should phylink_suspend() force a
> > >    netif_carrier_off() event to stop the netdev transmit watchdog - I
> > >    think it ought to, even though the link will actually remain up.
> >
> > Agree.
> 
> Note that phylib in the FEC case will do this just before calling the adjust_link
> function already.
> 
> With phylink, we can make that silent, and I think it should be silent because, as
> I describe above, the physical link isn't actually going down.

Yes.

> > > 2. Should we call mac_link_down() prior to the major reconfig - I think
> > >    we should to keep the mac_link_down()/mac_link_up() calls balanced
> > >    (as we do already guarantee.) Will that do any harm for stmmac if we
> > >    were to call mac_link_down() as a result of a call to
> > >    phylink_resume() ?
> >
> > For STMMAC, I think it's safe, since we calling phylink_resume() before
> re-config MAC.
> > But we design this for common usage, other MAC drivers may call this
> > at the end of resume path, but I think it also safe, like we unplug
> > then plug the cable. However, it will print the LINK DOWN then LINK UP log,
> which is very strange when system resume back. Is there any better solution?
> 
> I think at this point, we should just print the state of the link at resume, which
> should basically be only a "link down" if the link is now physically down at
> resume.
> 
> One thing worries me though - what happens if the link parameters change
> while the system is suspended. The MAC won't be updated with the new link
> parameters. What happens on resume?

There is a state machine in phylink, then this link change events would be captured later
from reschedule the machine, right? If so, this seems not a critical issue. MAC just not working
for a while.

Yes, I am also curious if link parameters changed during system suspended, how phylib or phylink
react when system resume back. Do you know some details?

> > > Why is it different from the .ndo_stop/.ndo_open case, where the PHY
> > > may have been suspended by the actions of .ndo_stop?
> >
> > It's a good question. PHY will suspend by the actions od .ndo_stop,
> > but it will resume before we config MAC. Please see stmmac_open(),
> > stmmac_init_phy()->phylink_of_phy_connect()
> > -> phy_attach_direct(): phy_resume(phydev), where PHY will be resume
> backed.
> 
> Ah, I forgot FEC does the PHY connect/disconnect at open/stop. Thanks.

Best Regards,
Joakim Zhang

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ