[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z-1tiW9zjcoFkhwc@shell.armlinux.org.uk>
Date: Wed, 2 Apr 2025 18:02:01 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
Heiner Kallweit <hkallweit1@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Florian Fainelli <florian.fainelli@...adcom.com>,
Wei Fang <wei.fang@....com>
Subject: Re: [PATCH v2 net 2/2] net: phy: allow MDIO bus PM ops to start/stop
state machine for phylink-controlled PHY
On Wed, Apr 02, 2025 at 06:08:59PM +0300, Vladimir Oltean wrote:
> DSA has 2 kinds of drivers:
>
> 1. Those who call dsa_switch_suspend() and dsa_switch_resume() from
> their device PM ops: qca8k-8xxx, bcm_sf2, microchip ksz
> 2. Those who don't: all others. The above methods should be optional.
>
> For type 1, dsa_switch_suspend() calls dsa_user_suspend() -> phylink_stop(),
> and dsa_switch_resume() calls dsa_user_resume() -> phylink_start().
> These seem good candidates for setting mac_managed_pm = true because
> that is essentially its definition, but that does not seem to be the
> biggest problem for now, and is not what this change focuses on.
>
> Talking strictly about the 2nd category of drivers here, I have noticed
> that these also trigger the
>
> WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY &&
> phydev->state != PHY_UP);
>
> from mdio_bus_phy_resume(), because the PHY state machine is running.
>
> It's running as a result of a previous dsa_user_open() -> ... ->
> phylink_start() -> phy_start(), and AFAICS, mdio_bus_phy_suspend() was
> supposed to have called phy_stop_machine(), but it didn't. So this is
> why the PHY is in state PHY_NOLINK by the time mdio_bus_phy_resume()
> runs.
>
> mdio_bus_phy_suspend() did not call phy_stop_machine() because for
> phylink, the phydev->adjust_link function pointer is NULL. This seems a
> technicality introduced by commit fddd91016d16 ("phylib: fix PAL state
> machine restart on resume"). That commit was written before phylink
> existed, and was intended to avoid crashing with consumer drivers which
> don't use the PHY state machine - phylink does.
I think this is a historical bug (as I believe I previously mentioned,
suspend/resume support wasn't tested with phylink as none of the
platforms it was developed against had suspend/resume support.)
phylink should be no differnet from a MAC that uses phylib and supplies
an adjust_link function. The exception is when mac_managed_pm has been
set.
Reading commit fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC
driver manages PHY PM") which introduced mac_managed_pm, this flag
should be set whenever a MAC driver causes phy_start() to be called
via its resume path. That means for a phylink using MAC driver calling
either phylink_start() or phylink_resume() from its resume method.
Setting that flag will have the effect that, in those cases,
mdio_bus_phy_suspend() and mdio_bus_phy_resume() become no-ops.
Now, if the MAC driver does not call either of those two phylink
functions, then mac_managed_pm should not be set, and the mdio bus
PHY suspend/resume should happen in full - because a phylink driver
should be no different from a phylib driver that's supplied an
adjust_link callback.
So yes, I think the principle of your patch is correct, and I agree
that this is needed (just coming to the same conclusion from a
different direction.)
> +static bool phy_uses_state_machine(struct phy_device *phydev)
> +{
> + if (phydev->phy_link_change == phy_link_change)
> + return phydev->attached_dev && phydev->adjust_link;
> +
> + return phydev->phy_link_change;
I think this can be simplified to:
return phydev->phy_link_change != phy_link_change ||
(phydev->attached_dev && phydev->adjust_link);
since phydev->phy_link_change should never be NULL (the hook exists
specifically for phylink's usage, and is either set to phy_link_change
or phylink_phy_change.)
If it were to be NULL, then anything which calls phy_link_(up|down)
will oops the kernel - not just the state machine, but cable testing,
loopback, and changing EEE settings.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Powered by blists - more mailing lists