lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z-1tiW9zjcoFkhwc@shell.armlinux.org.uk>
Date: Wed, 2 Apr 2025 18:02:01 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
	Heiner Kallweit <hkallweit1@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Florian Fainelli <florian.fainelli@...adcom.com>,
	Wei Fang <wei.fang@....com>
Subject: Re: [PATCH v2 net 2/2] net: phy: allow MDIO bus PM ops to start/stop
 state machine for phylink-controlled PHY

On Wed, Apr 02, 2025 at 06:08:59PM +0300, Vladimir Oltean wrote:
> DSA has 2 kinds of drivers:
> 
> 1. Those who call dsa_switch_suspend() and dsa_switch_resume() from
>    their device PM ops: qca8k-8xxx, bcm_sf2, microchip ksz
> 2. Those who don't: all others. The above methods should be optional.
> 
> For type 1, dsa_switch_suspend() calls dsa_user_suspend() -> phylink_stop(),
> and dsa_switch_resume() calls dsa_user_resume() -> phylink_start().
> These seem good candidates for setting mac_managed_pm = true because
> that is essentially its definition, but that does not seem to be the
> biggest problem for now, and is not what this change focuses on.
> 
> Talking strictly about the 2nd category of drivers here, I have noticed
> that these also trigger the
> 
> 	WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY &&
> 		phydev->state != PHY_UP);
> 
> from mdio_bus_phy_resume(), because the PHY state machine is running.
> 
> It's running as a result of a previous dsa_user_open() -> ... ->
> phylink_start() -> phy_start(), and AFAICS, mdio_bus_phy_suspend() was
> supposed to have called phy_stop_machine(), but it didn't. So this is
> why the PHY is in state PHY_NOLINK by the time mdio_bus_phy_resume()
> runs.
> 
> mdio_bus_phy_suspend() did not call phy_stop_machine() because for
> phylink, the phydev->adjust_link function pointer is NULL. This seems a
> technicality introduced by commit fddd91016d16 ("phylib: fix PAL state
> machine restart on resume"). That commit was written before phylink
> existed, and was intended to avoid crashing with consumer drivers which
> don't use the PHY state machine - phylink does.

I think this is a historical bug (as I believe I previously mentioned,
suspend/resume support wasn't tested with phylink as none of the
platforms it was developed against had suspend/resume support.)

phylink should be no differnet from a MAC that uses phylib and supplies
an adjust_link function. The exception is when mac_managed_pm has been
set.

Reading commit fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC
driver manages PHY PM") which introduced mac_managed_pm, this flag
should be set whenever a MAC driver causes phy_start() to be called
via its resume path. That means for a phylink using MAC driver calling
either phylink_start() or phylink_resume() from its resume method.

Setting that flag will have the effect that, in those cases,
mdio_bus_phy_suspend() and mdio_bus_phy_resume() become no-ops.

Now, if the MAC driver does not call either of those two phylink
functions, then mac_managed_pm should not be set, and the mdio bus
PHY suspend/resume should happen in full - because a phylink driver
should be no different from a phylib driver that's supplied an
adjust_link callback.

So yes, I think the principle of your patch is correct, and I agree
that this is needed (just coming to the same conclusion from a
different direction.)

> +static bool phy_uses_state_machine(struct phy_device *phydev)
> +{
> +	if (phydev->phy_link_change == phy_link_change)
> +		return phydev->attached_dev && phydev->adjust_link;
> +
> +	return phydev->phy_link_change;

I think this can be simplified to:

	return phydev->phy_link_change != phy_link_change ||
	       (phydev->attached_dev && phydev->adjust_link);

since phydev->phy_link_change should never be NULL (the hook exists
specifically for phylink's usage, and is either set to phy_link_change
or phylink_phy_change.)

If it were to be NULL, then anything which calls phy_link_(up|down)
will oops the kernel - not just the state machine, but cable testing,
loopback, and changing EEE settings.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ