[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZPcxnHjDJIMe3xt5@shell.armlinux.org.uk>
Date: Tue, 5 Sep 2023 14:48:12 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Jijie Shao <shaojijie@...wei.com>
Cc: Andrew Lunn <andrew@...n.ch>, f.fainelli@...il.com, davem@...emloft.net,
edumazet@...gle.com, hkallweit1@...il.com, kuba@...nel.org,
netdev@...r.kernel.org, pabeni@...hat.com,
"shenjian15@...wei.com" <shenjian15@...wei.com>,
"liuyonglong@...wei.com" <liuyonglong@...wei.com>,
wangjie125@...wei.com, chenhao418@...wei.com,
Hao Lan <lanhao@...wei.com>,
"wangpeiyang1@...wei.com" <wangpeiyang1@...wei.com>
Subject: Re: [PATCH net-next] net: phy: avoid kernel warning dump when
stopping an errored PHY
On Tue, Sep 05, 2023 at 04:49:31PM +0800, Jijie Shao wrote:
> We note there are several times lock during phy_state_machine(). The first
> is to handle phydev state. It's noting that a competition of phydev lock
> happend again if phy_check_link_status() returns an error. Why we don't
> held lock until changing state to PHY_ERROR if phy_check_link_status()
> returns an error?
You are quite correct that isn't very good. We can easily get rid of
some of this mess, but I don't think all which leaves it still open to
the race you describe.
The problem is phy_suspend().
First, it calls phy_ethtool_get_wol() which takes the phydev lock. This
can be dealt with if we save the state at probe time, and then update
the state when phy_ethtool_set_wol() is called.
Second, phy_suspend() calls ->suspend without holding the phydev lock,
and holding the lock while calling that may not be safe. Having had a
brief look over the implementations (but not delving into any PTP
function they may call) does seem to suggest that shouldn't be a big
problem, but I don't know whether holding the phydev lock while calling
PTP functions is likely to cause issues.
However, looking at that has lead me to the conclusion that there is a
lot of duplication of WoL condition testing. phy_suspend() already
avoids calling ->suspend() if either phy_ethtool_get_wol() indicates
that WoL is enabled, or if the netdev says WoL is enabled.
Many of the ->suspend() implementations, for example,
lan88xx_suspend(), dp83822_suspend(), etc test some kind of flag to
determine whether WoL is enabled and thus seem to be re-implementing
what phy_suspend() is already doing. I think there is scope to delete
this code from several drivers.
The easy bits are the patches I've attached to this email. These
won't on their own be sufficient to close the race you've identified
due to the phy_suspend() issue, but are the best we can do until we
can work out what to do about that.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
View attachment "0001-net-phy-always-call-phy_process_state_change-under-l.patch" of type "text/x-diff" (1297 bytes)
View attachment "0002-net-phy-call-phy_error_precise-while-holding-the-loc.patch" of type "text/x-diff" (1206 bytes)
View attachment "0003-net-phy-move-call-to-start-aneg.patch" of type "text/x-diff" (1044 bytes)
View attachment "0004-net-phy-track-PHY-WoL-enable-state.patch" of type "text/x-diff" (3760 bytes)
Powered by blists - more mailing lists