[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZPdISTBrrX345RCz@shell.armlinux.org.uk>
Date: Tue, 5 Sep 2023 16:24:57 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Jijie Shao <shaojijie@...wei.com>
Cc: Andrew Lunn <andrew@...n.ch>, f.fainelli@...il.com, davem@...emloft.net,
edumazet@...gle.com, hkallweit1@...il.com, kuba@...nel.org,
netdev@...r.kernel.org, pabeni@...hat.com,
"shenjian15@...wei.com" <shenjian15@...wei.com>,
"liuyonglong@...wei.com" <liuyonglong@...wei.com>,
wangjie125@...wei.com, chenhao418@...wei.com,
Hao Lan <lanhao@...wei.com>,
"wangpeiyang1@...wei.com" <wangpeiyang1@...wei.com>
Subject: Re: [PATCH net-next] net: phy: avoid kernel warning dump when
stopping an errored PHY
On Tue, Sep 05, 2023 at 02:48:13PM +0100, Russell King (Oracle) wrote:
> On Tue, Sep 05, 2023 at 04:49:31PM +0800, Jijie Shao wrote:
> > We note there are several times lock during phy_state_machine(). The first
> > is to handle phydev state. It's noting that a competition of phydev lock
> > happend again if phy_check_link_status() returns an error. Why we don't
> > held lock until changing state to PHY_ERROR if phy_check_link_status()
> > returns an error?
>
> You are quite correct that isn't very good. We can easily get rid of
> some of this mess, but I don't think all which leaves it still open to
> the race you describe.
>
> The problem is phy_suspend().
>
> First, it calls phy_ethtool_get_wol() which takes the phydev lock. This
> can be dealt with if we save the state at probe time, and then update
> the state when phy_ethtool_set_wol() is called.
>
> Second, phy_suspend() calls ->suspend without holding the phydev lock,
> and holding the lock while calling that may not be safe. Having had a
> brief look over the implementations (but not delving into any PTP
> function they may call) does seem to suggest that shouldn't be a big
> problem, but I don't know whether holding the phydev lock while calling
> PTP functions is likely to cause issues.
>
> However, looking at that has lead me to the conclusion that there is a
> lot of duplication of WoL condition testing. phy_suspend() already
> avoids calling ->suspend() if either phy_ethtool_get_wol() indicates
> that WoL is enabled, or if the netdev says WoL is enabled.
>
> Many of the ->suspend() implementations, for example,
> lan88xx_suspend(), dp83822_suspend(), etc test some kind of flag to
> determine whether WoL is enabled and thus seem to be re-implementing
> what phy_suspend() is already doing. I think there is scope to delete
> this code from several drivers.
>
> The easy bits are the patches I've attached to this email. These
> won't on their own be sufficient to close the race you've identified
> due to the phy_suspend() issue, but are the best we can do until we
> can work out what to do about that.
Having looked deeper at this, I think there may be a solution. See
these follow-on patches.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
View attachment "0005-net-phy-move-phy_suspend-to-end-of-phy_state_machine.patch" of type "text/x-diff" (1430 bytes)
View attachment "0006-net-phy-move-phy_state_machine.patch" of type "text/x-diff" (5140 bytes)
View attachment "0007-net-phy-split-locked-and-unlocked-section-of-phy_sta.patch" of type "text/x-diff" (4220 bytes)
View attachment "0008-net-phy-convert-phy_stop-to-use-split-state-machine.patch" of type "text/x-diff" (1499 bytes)
Powered by blists - more mailing lists