Message-ID: <20250902144241.avfiqpmqy7xhlwqa@skbuf>
Date: Tue, 2 Sep 2025 17:42:41 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: "Russell King (Oracle)" <linux@...linux.org.uk>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
Heiner Kallweit <hkallweit1@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH net] net: phy: transfer phy_config_inband() locking responsibility to phylink

On Tue, Sep 02, 2025 at 03:09:42PM +0100, Russell King (Oracle) wrote:
> On Tue, Sep 02, 2025 at 04:41:41PM +0300, Vladimir Oltean wrote:
> > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > index c7f867b361dd..350905928d46 100644
> > --- a/drivers/net/phy/phylink.c
> > +++ b/drivers/net/phy/phylink.c
> > @@ -1580,10 +1585,13 @@ static void phylink_resolve(struct work_struct *w)
> > {
> > struct phylink *pl = container_of(w, struct phylink, resolve);
> > struct phylink_link_state link_state;
> > + struct phy_device *phy = pl->phydev;
> > bool mac_config = false;
> > bool retrigger = false;
> > bool cur_link_state;
> >
> > + if (phy)
> > + mutex_lock(&phy->lock);
>
> I don't think this is safe.
>
> The addition and removal of PHYs is protected by two locks:
>
> 1. RTNL, to prevent ethtool operations running concurrently with the
> addition or removal of PHYs.
>
> 2. The state_mutex which protects the resolver which doesn't take the
> RTNL.
>
> Given that the RTNL is not held in this path, dereferencing pl->phydev
> is unsafe as the PHY may go away (through e.g. SFP module removal)
> which means this mutex_lock() may end up operating on free'd memory.
>
> I'm not sure we want to be taking the RTNL on this path.
>
> At the moment, I'm not sure what the solution is here.

Rephrased and slightly expanded: phylink_disconnect_phy(), when called
from drivers, has the convention that phylink_stop() must have been
called prior, or phylink_start() must have never been called.

However, when called from phylink_sfp_disconnect_phy(),
phylink_disconnect_phy() does not benefit from the same guarantee that
phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED) ran.

Correct so far?

Can we disable the resolver from phylink_sfp_disconnect_phy(), to offer
a similar guarantee that phylink_disconnect_phy() never runs with a
concurrent resolver?

I don't have a local setup at the moment to test what happens when I
unplug an SFP module with the change I am proposing. I can test in a few
hours at the earliest. However, there's a chance testing won't reveal
why we don't stop the resolver during SFP module disconnection, hence
the reason for this possibly stupid question.
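
As a rough illustration of the ordering being asked about, here is a
userspace sketch using pthreads. It is not kernel code and says nothing
about whether the change is correct for phylink. Every name in it
(model_phy, model_pl, model_resolve() and so on) is a hypothetical
stand-in, and joining the worker thread is only an approximation of
setting PHYLINK_DISABLE_STOPPED and flushing the resolve work. The point
it tries to show: if the resolver is stopped before the PHY pointer is
cleared and the PHY freed, the resolver's unlocked read of the pointer
can never see freed memory.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct phy_device and struct phylink. */
struct model_phy {
	pthread_mutex_t lock;
};

struct model_pl {
	pthread_mutex_t state_mutex;
	struct model_phy *phydev;
	bool disabled;		/* models PHYLINK_DISABLE_STOPPED */
	int resolve_runs;	/* resolver passes that touched the PHY */
	pthread_t worker;	/* models the resolve work item */
};

/* Models the resolver: loops until it observes the disable flag. */
static void *model_resolve(void *arg)
{
	struct model_pl *pl = arg;

	for (;;) {
		struct model_phy *phy;
		bool stop;

		pthread_mutex_lock(&pl->state_mutex);
		stop = pl->disabled;
		pthread_mutex_unlock(&pl->state_mutex);
		if (stop)
			break;

		/* Unlocked read: safe only while the PHY cannot go away. */
		phy = pl->phydev;
		if (phy) {
			pthread_mutex_lock(&phy->lock);
			pthread_mutex_lock(&pl->state_mutex);
			pl->resolve_runs++;
			pthread_mutex_unlock(&pl->state_mutex);
			pthread_mutex_unlock(&phy->lock);
		}
		sched_yield();
	}
	return NULL;
}

/* Models "disable the resolver and wait for it to finish". */
static void model_run_resolve_and_disable(struct model_pl *pl)
{
	pthread_mutex_lock(&pl->state_mutex);
	pl->disabled = true;
	pthread_mutex_unlock(&pl->state_mutex);
	pthread_join(pl->worker, NULL);
}

/* Models the disconnect path: no locks, relies on the resolver being stopped. */
static void model_disconnect_phy(struct model_pl *pl)
{
	struct model_phy *phy = pl->phydev;

	assert(pl->disabled);	/* plays the role of the WARN_ON() */
	if (phy) {
		pl->phydev = NULL;
		pthread_mutex_destroy(&phy->lock);
		free(phy);
	}
}

/* Models an SFP unplug: stop the resolver first, then drop the PHY. */
int model_sfp_unplug(void)
{
	struct model_pl pl = { .disabled = false };
	int runs;

	pl.phydev = malloc(sizeof(*pl.phydev));
	if (!pl.phydev)
		return -1;
	pthread_mutex_init(&pl.phydev->lock, NULL);
	pthread_mutex_init(&pl.state_mutex, NULL);
	if (pthread_create(&pl.worker, NULL, model_resolve, &pl)) {
		free(pl.phydev);
		return -1;
	}

	/* Wait until the resolver has touched the PHY at least once. */
	do {
		sched_yield();
		pthread_mutex_lock(&pl.state_mutex);
		runs = pl.resolve_runs;
		pthread_mutex_unlock(&pl.state_mutex);
	} while (runs == 0);

	model_run_resolve_and_disable(&pl);
	model_disconnect_phy(&pl);
	pthread_mutex_destroy(&pl.state_mutex);
	return pl.phydev == NULL ? 0 : -1;
}
```

Calling model_sfp_unplug() from a main() and building with -pthread is
enough to exercise it. Swapping the two calls at the end of
model_sfp_unplug() trips the assert, which is the userspace analogue of
the situation the WARN_ON() in the patch is meant to flag.
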
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 350905928d46..a8facc177f1f 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2313,17 +2313,13 @@ void phylink_disconnect_phy(struct phylink *pl)
 	ASSERT_RTNL();
+	WARN_ON(!test_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state));
+
 	phy = pl->phydev;
 	if (phy) {
-		mutex_lock(&phy->lock);
-		mutex_lock(&pl->state_mutex);
 		pl->phydev = NULL;
 		pl->phy_enable_tx_lpi = false;
 		pl->mac_tx_clk_stop = false;
-		mutex_unlock(&pl->state_mutex);
-		mutex_unlock(&phy->lock);
-		flush_work(&pl->resolve);
-
 		phy_disconnect(phy);
 	}
 }
@@ -3809,7 +3805,10 @@ static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
 static void phylink_sfp_disconnect_phy(void *upstream,
 				       struct phy_device *phydev)
 {
-	phylink_disconnect_phy(upstream);
+	struct phylink *pl = upstream;
+
+	phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED);
+	phylink_disconnect_phy(pl);
 }
static const struct sfp_upstream_ops sfp_phylink_ops = {