[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5527e953-2225-46e7-9619-5f346c1af9c0@bootlin.com>
Date: Mon, 2 Feb 2026 19:37:42 +0100
From: Maxime Chevallier <maxime.chevallier@...tlin.com>
To: "Russell King (Oracle)" <linux@...linux.org.uk>
Cc: Wei Fang <wei.fang@....com>, andrew@...n.ch, hkallweit1@...il.com,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, florian.fainelli@...adcom.com,
xiaolei.wang@...driver.com, quic_abchauha@...cinc.com,
quic_sarohasa@...cinc.com, imx@...ts.linux.dev, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 net] net: phy: change devlink flag to
AUTOREMOVE_SUPPLIER for non-SFP PHYs
On 02/02/2026 19:00, Russell King (Oracle) wrote:
> On Mon, Feb 02, 2026 at 06:38:48PM +0100, Maxime Chevallier wrote:
>>
>>
>> On 02/02/2026 15:25, Russell King (Oracle) wrote:
>>> On Mon, Feb 02, 2026 at 12:10:41PM +0100, Maxime Chevallier wrote:
>>>> Hi Wei,
>>>>
>>>> On 02/02/2026 06:45, Wei Fang wrote:
>>>>> For the shared MDIO bus use case, multiple MACs will share the same MDIO
>>>>> bus. Therefore, these MACs all depend on this MDIO bus. If this shared
>>>>> MDIO bus is removed, all the PHY devices attached to this MDIO bus will
>>>>> also be removed. Consequently, the MAC driver should not access the PHY
>>>>> device, otherwise, it will lead to some potential crashes. Because the
>>>>> corresponding phydev and the mii_bus have been freed, some pointers have
>>>>> become invalid.
>>>>>
>>>>> For example. Abhishek reported a crash issue that occurred if the MDIO
>>>>> bus driver was removed first, followed by the MAC driver. The crash log
>>>>> is as below.
>>>>>
>>>>> Call trace:
>>>>> __list_del_entry_valid_or_report+0xa8/0xe0
>>>>> __device_link_del+0x40/0xf0
>>>>> device_link_put_kref+0xb4/0xc8
>>>>> device_link_del+0x38/0x58
>>>>> phy_detach+0x2c/0x170
>>>>> phy_disconnect+0x4c/0x70
>>>>> phylink_disconnect_phy+0x6c/0xc0 [phylink]
>>>>> stmmac_release+0x60/0x358 [stmmac]
>>>>>
>>>>> Another example is the i.MX95-15x15 platform which has two ENETC ports.
>>>>> When all the external PHYs are managed the EMDIO (the MDIO controller),
>>>>> if the enetc driver is removed after the EMDIO driver. Users will see
>>>>> the below crash log and the console is hanged.
>>>>>
>>>>> Call trace:
>>>>> _phy_state_machine+0x230/0x36c (P)
>>>>> phy_stop+0x74/0x190
>>>>> phylink_stop+0x28/0xb8
>>>>> enetc_close+0x28/0x8c
>>>>> __dev_close_many+0xb4/0x1d8
>>>>> netif_close_many+0x8c/0x13c
>>>>> enetc4_pf_remove+0x2c/0x84
>>>>> pci_device_remove+0x44/0xe8
>>>>>
>>>>> To address this issue, Sarosh Hasan tried to change the devlink flag to
>>>>> DL_FLAG_AUTOREMOVE_SUPPLIER [1], so that the MAC driver will be removed
>>>>> along with the PHY driver. However, the solution does not take into
>>>>> account the hot-swappable PHY devices (SFP PHYs), so when the PHY device
>>>>> is unplugged, the MAC driver will automatically be removed, which is not
>>>>> the expected behavior. This issue should not exist for SFP PHYs, so based
>>>>> on the Sarosh's patch, the flag is changed to DL_FLAG_AUTOREMOVE_SUPPLIER
>>>>> for non-SFP PHYs.
>>>>>
>>>>> Reported-by: Abhishek Chauhan (ABC) <quic_abchauha@...cinc.com>
>>>>> Closes: https://lore.kernel.org/all/d696a426-40bb-4c1a-b42d-990fb690de5e@quicinc.com/
>>>>> Link: https://lore.kernel.org/imx/20250703090041.23137-1-quic_sarohasa@quicinc.com/ # [1]
>>>>> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
>>>>> Suggested-by: Maxime Chevallier <maxime.chevallier@...tlin.com>
>>>>> Signed-off-by: Wei Fang <wei.fang@....com>
>>>>
>>>> I gave that patch a test, with the following cases :
>>>>
>>>> - On Macchiatobin (we have PHYs that share an mdiobus).
>>>> When unbinding a PHY, the MAC dissapears as well :
>>>
>>> Correct, this is why these band-aids are harmful. One "device" can
>>> correspond with *multiple* network interfaces, and the loss of one
>>> PHY can have a *very* detrimental effect.
>>>
>>> Consider the case where root-NFS is being used, and removing a PHY
>>> on another interface takes out the interface that root-NFS is
>>> using. Your machine is now dead in the water.
>>
>> That's what I've been seeing. I unbound one PHY, it took out 3 netdevs
>> and I don't have log regarding "why". I guess there's devlink debug
>> knobs for that, but not enabled by default it seems.
>
> See what I said above. "One "device" can correspond with *multiple*
> network interfaces".
>
> On Armada 8040, one network "device" has multiple ports - they all
> share the same packet infrastructure. Each port is a separate
> interface in the kernel.
>
> Consequently, the "struct device" is common across all ports on one
> of the CP110 dies (there are two dies.) If one triggers an unbind
> of that struct device, then you lose *all* ports on that CP110 die
> whether or not the others _could_ remain functional.
>
> Consider a DSA switch, which has external PHYs connected. Should
> unbinding one port's PHYs take out the entire switch - and in the
> case of multiple switches, cause the entire switch tree to be taken
> out?
Don't get me wrong, I completely agree with you on that, it's pretty bad
to lose all these interfaces in one go, and the debugging experience to
figure this out on an unknown system doesn't sound great.
> This is why devlinks is a bad idea. It's too heavy handed for cases
> beyond the simple "one network device per struct device" model that
> doesn't exist everywhere. For simple cases, yes, maybe, but not
> where it means that taking out one minor part of the system destroys
> the entire system because it chose to unbind a multi-interface device.
>
Fair, fair.
I think you gave enough pointers on a way forward then.
Maxime
Powered by blists - more mailing lists