Message-ID:
 <PAXPR04MB8510CB77C95D4D044FE62C64889BA@PAXPR04MB8510.eurprd04.prod.outlook.com>
Date: Tue, 3 Feb 2026 05:14:08 +0000
From: Wei Fang <wei.fang@....com>
To: Maxime Chevallier <maxime.chevallier@...tlin.com>, "Russell King (Oracle)"
	<linux@...linux.org.uk>
CC: "andrew@...n.ch" <andrew@...n.ch>, "hkallweit1@...il.com"
	<hkallweit1@...il.com>, "davem@...emloft.net" <davem@...emloft.net>,
	"edumazet@...gle.com" <edumazet@...gle.com>, "kuba@...nel.org"
	<kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
	"florian.fainelli@...adcom.com" <florian.fainelli@...adcom.com>, xiaolei.wang
	<xiaolei.wang@...driver.com>, "quic_abchauha@...cinc.com"
	<quic_abchauha@...cinc.com>, "quic_sarohasa@...cinc.com"
	<quic_sarohasa@...cinc.com>, "imx@...ts.linux.dev" <imx@...ts.linux.dev>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2 net] net: phy: change devlink flag to
 AUTOREMOVE_SUPPLIER for non-SFP PHYs

> >>>>> For the shared MDIO bus use case, multiple MACs share the same MDIO
> >>>>> bus, so all of these MACs depend on that MDIO bus. If the shared MDIO
> >>>>> bus is removed, all the PHY devices attached to it are also removed.
> >>>>> Consequently, the MAC driver must not access those PHY devices
> >>>>> afterwards; otherwise it may crash, because the corresponding phydev
> >>>>> and mii_bus have been freed and the pointers to them are no longer
> >>>>> valid.
> >>>>>
> >>>>> For example, Abhishek reported a crash that occurred when the MDIO
> >>>>> bus driver was removed first, followed by the MAC driver. The crash
> >>>>> log is below.
> >>>>>
> >>>>> Call trace:
> >>>>>  __list_del_entry_valid_or_report+0xa8/0xe0
> >>>>>  __device_link_del+0x40/0xf0
> >>>>>  device_link_put_kref+0xb4/0xc8
> >>>>>  device_link_del+0x38/0x58
> >>>>>  phy_detach+0x2c/0x170
> >>>>>  phy_disconnect+0x4c/0x70
> >>>>>  phylink_disconnect_phy+0x6c/0xc0 [phylink]
> >>>>>  stmmac_release+0x60/0x358 [stmmac]
> >>>>>
> >>>>> Another example is the i.MX95-15x15 platform, which has two ENETC
> >>>>> ports. When all the external PHYs are managed by the EMDIO (the MDIO
> >>>>> controller) and the enetc driver is removed after the EMDIO driver,
> >>>>> users will see the crash log below and the console hangs.
> >>>>>
> >>>>> Call trace:
> >>>>>  _phy_state_machine+0x230/0x36c (P)
> >>>>>  phy_stop+0x74/0x190
> >>>>>  phylink_stop+0x28/0xb8
> >>>>>  enetc_close+0x28/0x8c
> >>>>>  __dev_close_many+0xb4/0x1d8
> >>>>>  netif_close_many+0x8c/0x13c
> >>>>>  enetc4_pf_remove+0x2c/0x84
> >>>>>  pci_device_remove+0x44/0xe8
> >>>>>
> >>>>> To address this issue, Sarosh Hasan tried to change the devlink flag to
> >>>>> DL_FLAG_AUTOREMOVE_SUPPLIER [1], so that the MAC driver is removed
> >>>>> along with the PHY driver. However, that solution does not take
> >>>>> hot-swappable PHY devices (SFP PHYs) into account: when such a PHY
> >>>>> device is unplugged, the MAC driver is automatically removed too,
> >>>>> which is not the expected behavior. This issue should not exist for
> >>>>> SFP PHYs, so, building on Sarosh's patch, the flag is changed to
> >>>>> DL_FLAG_AUTOREMOVE_SUPPLIER only for non-SFP PHYs.
> >>>>>
> >>>>> Reported-by: Abhishek Chauhan (ABC) <quic_abchauha@...cinc.com>
> >>>>> Closes: https://lore.kernel.org/all/d696a426-40bb-4c1a-b42d-990fb690de5e@quicinc.com/
> >>>>> Link: https://lore.kernel.org/imx/20250703090041.23137-1-quic_sarohasa@quicinc.com/ # [1]
> >>>>> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
> >>>>> Suggested-by: Maxime Chevallier <maxime.chevallier@...tlin.com>
> >>>>> Signed-off-by: Wei Fang <wei.fang@....com>
> >>>>
> >>>> I tested that patch with the following cases:
> >>>>
> >>>>  - On Macchiatobin (we have PHYs that share an mdiobus).
> >>>> When unbinding a PHY, the MAC disappears as well:
> >>>
> >>> Correct, this is why these band-aids are harmful. One "device" can
> >>> correspond with *multiple* network interfaces, and the loss of one
> >>> PHY can have a *very* detrimental effect.
> >>>
> >>> Consider the case where root-NFS is being used, and removing a PHY
> >>> on another interface takes out the interface that root-NFS is
> >>> using. Your machine is now dead in the water.
> >>
> >> That's what I've been seeing. I unbound one PHY, it took out 3 netdevs,
> >> and nothing was logged to explain why. I guess there are devlink debug
> >> knobs for that, but they don't seem to be enabled by default.
> >
> > See what I said above. "One "device" can correspond with *multiple*
> > network interfaces".
> >
> > On Armada 8040, one network "device" has multiple ports - they all
> > share the same packet infrastructure. Each port is a separate
> > interface in the kernel.
> >
> > Consequently, the "struct device" is common across all ports on one
> > of the CP110 dies (there are two dies.) If one triggers an unbind
> > of that struct device, then you lose *all* ports on that CP110 die
> > whether or not the others _could_ remain functional.
> >
> > Consider a DSA switch, which has external PHYs connected. Should
> > unbinding one port's PHYs take out the entire switch - and in the
> > case of multiple switches, cause the entire switch tree to be taken
> > out?
>
> Don't get me wrong, I completely agree with you on that, it's pretty bad
> to lose all these interfaces in one go, and the debugging experience to
> figure this out on an unknown system doesn't sound great.
>
> > This is why devlinks is a bad idea. It's too heavy-handed for cases
> > beyond the simple "one network device per struct device" model that
> > doesn't exist everywhere. For simple cases, yes, maybe, but not
> > where it means that taking out one minor part of the system destroys
> > the entire system because it chose to unbind a multi-interface device.
> >
> Fair, fair.
>
> I think you gave enough pointers on a way forward then.
>
Thanks all for your comments. It seems that this issue cannot be resolved
quickly. I will think carefully about a better way to solve it, and if
anyone proposes a new patch in the meantime, I would be happy to help
verify it.
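To make the failure mode Maxime and Russell describe concrete, here is a toy Python model of a device link that force-unbinds its consumer when its supplier is unbound. It is not kernel code, and all names in it (Device, Link, unbind, build, cp110-eth, eth0..eth2, phy0..phy2) are hypothetical.

As I understand the driver core, this cascade comes from the PHY-to-MAC link being a managed (non-stateless) link, which is what switching to DL_FLAG_AUTOREMOVE_SUPPLIER implies. The model collapses that into a single boolean. It also assumes, as on Armada 8040, that one struct device backs several netdevs.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for a bound device; not the kernel's struct device."""
    name: str
    netdevs: list[str] = field(default_factory=list)  # interfaces this device backs
    bound: bool = True


@dataclass
class Link:
    """Hypothetical, simplified supplier -> consumer link."""
    consumer: Device
    supplier: Device
    # True models a managed link: unbinding the supplier also unbinds the consumer.
    cascade: bool


def cascade_for_phy(is_sfp: bool) -> bool:
    # Rule from the patch under discussion: cascading links only for non-SFP PHYs.
    return not is_sfp


def unbind(dev: Device, links: list[Link]) -> list[str]:
    """Unbind dev (and, via cascading links, its consumers); return the netdevs lost."""
    if not dev.bound:
        return []
    dev.bound = False
    lost = list(dev.netdevs)
    for link in links:
        if link.supplier is dev and link.cascade:
            lost += unbind(link.consumer, links)
    return lost


def build(is_sfp: bool) -> tuple[Device, list[Device], list[Link]]:
    # One MAC device backing three ports, each port with its own PHY (hypothetical topology).
    mac = Device("cp110-eth", ["eth0", "eth1", "eth2"])
    phys = [Device(f"phy{i}") for i in range(3)]
    links = [Link(mac, phy, cascade_for_phy(is_sfp)) for phy in phys]
    return mac, phys, links


if __name__ == "__main__":
    mac, phys, links = build(is_sfp=False)
    print(unbind(phys[1], links))  # ['eth0', 'eth1', 'eth2']
    mac, phys, links = build(is_sfp=True)
    print(unbind(phys[1], links))  # []
```

The model illustrates Russell's point. The link is attached to the struct device, not to an individual netdev, so any cascading policy takes out every port that device backs. Restricting the cascade to non-SFP PHYs only changes which PHYs can trigger it.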

