lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aGT_3SpVVzJFzT6B@pengutronix.de>
Date: Wed, 2 Jul 2025 11:46:05 +0200
From: Oleksij Rempel <o.rempel@...gutronix.de>
To: Lukas Wunner <lukas@...ner.de>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	kernel@...gutronix.de, linux-kernel@...r.kernel.org,
	Russell King <linux@...linux.org.uk>, netdev@...r.kernel.org,
	Andre Edich <andre.edich@...rochip.com>
Subject: Re: [PATCH net v1 4/4] net: phy: smsc: Disable IRQ support to
 prevent link state corruption

Hi Lukas,

On Tue, Jul 01, 2025 at 02:58:19PM +0200, Lukas Wunner wrote:
> On Tue, Jul 01, 2025 at 02:21:46PM +0200, Oleksij Rempel wrote:
> > Disable interrupt handling for the LAN87xx PHY to prevent the network
> > interface from entering a corrupted state after rapid configuration
> > changes.
> > 
> > When the link configuration is changed quickly, the PHY can get stuck in
> > a non-functional state. In this state, 'ethtool' reports that a link is
> > present, but 'ip link' shows NO-CARRIER, and the interface is unable to
> > transfer data.
> [...]
> > --- a/drivers/net/phy/smsc.c
> > +++ b/drivers/net/phy/smsc.c
> > @@ -746,10 +746,6 @@ static struct phy_driver smsc_phy_driver[] = {
> >  	.soft_reset	= smsc_phy_reset,
> >  	.config_aneg	= lan87xx_config_aneg,
> >  
> > -	/* IRQ related */
> > -	.config_intr	= smsc_phy_config_intr,
> > -	.handle_interrupt = smsc_phy_handle_interrupt,
> > -
> 
> Well, that's not good.  I guess this means that the interrupt is
> polled again, so we basically go back to the suboptimal behavior
> prior to 1ce8b37241ed?

Not fully. It will disable interrupt support only for the embedded PHY,
other types of interrupts should work as expected.

> Without support for interrupt handling, we can't take advantage
> of the GPIOs on the chip for interrupt generation.  Nor can we
> properly support runtime PM if no cable is attached.

Hm... the PHY smsc driver is not using EDPD mode by default if PHY
interrupts are enabled. Or do you mean other kind of PM?

> What's the actual root cause?  Is it the issue described in this
> paragraph of 1ce8b37241ed's commit message?
> 
>     Normally the PHY interrupt should be masked until the PHY driver has
>     cleared it.  However masking requires a (sleeping) USB transaction and
>     interrupts are received in (non-sleepable) softirq context.  I decided
>     not to mask the interrupt at all (by using the dummy_irq_chip's noop
>     ->irq_mask() callback):  The USB interrupt endpoint is polled in 1 msec
>     intervals and normally that's sufficient to wake the PHY driver's IRQ
>     thread and have it clear the interrupt.  If it does take longer, worst
>     thing that can happen is the IRQ thread is woken again.  No big deal.

I'm not sure. It seems to be not the problem.

> There must be better options than going back to polling.
> E.g. inserting delays to avoid the PHY getting wedged.
> 
> TBH I did test this thoroughly back in the day and never
> witnessed the issue.

I did some testing back in time too. It worked and still works normally
in the autoneg mode.

What is not working as expected is the fixed mode, especially 10 mbit
fixed mode.

Here are my current testing results:

# configure 10 mbit forced mode:
ethtool -s eth0 autoneg off speed 10 duplex half
# attach cable (can be done wothout reataching cable)
[10174.585150] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[10174.586760] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[10174.594636] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[10174.602777] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[10174.841458] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[10174.843017] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[10174.850619] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[10174.857026] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[10175.425513] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[10175.427046] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[10175.434871] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[10175.441332] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off

At this point no more interrupts will come and link up state will not be
detected. Replugging cable will have same result.

The worst part - unplugging the cable may trigger an endless interrupt storm
(which is some times reproducible in the 10Mbit forced mode):
[ 1584.132799] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 1584.134220] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 1584.389134] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 1584.390591] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 1584.644757] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 1584.646177] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 1584.900781] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 1584.902305] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 1585.158416] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098

With latest kernel we can use adaptive polling, wich I added now for
testing. Here are the results:

[ 2200.702427] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[ 2200.948552] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2200.949640] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2200.950182] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2200.951374] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2200.953186] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2200.959234] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[ 2201.204270] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.205284] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.207139] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2201.208825] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2201.216406] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[ 2201.460548] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.461618] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.462181] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.463273] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2201.464764] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2201.471066] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[ 2201.716547] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.717607] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.718235] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.719267] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2201.721035] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2201.727488] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[ 2201.972542] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.973614] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.974176] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2201.975321] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2201.977078] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2201.983500] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[ 2202.228538] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2202.229615] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2202.230174] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000
[ 2202.231292] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2202.233038] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098
[ 2202.238972] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off
[ 2202.239018] smsc_phy_get_next_update: next update in 250 jiffies
[ 2203.258566] lan87xx_read_status: link: yes, speed: 10, duplex: half, autoneg: off
[ 2203.258756] smsc95xx 1-1.1:1.0 enu1u1: Link is Up - 10Mbps/Half - flow control off

With adaptive polling we can use both. Since IRQ down interrupt works as
expected, we can use low frequency polling (one per 30 seconds) in link up
state. On link down state, after last interrupt poll one time per second
for 30 seconds, then switch to low frequency polling (one per 30
seconds).

I need to figure out haw to handle an interrupt storm.

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ