netdev - Re: [PATCH net v1 4/4] net: phy: smsc: Disable IRQ support to prevent link state corruption

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <aGPba6fX1bqgVfYC@wunner.de>
Date: Tue, 1 Jul 2025 14:58:19 +0200
From: Lukas Wunner <lukas@...ner.de>
To: Oleksij Rempel <o.rempel@...gutronix.de>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	kernel@...gutronix.de, linux-kernel@...r.kernel.org,
	Russell King <linux@...linux.org.uk>, netdev@...r.kernel.org,
	Andre Edich <andre.edich@...rochip.com>
Subject: Re: [PATCH net v1 4/4] net: phy: smsc: Disable IRQ support to
 prevent link state corruption

On Tue, Jul 01, 2025 at 02:21:46PM +0200, Oleksij Rempel wrote:
> Disable interrupt handling for the LAN87xx PHY to prevent the network
> interface from entering a corrupted state after rapid configuration
> changes.
> 
> When the link configuration is changed quickly, the PHY can get stuck in
> a non-functional state. In this state, 'ethtool' reports that a link is
> present, but 'ip link' shows NO-CARRIER, and the interface is unable to
> transfer data.
[...]
> --- a/drivers/net/phy/smsc.c
> +++ b/drivers/net/phy/smsc.c
> @@ -746,10 +746,6 @@ static struct phy_driver smsc_phy_driver[] = {
>  	.soft_reset	= smsc_phy_reset,
>  	.config_aneg	= lan87xx_config_aneg,
>  
> -	/* IRQ related */
> -	.config_intr	= smsc_phy_config_intr,
> -	.handle_interrupt = smsc_phy_handle_interrupt,
> -

Well, that's not good.  I guess this means that the PHY is polled
again, so we basically go back to the suboptimal behavior prior to
1ce8b37241ed?
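
As far as we understand current phylib, a driver only gets interrupt mode if it provides both .config_intr and .handle_interrupt. Otherwise phy_probe() replaces a valid IRQ with PHY_POLL and the PHY is polled periodically. The quoted hunk therefore implies a fallback to polling. The userspace C model below sketches that check. It is not kernel code: struct phy_driver_model, drv_supports_irq(), effective_irq() and dummy_cb() are hypothetical names, the callbacks take void * instead of struct phy_device *, and "valid IRQ" is simplified to irq >= 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value phylib uses to mean "no interrupt, poll the PHY". */
#define PHY_POLL (-1)

/* Hypothetical, stripped-down stand-in for struct phy_driver. */
struct phy_driver_model {
	int (*config_intr)(void *phydev);      /* enable/disable PHY IRQs */
	int (*handle_interrupt)(void *phydev); /* service a PHY IRQ */
};

/* Modeled on phylib's check: both IRQ callbacks must be present. */
bool drv_supports_irq(const struct phy_driver_model *drv)
{
	return drv->config_intr != NULL && drv->handle_interrupt != NULL;
}

/*
 * Modeled on the fallback in phy_probe(): if the driver cannot handle
 * interrupts, a valid IRQ (simplified here to irq >= 0) is discarded
 * and the PHY is polled instead.
 */
int effective_irq(const struct phy_driver_model *drv, int irq)
{
	assert(drv != NULL);
	if (!drv_supports_irq(drv) && irq >= 0)
		return PHY_POLL;
	return irq;
}

/* Placeholder callback for building example driver models. */
int dummy_cb(void *phydev)
{
	(void)phydev;
	return 0;
}
```

With both callbacks set, effective_irq() keeps the IRQ number. With the two callbacks removed, as in the patch, it returns PHY_POLL.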

Without support for interrupt handling, we can't take advantage
of the GPIOs on the chip for interrupt generation.  Nor can we
properly support runtime PM if no cable is attached.

What's the actual root cause?  Is it the issue described in this
paragraph of 1ce8b37241ed's commit message?

    Normally the PHY interrupt should be masked until the PHY driver has
    cleared it.  However masking requires a (sleeping) USB transaction and
    interrupts are received in (non-sleepable) softirq context.  I decided
    not to mask the interrupt at all (by using the dummy_irq_chip's noop
    ->irq_mask() callback):  The USB interrupt endpoint is polled in 1 msec
    intervals and normally that's sufficient to wake the PHY driver's IRQ
    thread and have it clear the interrupt.  If it does take longer, worst
    thing that can happen is the IRQ thread is woken again.  No big deal.
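
The unmasked-interrupt scheme in the quoted paragraph can be illustrated with a small userspace simulation, under simplifying assumptions. Each loop iteration is one 1 msec poll of the USB interrupt endpoint. Repeated wakeups of the IRQ thread coalesce into one pending run. The hypothetical parameter clear_ms is how many polls the IRQ thread's sleeping USB transaction needs to clear the PHY interrupt status. The function name irq_thread_runs() is also hypothetical, and this is not the smsc95xx code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns how often the IRQ thread runs for a single PHY interrupt event
 * when the interrupt is never masked (hypothetical model).
 */
int irq_thread_runs(int clear_ms)
{
	bool pending = true;    /* PHY interrupt status asserted at t = 0 */
	bool runthread = false; /* "thread woken" flag; wakeups coalesce */
	int busy_until = -1;    /* poll at which the thread's clear completes */
	int runs = 0;

	assert(clear_ms >= 1);  /* clearing is a USB transaction, never instant */

	for (int t = 0; t <= clear_ms + 2; t++) {
		/* The thread's USB transaction completes: status is now clear. */
		if (t == busy_until) {
			pending = false;
			busy_until = -1;
		}
		/* Endpoint poll: the interrupt is not masked, so a status bit
		 * that is still set wakes the thread (again). */
		if (pending)
			runthread = true;
		/* An idle thread consumes the wakeup. */
		if (runthread && busy_until < 0) {
			runthread = false;
			runs++;
			if (pending)    /* real work: start clearing the status */
				busy_until = t + clear_ms;
			/* else: spurious run, nothing to clear (harmless) */
		}
	}
	return runs;
}
```

If the status is cleared within one poll interval (clear_ms = 1), the thread runs once. If clearing takes longer, the thread runs exactly one extra time and finds nothing to do. This matches the "woken again, no big deal" reasoning above.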

There must be better options than going back to polling, e.g.
inserting delays to avoid the PHY getting wedged.

TBH I did test this thoroughly back in the day and never
witnessed the issue.

Thanks,

Lukas