linux-kernel - Re: [PATCH net v1 2/2] net: phy: smsc: add adaptive polling to recover missed link-up on LAN8700

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <aGziszwhN1dykeks@pengutronix.de>
Date: Tue, 8 Jul 2025 11:19:47 +0200
From: Oleksij Rempel <o.rempel@...gutronix.de>
To: Lukas Wunner <lukas@...ner.de>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	kernel@...gutronix.de, linux-kernel@...r.kernel.org,
	Russell King <linux@...linux.org.uk>, netdev@...r.kernel.org,
	Andre Edich <andre.edich@...rochip.com>
Subject: Re: [PATCH net v1 2/2] net: phy: smsc: add adaptive polling to
 recover missed link-up on LAN8700

On Mon, Jul 07, 2025 at 08:10:36PM +0200, Lukas Wunner wrote:
> > Mitigate this by combining interrupts with adaptive polling:
> > 
> > - When the driver is running in pure polling mode it continues to poll
> >   once per second (unchanged).
> > - With an IRQ present we now
> >   - poll every 30 s while the link is up (low overhead);
> >   - switch to a 1 s poll for up to 30 s after the last IRQ and while the
> >     link is down, ensuring we catch the final silent link-up.
> 
> I think this begs the question, if we *know* that the link may come up
> belatedly without an interrupt, why poll?  Would it work to schedule a
> one-off link check after a reasonable delay to catch the link change?

The exact timing of the link-up depends on the link partner. For
example, when testing with an Intel i350, the longest observed link-up
time was 3.85 seconds. But I cannot measure all possible combinations of
PHYs and link partners, and some might take even longer.

If we use a one-shot delayed check, we would have to wait at least 5
seconds to be safe-maybe more. This would increase the link-up time for
most users, even in cases where the link is already up earlier. So the
user experience would get worse, without guaranteeing we catch all
cases.

That’s why I decided to go with a conservative 1 Hz polling for 30
seconds after the last interrupt. This keeps the link-up time short when
possible, and also helps detect obscure link changes that may otherwise
go unnoticed.

> I'm also wondering if enabling EDPD changes the behavior?

This is not straightforward to test. The PHY driver explicitly requires
polling mode for EDPD support, so when EDPD is enabled, polling is
already active. In that case, the missing IRQ issue is avoided by
design, because the PHY state machine runs regularly regardless of
interrupts. So enabling EDPD indirectly works around the problem, but
not because it changes the PHY behavior-just because it activates
polling.

> > +static unsigned int smsc_phy_get_next_update(struct phy_device *phydev)
> > +{
> > +	struct smsc_phy_priv *priv = phydev->priv;
> > +
> > +	/* If interrupts are disabled, fall back to default polling */
> > +	if (phydev->irq == PHY_POLL)
> > +		return SMSC_NOIRQ_POLLING_INTERVAL;
> > +
> > +	/* The PHY sometimes drops the *final* link-up IRQ when we run
> > +	 * with autoneg OFF (10 Mbps HD/FD) against an autonegotiating
> > +	 * partner: we see several "link down" IRQs, none for "link up".
> 
> Hm, I'm not seeing a check for all those conditions (e.g. autoneg off) here?

You're right, I'm not checking explicitly for autoneg == off or speed ==
10 in the code. I chose not to limit it to just that case, because if
IRQ behavior is unreliable in one configuration, it's possible that
other cases may also be affected in less obvious ways.

Polling once per second for 30 seconds is a small cost in terms of
power, but it greatly reduces the risk of missing a link-up event. In my
opinion, it's a safer default and avoids user-visible issues without
significantly impacting energy usage.

Best Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |