netdev - [PATCH RFC 0/5] net: stmmac: fix resume failures due to RX clock

Message-ID: <Z8B4tVd4nLUKXdQ4@shell.armlinux.org.uk>
Date: Thu, 27 Feb 2025 14:37:41 +0000
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>
Cc: Alexandre Torgue <alexandre.torgue@...s.st.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Jon Hunter <jonathanh@...dia.com>,
	linux-arm-kernel@...ts.infradead.org,
	linux-stm32@...md-mailman.stormreply.com,
	Maxime Coquelin <mcoquelin.stm32@...il.com>, netdev@...r.kernel.org,
	Paolo Abeni <pabeni@...hat.com>,
	Thierry Reding <treding@...dia.com>
Subject: [PATCH RFC 0/5] net: stmmac: fix resume failures due to RX clock

Hi,

This series is likely dependent on the "net: stmmac: cleanup transmit
clock setting" series which was submitted earlier today.

stmmac has a long history of failing to resume due to a missing receive
clock. NVidia have reported that, with the EEE changes applied, they
see a greater chance of resume failure than before.

The issue is that the DesignWare core requires the receive clock to be
running in order to complete a software reset. Without it, the reset
never completes, which causes stmmac_reset() and stmmac_hw_setup() to
fail.
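A minimal user-space model of why the reset stalls, assuming the reset is requested through a self-clearing bit that the core only clears while the receive clock is ticking, and that the driver polls that bit with a bounded timeout. All names here (`struct model_mac`, `model_sw_reset()`, `model_hw_setup()`, `MODEL_SFT_RESET`) are hypothetical; they are not the stmmac driver's functions or register definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model only: not the real stmmac register layout. */
#define MODEL_SFT_RESET  (1u << 0)  /* self-clearing software reset bit */
#define MODEL_POLL_LIMIT 1000       /* stands in for a real poll timeout */

struct model_mac {
	bool rx_clk_running;    /* is the PHY driving the receive clock? */
	uint32_t dma_bus_mode;  /* simulated reset/control register */
};

/* One step of simulated hardware time. The core only finishes its
 * reset on receive-clock edges, so with RXC stopped the bit stays set. */
static void model_hw_tick(struct model_mac *mac)
{
	if (mac->rx_clk_running)
		mac->dma_bus_mode &= ~MODEL_SFT_RESET;
}

/* Request a software reset and poll for the bit to self-clear. */
static int model_sw_reset(struct model_mac *mac)
{
	mac->dma_bus_mode |= MODEL_SFT_RESET;
	for (int i = 0; i < MODEL_POLL_LIMIT; i++) {
		model_hw_tick(mac);
		if (!(mac->dma_bus_mode & MODEL_SFT_RESET))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Hardware setup starts with a reset, so it inherits the failure. */
static int model_hw_setup(struct model_mac *mac)
{
	int ret = model_sw_reset(mac);

	if (ret)
		return ret;
	assert(!(mac->dma_bus_mode & MODEL_SFT_RESET));
	/* DMA, MTL and MAC programming would follow here. */
	return 0;
}
```

With `rx_clk_running` false, the reset bit never clears and `model_hw_setup()` returns `-ETIMEDOUT`, which is the shape of the resume failure described above.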

There are several things that are wrong:

1. Calling phylink_start() early can result in a call to mac_link_up()
   which will set TE and RE bits before stmmac_hw_setup() has been
   called. This is evident in the debug logs that NVidia sent while
   debugging the problem.

   This is something I have pointed out in the past, but it has been
   claimed that things must be done this way to get the PHY receive
   clock running. Enabling RE before DMA is set up is against the
   DesignWare databook documentation.

2. Enabling LPI clock-stop at the PHY (as the driver did prior to my
   patch set) allows the PHY to stop its receive clock when the link
   enters low-power mode. This makes completion of the reset dependent
   on the current EEE state, which cannot be determined in advance but
   is likely to be low-power mode on resume.
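Problem (2) can be sketched as a function of two inputs: whether the link is currently in LPI (decided by the link partner, so unknown at resume time) and whether the PHY is permitted to stop its receive clock during LPI. This is a hypothetical behavioural model; `struct model_phy` and its helpers are not phylib or phylink APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PHY model: not a phylib structure. */
struct model_phy {
	bool link_in_lpi;   /* link currently in low-power idle */
	bool lpi_rxc_stop;  /* PHY allowed to stop its receive clock in LPI */
};

/* The receive clock is only stopped when both conditions hold. */
static bool model_phy_rxc_running(const struct model_phy *phy)
{
	return !(phy->link_in_lpi && phy->lpi_rxc_stop);
}

/* True if the receive clock (and therefore a MAC reset) is guaranteed
 * to be available whatever the unknown LPI state happens to be. */
static bool model_reset_is_deterministic(bool lpi_rxc_stop)
{
	for (int lpi = 0; lpi <= 1; lpi++) {
		struct model_phy phy = {
			.link_in_lpi = lpi,
			.lpi_rxc_stop = lpi_rxc_stop,
		};

		if (!model_phy_rxc_running(&phy))
			return false;
	}
	return true;
}
```

Only with clock-stop disabled does the reset outcome stop depending on the EEE state.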

We solve (1) by moving the call to phylink_resume() later. On its own,
this patch will probably cause regressions: it may make it more likely
that the link is in the low-power state, or the PHY driver may not
respect the PHY_F_RXC_ALWAYS_ON flag. This needs to be tested on as
many different hardware setups that use this driver as possible, and
any issues addressed *without* moving phylink_resume() back. If we
need some way to resume the PHY early, then we need to work out how to
do that with phylib without calling phy_start() early.
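The ordering problem in (1), and the effect of moving phylink_resume() later, can be modelled by recording whether the receiver was enabled before DMA was set up. This is a hypothetical user-space sketch: `model_link_up()` stands in for the mac_link_up() call that an early phylink_resume() can trigger, and `model_hw_setup()` for stmmac_hw_setup(). It deliberately ignores other details of the real resume path, such as the reset performed inside hardware setup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical MAC state for the ordering model. */
struct model_mac {
	bool dma_ready;      /* DMA and descriptors programmed */
	bool rx_enabled;     /* RE bit */
	bool tx_enabled;     /* TE bit */
	bool re_before_dma;  /* databook ordering rule violated */
};

/* Stand-in for mac_link_up(): sets TE and RE. */
static void model_link_up(struct model_mac *mac)
{
	if (!mac->dma_ready)
		mac->re_before_dma = true;
	mac->tx_enabled = true;
	mac->rx_enabled = true;
}

/* Stand-in for the hardware setup step: programs DMA. */
static void model_hw_setup(struct model_mac *mac)
{
	mac->dma_ready = true;
}

/* Old order: the link can come up before hardware setup. */
static void model_resume_early(struct model_mac *mac)
{
	model_link_up(mac);   /* via an early phylink_resume() */
	model_hw_setup(mac);
}

/* Reordered: hardware setup first, then phylink_resume(). */
static void model_resume_late(struct model_mac *mac)
{
	model_hw_setup(mac);
	assert(mac->dma_ready);  /* invariant this ordering guarantees */
	model_link_up(mac);
}
```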

(2) is fixed by introducing phylink_prepare_resume(), which disables
receive clock-stop in LPI mode at the PHY; the clock-stop setting is
then restored in phylink_resume(). It is possible that this removes
some of the reasons for the early placement of phylink_resume().
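The disable-then-restore pairing for (2) could look roughly like the sketch below. It is a hypothetical user-space model, not the phylink code in this series: `model_prepare_resume()` and `model_resume()` stand in for phylink_prepare_resume() and phylink_resume(), and keeping the normal policy in a separate `cfg_rxc_stop` field is an assumption about how the original setting might be remembered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state: not struct phylink or struct phy_device. */
struct model_link {
	bool cfg_rxc_stop;  /* clock-stop policy normally used */
	bool phy_rxc_stop;  /* what is currently programmed into the PHY */
	bool link_in_lpi;   /* may well be true at resume time */
};

static bool model_rxc_running(const struct model_link *l)
{
	return !(l->link_in_lpi && l->phy_rxc_stop);
}

/* Before the MAC reset: keep RXC running even in LPI. */
static void model_prepare_resume(struct model_link *l)
{
	l->phy_rxc_stop = false;
}

/* After MAC setup: restore the configured clock-stop policy. */
static void model_resume(struct model_link *l)
{
	l->phy_rxc_stop = l->cfg_rxc_stop;
}

/* Returns whether RXC was running while the MAC was being reset. */
static bool model_mac_resume(struct model_link *l)
{
	bool rxc_during_reset;

	model_prepare_resume(l);
	assert(!l->phy_rxc_stop);
	rxc_during_reset = model_rxc_running(l);  /* MAC reset happens here */
	model_resume(l);
	return rxc_during_reset;
}
```

Even with the link in LPI and clock-stop normally enabled, the reset sees a running receive clock, and the normal policy is back in place afterwards.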

phylink_prepare_resume() also provides a convenient site should (1)
need further work.

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 27 +++++++--------
 drivers/net/phy/phylink.c                         | 40 ++++++++++++++++++++++-
 include/linux/phylink.h                           |  1 +
 3 files changed, 51 insertions(+), 17 deletions(-)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!