Message-ID: <aSLKLYuz0WA2LpFF@shell.armlinux.org.uk>
Date: Sun, 23 Nov 2025 08:47:41 +0000
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Laurent Pinchart <laurent.pinchart@...asonboard.com>
Cc: netdev@...r.kernel.org, imx@...ts.linux.dev,
	linux-arm-kernel@...ts.infradead.org,
	Kieran Bingham <kieran.bingham@...asonboard.com>,
	Stefan Klug <stefan.klug@...asonboard.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	Clark Wang <xiaoning.wang@....com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Fabio Estevam <festevam@...x.de>,
	Fabio Estevam <festevam@...il.com>,
	Francesco Dolcini <francesco.dolcini@...adex.com>,
	Frank Li <Frank.Li@....com>, Heiko Schocher <hs@...x.de>,
	Jakub Kicinski <kuba@...nel.org>,
	Joakim Zhang <qiangqing.zhang@....com>, Joy Zou <joy.zou@....com>,
	Marcel Ziswiler <marcel.ziswiler@...adex.com>,
	Marco Felsch <m.felsch@...gutronix.de>,
	Martyn Welch <martyn.welch@...labora.com>,
	Mathieu Othacehe <othacehe@....org>,
	Paolo Abeni <pabeni@...hat.com>,
	Pengutronix Kernel Team <kernel@...gutronix.de>,
	Richard Hu <richard.hu@...hnexion.com>,
	Sascha Hauer <s.hauer@...gutronix.de>,
	Shawn Guo <shawnguo@...nel.org>,
	Shenwei Wang <shenwei.wang@....com>,
	Stefano Radaelli <stefano.radaelli21@...il.com>,
	Wei Fang <wei.fang@....com>,
	Xiaoliang Yang <xiaoliang.yang_1@....com>
Subject: Re: [PATCH] net: stmmac: imx: Do not stop RX_CLK in Rx LPI state for
 i.MX8MP

On Sun, Nov 23, 2025 at 02:35:18PM +0900, Laurent Pinchart wrote:
> The i.MX8MP-based Debix Model A board experiences an interrupt storm
> on the ENET_EQOS IRQ (135) when connected to an EEE-enabled peer.
> 
> Setting the eee-broken-1000t DT property in the PHY node solves the
> problem, which confirms that the issue is related to EEE. Device trees
> for 8 boards in the mainline kernel, including the i.MX8MP EVK, set the
> property, which indicates the issue is likely not limited to the Debix
> board, although some of those device trees may have blindly copied the
> property from the EVK.
> 
> The IRQ is documented in the reference manual as the logical OR of 4
> signals:
> 
> - ENET QOS TSN LPI RX Exit Interrupt
> - ENET QOS TSN Host System Interrupt
> - ENET QOS TSN Host System RX Channel Interrupts, Logical OR of
>   channels[4:0]
> - ENET QOS TSN Host System TX Channel Interrupts, Logical OR of
>   channels[4:0]
> 
> Debugging the issue showed no unmasked interrupt sources from the Host
> System Interrupt (GMAC_INT_STATUS), Host System RX Channel Interrupts or
> Host System TX Channel Interrupts (MTL_INT_STATUS, MTL_CHAN_INT_CTRL and
> DMA_CHAN_STATUS) that were flagged at an unexpectedly high rate. This
> leaves the LPI RX Exit Interrupt as the most likely culprit.
> 
> The reference manual doesn't clearly indicate what the interrupt signal
> is, but from its name we can reasonably infer that it would be connected
> to the EQOS lpi_intr_o output. That interrupt is cleared when reading
> the LPI control/status register. However, its deassertion is synchronous
> to the RX clock domain, so it will take time to clear. It appears that
> it could even fail to clear at all, as in the following sequence of
> events:
> 
> - When the PHY exits LPI mode, it restarts generating the RX clock
>   (clk_rx_i input signal to the GMAC).
> - The MAC detects exit from LPI, and asserts lpi_intr_o. This triggers
>   the ENET_EQOS interrupt.
> - Before the CPU has time to process the interrupt, the PHY enters LPI
>   mode again, and stops generating the RX clock.
> - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
>   register. This does not clear lpi_intr_o as there's no clk_rx_i.
> 
> The ENET_EQOS interrupt will keep firing until the PHY resumes
> generating the RX clock when it eventually exits LPI mode again.
> 
> As LPI exit is reported by the LPIIS bit in GMAC_INT_STATUS, the
> lpi_intr_o signal may not have been meant to be wired to a CPU
> interrupt. It can't be masked in GMAC registers, and OR'ing it to the
> other GMAC interrupt signals seems to be a design mistake as it makes it
> impossible to selectively mask the interrupt in the GIC either.
> 
> Setting the STMMAC_FLAG_RX_CLK_RUNS_IN_LPI platform data flag gets rid
> of the interrupt storm, which confirms the above theory.
> 
> The i.MX8DXL and i.MX93, which also integrate an EQOS, may also be
> affected, as hinted by the eee-broken-1000t property being set in the
> i.MX8DXL EVK and the i.MX93 Variscite SoM device trees. The reference
> manual of the i.MX93 indicates that the ENET_EQOS interrupt also OR's
> the "ENET QOS TSN LPI RX exit Interrupt", while the i.MX8DXL reference
> manual doesn't provide details about the ENET_EQOS interrupt.
> 
> Additional testing is needed with the i.MX8DXL and i.MX93, so for now
> set the flag for the i.MX8MP only. The eee-broken-1000t property could
> possibly be removed from some of the i.MX8MP device trees, but that
> would also require per-board testing.
> 
> Suggested-by: Russell King <linux@...linux.org.uk>
> Signed-off-by: Laurent Pinchart <laurent.pinchart@...asonboard.com>
> ---
> I have CC'ed authors and maintainers of the i.MX8DXL, i.MX8MP and i.MX93
> device trees that set the eee-broken-1000t property for awareness. To
> test if the property can be dropped, you will need to
> 
> - Connect the EQOS interface to an EEE-enabled peer with a 1000T link.
> - Drop the eee-broken-1000t property from the device tree.
> - Boot the board and check with `ethtool --show-eee` that EEE is active.
> - Check the number of interrupts received from the EQOS in
>   /proc/interrupts. After boot on my system (with an NFS root) I have
>   ~6000 interrupts when no interrupt storm occurs, and hundreds of
>   thousands otherwise.
> - Apply this patch and check that EEE works as expected without any
>   interrupt storm. For i.MX8DXL and i.MX93, you will need to set the
>   STMMAC_FLAG_RX_CLK_RUNS_IN_LPI in the corresponding imx_dwmac_ops
>   instances in drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c.
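
For anyone repeating the interrupt-count comparison described above, here
is a minimal Python sketch that totals the per-CPU counts for one interrupt
from the text of /proc/interrupts. The helper name irq_total and the
interface name "eth0" are assumptions for illustration only; the EQOS
interrupt may be labelled differently on your board, so check the last
column of /proc/interrupts first.

```python
def irq_total(proc_interrupts_text, name):
    """Sum the per-CPU counts of every /proc/interrupts line mentioning name.

    'name' is matched against the text after the "NNN:" label, for example
    the interface name (hypothetical here: "eth0").
    """
    total = 0
    for line in proc_interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or name not in rest:
            continue  # header line, or an unrelated interrupt
        for field in rest.split():
            if not field.isdigit():
                break  # per-CPU counters end where the chip name starts
            total += int(field)
    return total


# Usage on the target (names are assumptions):
#   with open("/proc/interrupts") as f:
#       print(irq_total(f.read(), "eth0"))
```

Taking the total once after boot with the patch applied and once without
gives the two numbers to compare against the figures quoted above.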

Hang on... also check 100M connections. As I indicated, lpi_intr_o is
slow to clear even when the receive clock is running (it takes four
receive clock cycles: 160ns for 100M, 32ns for 1G).

So, I suspect you still get a storm, but it's not as severe.
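
The clearing times come from four receive clock cycles at each link speed.
A small Python sketch of that arithmetic, assuming the usual receive clock
rates of 25 MHz at 100M and 125 MHz at 1G (the names below are made up for
illustration):

```python
# Receive clock frequency per link speed (assumed: 25 MHz at 100M,
# 125 MHz at 1G, which matches the figures given above).
RX_CLK_HZ = {"100M": 25_000_000, "1G": 125_000_000}

# Receive clock cycles needed before lpi_intr_o deasserts after the read.
CLEAR_CYCLES = 4


def clear_latency_ns(speed):
    """Time in nanoseconds for lpi_intr_o to clear with the RX clock running."""
    return CLEAR_CYCLES * 1_000_000_000 // RX_CLK_HZ[speed]


for speed in ("100M", "1G"):
    print(speed, clear_latency_ns(speed), "ns")
```

The window at 100M is five times longer than at 1G, so a handler that
returns within it would more likely find the interrupt line still asserted.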

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
