Message-ID: <20251123152208.GE15447@pendragon.ideasonboard.com>
Date: Mon, 24 Nov 2025 00:22:08 +0900
From: Laurent Pinchart <laurent.pinchart@...asonboard.com>
To: "Russell King (Oracle)" <linux@...linux.org.uk>
Cc: netdev@...r.kernel.org, imx@...ts.linux.dev,
	linux-arm-kernel@...ts.infradead.org,
	Kieran Bingham <kieran.bingham@...asonboard.com>,
	Stefan Klug <stefan.klug@...asonboard.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	Clark Wang <xiaoning.wang@....com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Fabio Estevam <festevam@...x.de>,
	Fabio Estevam <festevam@...il.com>,
	Francesco Dolcini <francesco.dolcini@...adex.com>,
	Frank Li <Frank.Li@....com>, Heiko Schocher <hs@...x.de>,
	Jakub Kicinski <kuba@...nel.org>,
	Joakim Zhang <qiangqing.zhang@....com>, Joy Zou <joy.zou@....com>,
	Marcel Ziswiler <marcel.ziswiler@...adex.com>,
	Marco Felsch <m.felsch@...gutronix.de>,
	Martyn Welch <martyn.welch@...labora.com>,
	Mathieu Othacehe <othacehe@....org>,
	Paolo Abeni <pabeni@...hat.com>,
	Pengutronix Kernel Team <kernel@...gutronix.de>,
	Richard Hu <richard.hu@...hnexion.com>,
	Sascha Hauer <s.hauer@...gutronix.de>,
	Shawn Guo <shawnguo@...nel.org>,
	Shenwei Wang <shenwei.wang@....com>,
	Stefano Radaelli <stefano.radaelli21@...il.com>,
	Wei Fang <wei.fang@....com>,
	Xiaoliang Yang <xiaoliang.yang_1@....com>
Subject: Re: [PATCH] net: stmmac: imx: Do not stop RX_CLK in Rx LPI state for
 i.MX8MP

Hi Russell,

On Sun, Nov 23, 2025 at 08:47:41AM +0000, Russell King (Oracle) wrote:
> On Sun, Nov 23, 2025 at 02:35:18PM +0900, Laurent Pinchart wrote:
> > The i.MX8MP-based Debix Model A board experiences an interrupt storm
> > on the ENET_EQOS IRQ (135) when connected to an EEE-enabled peer.
> > 
> > Setting the eee-broken-1000t DT property in the PHY node solves the
> > problem, which confirms that the issue is related to EEE. Device trees
> > for 8 boards in the mainline kernel, including the i.MX8MP EVK, set the
> > property, which indicates the issue is likely not limited to the Debix
> > board, although some of those device trees may have blindly copied the
> > property from the EVK.
> > 
> > The IRQ is documented in the reference manual as the logical OR of 4
> > signals:
> > 
> > - ENET QOS TSN LPI RX Exit Interrupt
> > - ENET QOS TSN Host System Interrupt
> > - ENET QOS TSN Host System RX Channel Interrupts, Logical OR of
> >   channels[4:0]
> > - ENET QOS TSN Host System TX Channel Interrupts, Logical OR of
> >   channels[4:0]
> > 
> > Debugging the issue showed no unmasked interrupt sources from the Host
> > System Interrupt (GMAC_INT_STATUS), Host System RX Channel Interrupts or
> > Host System TX Channel Interrupts (MTL_INT_STATUS, MTL_CHAN_INT_CTRL and
> > DMA_CHAN_STATUS) that were flagged at an unexpectedly high rate. This
> > leaves the LPI RX Exit Interrupt as the most likely culprit.
> > 
> > The reference manual doesn't clearly indicate what the interrupt signal
> > is, but from its name we can reasonably infer that it would be connected
> > to the EQOS lpi_intr_o output. That interrupt is cleared when reading
> > the LPI control/status register. However, its deassertion is synchronous
> > to the RX clock domain, so it will take time to clear. It appears that
> > it could even fail to clear at all, as in the following sequence of
> > events:
> > 
> > - When the PHY exits LPI mode, it restarts generating the RX clock
> >   (clk_rx_i input signal to the GMAC).
> > - The MAC detects exit from LPI, and asserts lpi_intr_o. This triggers
> >   the ENET_EQOS interrupt.
> > - Before the CPU has time to process the interrupt, the PHY enters LPI
> >   mode again, and stops generating the RX clock.
> > - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
> >   register. This does not clear lpi_intr_o as there's no clk_rx_i.
> > 
> > The ENET_EQOS interrupt will keep firing until the PHY resumes
> > generating the RX clock when it eventually exits LPI mode again.
> > 
> > As LPI exit is reported by the LPIIS bit in GMAC_INT_STATUS, the
> > lpi_intr_o signal may not have been meant to be wired to a CPU
> > interrupt. It can't be masked in GMAC registers, and OR'ing it with the
> > other GMAC interrupt signals seems to be a design mistake as it makes it
> > impossible to selectively mask the interrupt in the GIC either.
> > 
> > Setting the STMMAC_FLAG_RX_CLK_RUNS_IN_LPI platform data flag gets rid
> > of the interrupt storm, which confirms the above theory.
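The failure sequence above can be illustrated with a toy Python model. It is a hypothetical simplification, not EQOS RTL or stmmac driver code: `LpiIntrModel` and `handler_runs` are made-up names, the clear is assumed to need a single RX clock edge (the real latency is ignored), the PHY never leaves LPI again within the simulated window, and `stops_rx_clk=False` stands in for what STMMAC_FLAG_RX_CLK_RUNS_IN_LPI is described as achieving, namely keeping clk_rx_i running while in LPI.

```python
from dataclasses import dataclass


@dataclass
class LpiIntrModel:
    """Toy model of the EQOS lpi_intr_o output (hypothetical simplification).

    The output is a level set when the MAC sees the PHY leave LPI. Reading the
    LPI control/status register only requests a clear; the clear takes effect
    on an RX clock edge, so nothing happens while clk_rx_i is stopped.
    """
    asserted: bool = False
    clear_requested: bool = False
    rx_clk_running: bool = True

    def phy_exits_lpi(self) -> None:
        # The PHY restarts clk_rx_i and the MAC asserts lpi_intr_o.
        self.rx_clk_running = True
        self.asserted = True

    def phy_enters_lpi(self, stops_rx_clk: bool) -> None:
        # A clock-stop capable PHY may stop clk_rx_i while in LPI.
        self.rx_clk_running = not stops_rx_clk

    def cpu_reads_lpi_status(self) -> None:
        # The register read requests the clear; it does not clear by itself.
        self.clear_requested = True

    def rx_clk_tick(self) -> None:
        # The clear is synchronous to clk_rx_i: no clock, no deassertion.
        if self.rx_clk_running and self.clear_requested:
            self.asserted = False
            self.clear_requested = False


def handler_runs(iterations: int, stops_rx_clk: bool) -> int:
    """Count handler invocations when the PHY re-enters LPI before the CPU
    services the interrupt, as in the sequence described in the patch."""
    m = LpiIntrModel()
    m.phy_exits_lpi()
    m.phy_enters_lpi(stops_rx_clk)
    runs = 0
    for _ in range(iterations):
        if not m.asserted:
            break
        runs += 1                  # the level-triggered IRQ is taken again
        m.cpu_reads_lpi_status()
        m.rx_clk_tick()
    return runs


print(handler_runs(1000, stops_rx_clk=True))   # → 1000 (bounded only by the loop)
print(handler_runs(1000, stops_rx_clk=False))  # → 1
```

In the model, the storm lasts as long as the RX clock stays stopped, which matches the observation that it only ends when the PHY eventually exits LPI again.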
> > 
> > The i.MX8DXL and i.MX93, which also integrate an EQOS, may also be
> > affected, as hinted by the eee-broken-1000t property being set in the
> > i.MX8DXL EVK and the i.MX93 Variscite SoM device trees. The reference
> > manual of the i.MX93 indicates that the ENET_EQOS interrupt also OR's
> > the "ENET QOS TSN LPI RX exit Interrupt", while the i.MX8DXL reference
> > manual doesn't provide details about the ENET_EQOS interrupt.
> > 
> > Additional testing is needed with the i.MX8DXL and i.MX93, so for now
> > set the flag for the i.MX8MP only. The eee-broken-1000t property could
> > possibly be removed from some of the i.MX8MP device trees, but that also
> > requires per-board testing.
> > 
> > Suggested-by: Russell King <linux@...linux.org.uk>
> > Signed-off-by: Laurent Pinchart <laurent.pinchart@...asonboard.com>
> > ---
> > I have CC'ed authors and maintainers of the i.MX8DXL, i.MX8MP and i.MX93
> > device trees that set the eee-broken-1000t property for awareness. To
> > test if the property can be dropped, you will need to
> > 
> > - Connect the EQOS interface to an EEE-enabled peer with a 1000T link.
> > - Drop the eee-broken-1000t property from the device tree.
> > - Boot the board and check with `ethtool --show-eee` that EEE is active.
> > - Check the number of interrupts received from the EQOS in
> >   /proc/interrupts. After boot on my system (with an NFS root) I have
> >   ~6000 interrupts when no interrupt storm occurs, and hundreds of
> >   thousands otherwise.
> > - Apply this patch and check that EEE works as expected without any
> >   interrupt storm. For i.MX8DXL and i.MX93, you will need to set the
> >   STMMAC_FLAG_RX_CLK_RUNS_IN_LPI in the corresponding imx_dwmac_ops
> >   instances in drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c.
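Counting EQOS interrupts, as the steps above require, can be scripted. This is a sketch rather than part of the patch: `irq_count` is a hypothetical helper, and it assumes the usual /proc/interrupts layout (a CPU header line, then one line per IRQ with per-CPU counts, followed by the interrupt controller details and, last, the action name). Which action name the EQOS interrupt carries depends on the board, typically the network interface name.

```python
def irq_count(proc_interrupts: str, action: str) -> int:
    """Sum the per-CPU counts of every line in /proc/interrupts-style text
    whose last field (the action name) equals `action`."""
    total = 0
    found = False
    for line in proc_interrupts.splitlines()[1:]:  # skip the CPUn header line
        fields = line.split()
        if len(fields) < 2 or fields[-1] != action:
            continue
        found = True
        for field in fields[1:]:
            if not field.isdigit():
                break  # per-CPU counts end where the controller name starts
            total += int(field)
    if not found:
        raise KeyError(f"no interrupt named {action!r}")
    return total
```

On the board, something like `irq_count(open("/proc/interrupts").read(), "eth1")` (with the interface name adjusted to the system) gives a single total to compare across runs.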
> 
> Hang on... also check 100M connections. As I indicated, lpi_intr_o is
> slow to clear even when the receive clock is running (it takes four
> receive clock cycles: 160ns at 100M, 32ns at 1G).
> 
> So, I suspect you still get a storm, but it's not as severe.
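The latency figures quoted above follow from the RX clock period. A small sketch, assuming the four-cycle figure from the message and the standard MII/RGMII receive clock rates (25 MHz at 100 Mbit/s, 125 MHz at 1000 Mbit/s); `lpi_intr_clear_ns` is a hypothetical helper name:

```python
# RX clock frequency per link speed for MII/RGMII-style interfaces, in MHz.
RX_CLK_MHZ = {100: 25, 1000: 125}


def lpi_intr_clear_ns(speed_mbps: int, rx_cycles: int = 4) -> float:
    """Time for lpi_intr_o to deassert after the LPI control/status register
    is read, assuming it takes `rx_cycles` cycles of a running RX clock."""
    period_ns = 1000 / RX_CLK_MHZ[speed_mbps]
    return rx_cycles * period_ns


for speed in sorted(RX_CLK_MHZ):
    print(f"{speed} Mbit/s: {lpi_intr_clear_ns(speed):g} ns")
# → 100 Mbit/s: 160 ns
# → 1000 Mbit/s: 32 ns
```

The window is five times longer at 100M. If the handler returns and the level-triggered line is sampled again before the window has elapsed, the interrupt can be taken again. This is consistent with the expectation of a milder residual storm, more visible at 100M.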

Of course you're right, I rejoiced too fast :-/

The numbers are now low enough that they no longer stand out, so the
measurements are less conclusive. I've compared the number of interrupts
right after reaching the login prompt with and without the
eee-broken-100tx and eee-broken-1000t DT properties:

100TX link, eee-broken-* set: 7000 interrupts
1000T link, eee-broken-* set: 2711 interrupts
100TX link, eee-broken-* unset: 9450 interrupts
1000T link, eee-broken-* unset: 6066 interrupts
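For a quick comparison of the counts above, the difference between the unset and set cases gives a rough measure of the extra interrupts seen when EEE is allowed at each speed. It is only indicative: these are single boot-time samples that also include regular network traffic (the system uses an NFS root), and some of the extra interrupts are expected LPI exits rather than spurious ones. `COUNTS` and `eee_excess` are illustrative names.

```python
# Interrupt counts at the login prompt, as reported above.
COUNTS = {
    "100TX": {"eee_broken_set": 7000, "eee_broken_unset": 9450},
    "1000T": {"eee_broken_set": 2711, "eee_broken_unset": 6066},
}


def eee_excess(link: str) -> int:
    """Extra interrupts observed when the eee-broken-* properties are unset."""
    c = COUNTS[link]
    return c["eee_broken_unset"] - c["eee_broken_set"]


for link, c in COUNTS.items():
    ratio = c["eee_broken_unset"] / c["eee_broken_set"]
    print(f"{link}: +{eee_excess(link)} interrupts ({ratio:.2f}x)")
```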

-- 
Regards,

Laurent Pinchart
