Message-ID: <aSLKLYuz0WA2LpFF@shell.armlinux.org.uk>
Date: Sun, 23 Nov 2025 08:47:41 +0000
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Laurent Pinchart <laurent.pinchart@...asonboard.com>
Cc: netdev@...r.kernel.org, imx@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org,
Kieran Bingham <kieran.bingham@...asonboard.com>,
Stefan Klug <stefan.klug@...asonboard.com>,
Andrew Lunn <andrew+netdev@...n.ch>,
Clark Wang <xiaoning.wang@....com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Fabio Estevam <festevam@...x.de>,
Fabio Estevam <festevam@...il.com>,
Francesco Dolcini <francesco.dolcini@...adex.com>,
Frank Li <Frank.Li@....com>, Heiko Schocher <hs@...x.de>,
Jakub Kicinski <kuba@...nel.org>,
Joakim Zhang <qiangqing.zhang@....com>, Joy Zou <joy.zou@....com>,
Marcel Ziswiler <marcel.ziswiler@...adex.com>,
Marco Felsch <m.felsch@...gutronix.de>,
Martyn Welch <martyn.welch@...labora.com>,
Mathieu Othacehe <othacehe@....org>,
Paolo Abeni <pabeni@...hat.com>,
Pengutronix Kernel Team <kernel@...gutronix.de>,
Richard Hu <richard.hu@...hnexion.com>,
Sascha Hauer <s.hauer@...gutronix.de>,
Shawn Guo <shawnguo@...nel.org>,
Shenwei Wang <shenwei.wang@....com>,
Stefano Radaelli <stefano.radaelli21@...il.com>,
Wei Fang <wei.fang@....com>,
Xiaoliang Yang <xiaoliang.yang_1@....com>
Subject: Re: [PATCH] net: stmmac: imx: Do not stop RX_CLK in Rx LPI state for
i.MX8MP
On Sun, Nov 23, 2025 at 02:35:18PM +0900, Laurent Pinchart wrote:
> The i.MX8MP-based Debix Model A board experiences an interrupt storm
> on the ENET_EQOS IRQ (135) when connected to an EEE-enabled peer.
>
> Setting the eee-broken-1000t DT property in the PHY node solves the
> problem, which confirms that the issue is related to EEE. Device trees
> for 8 boards in the mainline kernel, including the i.MX8MP EVK, set the
> property, which indicates the issue is likely not limited to the Debix
> board, although some of those device trees may have blindly copied the
> property from the EVK.
>
> The IRQ is documented in the reference manual as the logical OR of 4
> signals:
>
> - ENET QOS TSN LPI RX Exit Interrupt
> - ENET QOS TSN Host System Interrupt
> - ENET QOS TSN Host System RX Channel Interrupts, Logical OR of
> channels[4:0]
> - ENET QOS TSN Host System TX Channel Interrupts, Logical OR of
> channels[4:0]
>
> Debugging the issue showed no unmasked interrupt sources from the Host
> System Interrupt (GMAC_INT_STATUS), Host System RX Channel Interrupts or
> Host System TX Channel Interrupts (MTL_INT_STATUS, MTL_CHAN_INT_CTRL and
> DMA_CHAN_STATUS) that were flagged at an unexpectedly high rate. This
> leaves the LPI RX Exit Interrupt as the most likely culprit.
>
> The reference manual doesn't clearly indicate what the interrupt signal
> is, but from its name we can reasonably infer that it would be connected
> to the EQOS lpi_intr_o output. That interrupt is cleared when reading
> the LPI control/status register. However, its deassertion is synchronous
> to the RX clock domain, so it will take time to clear. It appears that
> it could even fail to clear at all, as in the following sequence of
> events:
>
> - When the PHY exits LPI mode, it restarts generating the RX clock
> (clk_rx_i input signal to the GMAC).
> - The MAC detects exit from LPI, and asserts lpi_intr_o. This triggers
> the ENET_EQOS interrupt.
> - Before the CPU has time to process the interrupt, the PHY enters LPI
> mode again, and stops generating the RX clock.
> - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
> register. This does not clear lpi_intr_o as there's no clk_rx_i.
>
> The ENET_EQOS interrupt will keep firing until the PHY resumes
> generating the RX clock when it eventually exits LPI mode again.
>
> As LPI exit is reported by the LPIIS bit in GMAC_INT_STATUS, the
> lpi_intr_o signal may not have been meant to be wired to a CPU
> interrupt. It can't be masked in GMAC registers, and OR'ing it to the
> other GMAC interrupt signals seems to be a design mistake as it makes it
> impossible to selectively mask the interrupt in the GIC either.
>
> Setting the STMMAC_FLAG_RX_CLK_RUNS_IN_LPI platform data flag gets rid
> of the interrupt storm, which confirms the above theory.
>
> The i.MX8DXL and i.MX93, which also integrate an EQOS, may also be
> affected, as hinted by the eee-broken-1000t property being set in the
> i.MX8DXL EVK and the i.MX93 Variscite SoM device trees. The reference
> manual of the i.MX93 indicates that the ENET_EQOS interrupt also OR's
> the "ENET QOS TSN LPI RX exit Interrupt", while the i.MX8DXL reference
> manual doesn't provide details about the ENET_EQOS interrupt.
>
> Additional testing is needed with the i.MX8DXL and i.MX93, so for now
> set the flag for the i.MX8MP only. The eee-broken-1000t property could
> possibly be removed from some of the i.MX8MP device trees, but that
> would also require per-board testing.
>
> Suggested-by: Russell King <linux@...linux.org.uk>
> Signed-off-by: Laurent Pinchart <laurent.pinchart@...asonboard.com>
> ---
> I have CC'ed authors and maintainers of the i.MX8DXL, i.MX8MP and i.MX93
> device trees that set the eee-broken-1000t property for awareness. To
> test if the property can be dropped, you will need to
>
> - Connect the EQOS interface to an EEE-enabled peer with a 1000T link.
> - Drop the eee-broken-1000t property from the device tree.
> - Boot the board and check with `ethtool --show-eee` that EEE is active.
> - Check the number of interrupts received from the EQOS in
> /proc/interrupts. After boot on my system (with an NFS root) I have
> ~6000 interrupts when no interrupt storm occurs, and hundreds of
> thousands otherwise.
> - Apply this patch and check that EEE works as expected without any
> interrupt storm. For i.MX8DXL and i.MX93, you will need to set the
> STMMAC_FLAG_RX_CLK_RUNS_IN_LPI flag in the corresponding
> imx_dwmac_ops instances in
> drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c.
Hang on... also check 100M connections. As I indicated, lpi_intr_o is
slow to clear even when the receive clock is running (it takes four
receive clock cycles: 160ns for 100M, 32ns for 1G).
So, I suspect you still get a storm, but it's not as severe.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!