netdev - Re: [PATCH net v2] net: stmmac: fix transmit queue timed out after resume

Message-ID: <aWqP_hhX73x_8Qs1@shell.armlinux.org.uk>
Date: Fri, 16 Jan 2026 19:22:38 +0000
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Maxime Chevallier <maxime.chevallier@...tlin.com>
Cc: Tao Wang <tao03.wang@...izon.auto>, alexandre.torgue@...s.st.com,
	andrew+netdev@...n.ch, davem@...emloft.net, edumazet@...gle.com,
	horms@...nel.org, kuba@...nel.org, linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org, mcoquelin.stm32@...il.com,
	netdev@...r.kernel.org, pabeni@...hat.com
Subject: Re: [PATCH net v2] net: stmmac: fix transmit queue timed out after
 resume

On Fri, Jan 16, 2026 at 07:27:16PM +0100, Maxime Chevallier wrote:
> Hi,
> 
> On 16/01/2026 19:08, Russell King (Oracle) wrote:
> > On Fri, Jan 16, 2026 at 01:37:48PM +0000, Russell King (Oracle) wrote:
> >> On Fri, Jan 16, 2026 at 12:50:35AM +0000, Russell King (Oracle) wrote:
> >>> However, while this may explain the transmit slowdown because it's
> >>> on the transmit side, it doesn't explain the receive problem.
> >>
> >> I'm bisecting to find the cause of the receive issue, but it's going to
> >> take a long time (in the meantime, I can't do any mainline work.)
> >>
> >> So far, the range of good/bad has been narrowed down to 6.14 is good,
> >> 1b98f357dadd ("Merge tag 'net-next-6.16' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") is bad.
> >>
> >> 14 more iterations to go. Might be complete by Sunday. (Slowness in
> >> building the more fully featured net-next I use primarily for build
> >> testing, the slowness of the platform to reboot, and the need to
> >> manually test each build.)
> > 
> > Well, that's been a waste of time today. While the next iteration was
> > building, because it's been suspicious that each and every bisect
> > point has failed so far, I decided to re-check 6.14, and that fails.
> > So, it looks like this problem has existed for some considerable
> > time. I don't have the compute power locally to bisect over a massive
> > range of kernels, so I'm afraid stmmac receive is going to have to
> > stay broken unless someone else can bisect (and find a "good" point
> > in the git history.)
> > 
> 
> To me RX looks OK, at least on the various devices I have that use
> stmmac. It's fine on Cyclone V socfpga, and imx8mp. Maybe that's Jetson
> specific?

Maybe - it could be something to do with MMUs slowing down the packet
rate, or it could be uncovering a bug in stmmac's handling of dwmac4
when it runs out of descriptors in the ring.

The problem I'm seeing is that RBU (Receive Buffer Unavailable) ends up
being set in the channel 0 status register (there's only a single
channel), which means that the hardware moved on to the next receive
descriptor and found that it didn't own it.
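
To make the ownership handoff concrete, here is a minimal userspace
sketch. It is not stmmac code: struct toy_ring, toy_hw_receive() and
toy_cpu_refill() are hypothetical names, and the ring size of 4 is
arbitrary. It only models the condition in which the hardware reaches a
descriptor it does not own and flags RBU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_SIZE 4	/* hypothetical, tiny on purpose */

struct toy_desc {
	bool hw_owns;		/* true: hardware may write a frame here */
};

struct toy_ring {
	struct toy_desc desc[TOY_RING_SIZE];
	size_t hw_next;		/* next descriptor the hardware will try */
	size_t cpu_next;	/* next descriptor the driver will clean */
	bool rbu;		/* sticky "receive buffer unavailable" flag */
};

/* Hand every descriptor to the hardware and reset both indices. */
void toy_ring_init(struct toy_ring *r)
{
	assert(r != NULL);
	for (size_t i = 0; i < TOY_RING_SIZE; i++)
		r->desc[i].hw_owns = true;
	r->hw_next = 0;
	r->cpu_next = 0;
	r->rbu = false;
}

/* Hardware side: try to place one received frame in the ring. */
bool toy_hw_receive(struct toy_ring *r)
{
	struct toy_desc *d = &r->desc[r->hw_next];

	if (!d->hw_owns) {
		/* Next descriptor still belongs to the CPU: flag RBU. */
		r->rbu = true;
		return false;
	}
	d->hw_owns = false;	/* frame written, ownership goes to the CPU */
	r->hw_next = (r->hw_next + 1) % TOY_RING_SIZE;
	return true;
}

/* Driver side: clean one descriptor and give it back to the hardware. */
bool toy_cpu_refill(struct toy_ring *r)
{
	struct toy_desc *d = &r->desc[r->cpu_next];

	if (d->hw_owns)
		return false;	/* nothing to clean */
	d->hw_owns = true;
	r->cpu_next = (r->cpu_next + 1) % TOY_RING_SIZE;
	return true;
}
```

Receiving TOY_RING_SIZE frames without any refill makes the next
toy_hw_receive() fail and set rbu. The sketch does not model the DMA
suspending itself, nor whatever is needed to restart it.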

It _should_ be counted by this statistic:

     rx_buf_unav_irq: 0

but clearly, this doesn't work, because here is the channel 0 status
register:

Value at address 0x02491160: 0x00000484

which has:

#define DMA_CHAN_STATUS_RBU             BIT(7)

set. The documentation I have (sadly not for Xavier but for stm32mp151)
states that when this occurs, a "Receive Poll Demand" command needs to
be issued, but fails to explain how to do that. Older cores (such as
dwmac1000) had a "receive poll demand" register to write to for this.
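
As a sanity check on the value above, a small userspace sketch can list
which bits are set in 0x00000484. Only DMA_CHAN_STATUS_RBU comes from
this message; BIT() is redefined locally for 32-bit values, and the
helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro, 32-bit only. */
#define BIT(n)			(UINT32_C(1) << (n))
#define DMA_CHAN_STATUS_RBU	BIT(7)	/* as quoted in this message */

/* Return true if the RBU bit is set in a channel status value. */
bool status_has_rbu(uint32_t status)
{
	return (status & DMA_CHAN_STATUS_RBU) != 0;
}

/*
 * Store the positions of all set bits in out[] (lowest first) and
 * return how many were found. out must have room for 32 entries.
 */
size_t status_set_bits(uint32_t status, unsigned int *out)
{
	size_t count = 0;

	assert(out != NULL);
	for (unsigned int bit = 0; bit < 32; bit++) {
		if (status & BIT(bit))
			out[count++] = bit;
	}
	return count;
}
```

For 0x00000484, status_set_bits() reports bits 2, 7 and 10, so bit 7
(RBU) is indeed set.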

> I've got pretty much line rate with a basic 'iperf3 -c XX' and same with
> 'iperf3 -c XX -R'. What commands are you running to check the issue?

Merely iperf3 -R -c XX; that's enough to make it fall over, normally
within the first second.

> Are you still seeing the pause frames flood?

Yes, because the receive DMA has stopped, which makes the FIFO between
the MAC and MTL fill above the threshold for sending pause frames.

In order to stop the disruption to my network (because it basically
causes *everything* to clog up) I've had to turn off pause autoneg,
but that doesn't affect whether or not this happens.

It _may_ be worth testing whether adding an ndelay(500) into the
receive processing path, thereby making it intentionally slow,
allows you to reproduce the problem. If it does, then that confirms
that we're missing something in the dwmac4 handling for RBU.
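
The reasoning behind that test can be sketched with a toy timing model
in userspace. This is an assumption-laden illustration, not driver
code: first_rbu_packet() is a hypothetical helper, frames arrive at a
fixed interval, the CPU handles one frame at a time and returns its
descriptor immediately, and NAPI batching is ignored. The extra delay
is simply added to the per-frame service time.

```c
#include <assert.h>

/*
 * Return the index of the first frame that finds no hardware-owned
 * descriptor (the RBU condition), or -1 if all frames fit.
 *
 * ring_size:  number of receive descriptors
 * arrival_ns: time between consecutive frames
 * service_ns: CPU time per frame, including any added delay
 * packets:    number of frames to simulate
 */
long first_rbu_packet(unsigned int ring_size, unsigned long arrival_ns,
		      unsigned long service_ns, unsigned long packets)
{
	unsigned int hw_owned = ring_size;	/* free for the hardware */
	unsigned int pending = 0;		/* filled, awaiting the CPU */
	unsigned long long cpu_done_at = 0;	/* valid while pending > 0 */

	assert(arrival_ns > 0);

	for (unsigned long i = 0; i < packets; i++) {
		unsigned long long now = (unsigned long long)i * arrival_ns;

		/* Complete all CPU work that finished by now. */
		while (pending > 0 && cpu_done_at <= now) {
			pending--;
			hw_owned++;
			cpu_done_at += service_ns;
		}

		if (hw_owned == 0)
			return (long)i;		/* ring exhausted: RBU */

		hw_owned--;
		if (pending == 0)		/* CPU was idle, starts now */
			cpu_done_at = now + service_ns;
		pending++;
	}
	return -1;
}
```

In this model the ring never runs dry while service_ns stays below
arrival_ns, and it always runs dry eventually once the added delay
pushes service_ns above arrival_ns. Whether 500ns is enough on real
hardware depends on timings this sketch does not know.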

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!