linux-kernel - Re: [PATCH net v2] net: stmmac: fix transmit queue timed out after resume

Message-ID: <aWqmIRFsHkQKkXF-@shell.armlinux.org.uk>
Date: Fri, 16 Jan 2026 20:57:05 +0000
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Maxime Chevallier <maxime.chevallier@...tlin.com>
Cc: Tao Wang <tao03.wang@...izon.auto>, alexandre.torgue@...s.st.com,
	andrew+netdev@...n.ch, davem@...emloft.net, edumazet@...gle.com,
	horms@...nel.org, kuba@...nel.org, linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org, mcoquelin.stm32@...il.com,
	netdev@...r.kernel.org, pabeni@...hat.com
Subject: Re: [PATCH net v2] net: stmmac: fix transmit queue timed out after
 resume

On Fri, Jan 16, 2026 at 07:22:39PM +0000, Russell King (Oracle) wrote:
> Yes, because the receive DMA has stopped, which makes the FIFO between
> the MAC and MTL fill above the threshold for sending pause frames.
> 
> In order to stop the disruption to my network (because it basically
> causes *everything* to clog up) I've had to turn off pause autoneg,
> but that doesn't affect whether or not this happens.
> 
> It _may_ be worth testing whether adding a ndelay(500) into the
> receive processing path, thereby making it intentionally slow,
> allows you to reproduce the problem. If it does, then that confirms
> that we're missing something in the dwmac4 handling for RBU.

I notice that the iMX8MP TRM says something similar about the RBU bit
(see 11.7.6.1.482.3 bit 7).

However, it does say that in ring mode, merely advancing the tail
pointer should be sufficient. I can write the tail pointer register
using devmem2, but the hardware never wakes up.

E.g.:

Channel 0 Current Application Receive Descriptor:
Value at address 0x0249114c: 0xfffff910

Channel 0 Rx Descriptor Tail Pointer:
Value at address 0x02491128: 0xfffff910

Value at address 0x02491128: 0xfffff910
Written 0xfffff940; readback 0xfffff940
Value at address 0x02491128: 0xfffff940
Written 0xfffff980; readback 0xfffff980

Value at address 0x0249114c: 0xfffff910

So, the hardware hasn't advanced. Here's the ring state:

                          RDES0      RDES1 RDES2 RDES3
401 [0x0000007ffffff910]: 0xffd63040 0x7f  0x0   0x81000000
402 [0x0000007ffffff920]: 0xffd64040 0x7f  0x0   0x81000000
403 [0x0000007ffffff930]: 0xffd3f040 0x7f  0x0   0x81000000
404 [0x0000007ffffff940]: 0xffeed040 0x7f  0x0   0x81000000
405 [0x0000007ffffff950]: 0xfff2f040 0x7f  0x0   0x81000000
406 [0x0000007ffffff960]: 0xffbee040 0x7f  0x0   0x81000000
407 [0x0000007ffffff970]: 0xffbef040 0x7f  0x0   0x81000000
408 [0x0000007ffffff980]: 0xffbf0040 0x7f  0x0   0x81000000

Bit 31 of RDES3 is RDES3_OWN, which when set, means the dwmac core
has ownership of the buffer. Bit 24 means the buffer 1 address (stored
in RDES0) is valid. So, if the iMX8MP information is correct, then
advancing 0x02491128 to point at the following descriptors should
"wake" the receive side, but it does not.

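As a quick cross-check of the ring dump, here is a minimal Python sketch
that decodes the two RDES3 bits described above. Only the bit positions
(31 and 24) and the RDES3_OWN name come from this mail; the
RDES3_BUF1_VALID name and the decode_rdes3() helper are made up for
illustration.

```python
RDES3_OWN = 1 << 31         # descriptor is owned by the dwmac core (from the mail)
RDES3_BUF1_VALID = 1 << 24  # buffer 1 address valid; constant name is hypothetical


def decode_rdes3(rdes3):
    """Return the two RDES3 flags discussed above as booleans."""
    return {
        "own": bool(rdes3 & RDES3_OWN),
        "buf1_valid": bool(rdes3 & RDES3_BUF1_VALID),
    }


# RDES3 values of ring entries 401-408, copied from the dump above
ring_rdes3 = {idx: 0x81000000 for idx in range(401, 409)}

for idx, rdes3 in ring_rdes3.items():
    flags = decode_rdes3(rdes3)
    print(idx, hex(rdes3), flags)
```

Every entry decodes to own=True and buf1_valid=True, i.e. all eight
descriptors are available to the hardware with a valid buffer address.
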
Other registers:

Queue 0 Receive Debug:
Value at address 0x02490d38: 0x002a0020

bit 0      = 0    (MTL Rx Queue Write Controller Active Status not detected)
bits 2:1   = 0    (Read controller Idle state)
bits 5:4   = 2    (Rx Queue fill-level above flow-control activate threshold)
bits 29:16 = 0x2a (42 packets in receive queue)
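
The field breakdown above can be reproduced with a few shifts and masks.
This is only a sketch: the field positions are the ones listed in this
mail, while the function and key names are invented for illustration.

```python
def decode_mtl_rxq_debug(value):
    """Split the Queue 0 Receive Debug value into the fields listed above."""
    return {
        "write_ctrl_active": value & 0x1,              # bit 0
        "read_ctrl_state": (value >> 1) & 0x3,         # bits 2:1
        "fill_level": (value >> 4) & 0x3,              # bits 5:4
        "packets_in_queue": (value >> 16) & 0x3FFF,    # bits 29:16
    }


fields = decode_mtl_rxq_debug(0x002A0020)
for name, val in fields.items():
    print(f"{name} = {val}")
```

For 0x002a0020 this gives write_ctrl_active = 0, read_ctrl_state = 0,
fill_level = 2 and packets_in_queue = 42, matching the manual decode.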

Because the internal queue is above the flow-control activate
threshold, that causes the stmmac hardware to constantly spew pause
frames, and, as the stmmac receive side is essentially stuck and won't
make progress even when there are free buffers, the only way to release
this state is via a software reset of the entire core.

Why don't pause frames save us? Well, pause frames will only be sent
when the receive queue fills to the activate threshold, which can only
happen _after_ packets stop being transferred to the descriptor rings.
In other words, it can only happen when an RBU event has been detected,
which suspends the receiver - and it seems when that happens, it is
irrecoverable without soft-reset on Xavier.

Right now, I'm not sure what to think about this - I don't know whether
it's the hardware that's at fault, or whether there's an issue in the
driver. What I know for certain is what I've stated above, and the
fact that iperf3 -R has *extremely* detrimental effects on my *entire*
network.

The reason is... you connect two Netgear switches together, they use
flow control, and you have no way to turn that off... So, once stmmac
starts sending pause frames, the switch's queue for that port fills,
and when further frames come in for that port, the switch sends pause
frames to the next switch behind it, which stops all traffic flow
between the two switches, severing the network. For as long as stmmac
keeps that up, so does the switch it is connected to.

If another machine happens to send a packet that needs to be queued on
the port that stmmac is connected to (e.g. broadcast or multicast)
then... that port starts sending pause frames back to that machine,
severing its network connection permanently while stmmac is spewing
pause frames.

Thus, the entire network goes down, on account of _one_ machine
repeatedly sending pause frames, preventing packet delivery.
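
To illustrate the cascade, here is a deliberately crude toy model in
Python. It is not how any real switch is implemented: the queue limit,
the one-frame-per-step traffic pattern and all names are assumptions
chosen only to show how one stuck port can stall an inter-switch link.

```python
from collections import deque

QUEUE_LIMIT = 4  # hypothetical queue depth at which the switch sends pause


def simulate(steps, stmmac_pausing):
    """Toy model: sw1 --link--> sw2 --> stmmac, plus another host on sw2.

    Returns (frames delivered to the other host, link paused at the end).
    """
    sw2_to_stmmac = deque()  # sw2's egress queue for the stmmac port
    link_paused = False      # True while sw2 sends pause frames to sw1
    delivered_to_other = 0   # frames from sw1 reaching the other host on sw2

    for step in range(steps):
        # sw2 can only drain its queue while stmmac is not sending pause
        if not stmmac_pausing and sw2_to_stmmac:
            sw2_to_stmmac.popleft()

        if not link_paused:
            # each step sw1 forwards one frame for stmmac and one for the
            # other host; both share the same inter-switch link
            sw2_to_stmmac.append(step)
            delivered_to_other += 1

        # sw2 pauses the whole link once the stuck queue reaches the limit
        link_paused = len(sw2_to_stmmac) >= QUEUE_LIMIT

    return delivered_to_other, link_paused


print(simulate(20, stmmac_pausing=False))
print(simulate(20, stmmac_pausing=True))
```

In this model, traffic to the unrelated host keeps flowing while stmmac
behaves, but stops after QUEUE_LIMIT frames once stmmac keeps sending
pause frames, because the pause on the inter-switch link is not
per-destination.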

While the idea of a lossless network _seems_ like a good idea, in
reality it gives an attacker who can get on a platform and take
control of the ethernet NIC the ability to completely screw an entire
network if flow control is enabled everywhere. I'm thinking at this
point... just say no to flow control, disable it everywhere one can.
Ethernet was designed to lose packets when it needs to, to ensure
fairness. Flow control destroys that fairness and results in networks
being severed.

"attacker" is maybe too strong - consider what happens if the kernel
crashes on a stmmac platform, so it can't receive packets anymore,
and the ring fills up, causing it to start spewing pause frames.
It's goodbye network!

I'm just rambling, but I think that point is justified.

Thoughts - should the kernel default to having flow control enabled
or disabled in light of this? Should this feature require explicit
administrative configuration given the severity of network disruption?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!