netdev - Re: [PATCH net] net: stmmac: Use hrtimer for TX coalescing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a583c9fae69a4b2db8ddd70ed2c086c11456871a.camel@axis.com>
Date: Fri, 1 Sep 2023 11:31:59 +0000
From: Vincent Whitchurch <Vincent.Whitchurch@...s.com>
To: "nbd@....name" <nbd@....name>, "joabreu@...opsys.com"
	<joabreu@...opsys.com>, Vincent Whitchurch <Vincent.Whitchurch@...s.com>,
	"kuba@...nel.org" <kuba@...nel.org>, "davem@...emloft.net"
	<davem@...emloft.net>, "alexandre.torgue@...com" <alexandre.torgue@...com>,
	"peppe.cavallaro@...com" <peppe.cavallaro@...com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, kernel
	<kernel@...s.com>
Subject: Re: [PATCH net] net: stmmac: Use hrtimer for TX coalescing

On Wed, 2023-08-30 at 23:06 +0200, Felix Fietkau wrote:
> On 30.08.23 16:55, Vincent Whitchurch wrote:
> > I looked at it some more and the continuous postponing behaviour strikes
> > me as quite odd.  For example, if you set tx-frames coalescing to 0 then
> > cleanups could happen much later than the specified tx-usecs period, in
> > the absence of RX traffic.  Also, if we'd have to have a shared
> > timestamp between the callers of stmmac_tx_timer_arm() and the hrtimer
> > to preserve this continuous postponing behaviour, then we'd need to
> > introduce some locking between the timer expiry and those functions, to
> > avoid race conditions.
> 
> I just spent some time digging through the history of the timer code, 
> figuring out the intention behind the continuous postponing behavior.
> 
> It's an interrupt mitigation scheme where DMA descriptors are configured 
> to only generate a completion event every 25 packets, and the only 
> purpose of the timer is to avoid keeping packets in the queue for too 
> long after tx activity has stopped.
> Based on that design, I believe that the continuous postponing actually 
> makes sense and the patches that eliminate it are misguided. When there 
> is constant activity, there will be tx completion interrupts that 
> trigger cleanup.

The tx-frames value (25) can be controlled from user space.  If it is
set to zero then the driver should still coalesce interrupts based on
the interval specified in the tx-usecs setting, but the driver fails to
do so because it keeps postponing the cleanup and increasing the latency
of the cleanups far beyond the programmed period.

> That said, I did even more digging and I found out that the timer code 
> was added at a time when the driver didn't even disable tx and rx 
> interrupts individually, which means that it could not take advantage of 
> interrupt mitigation via NAPI scheduling + IRQ disable/enable.
> 
> I have a hunch that given the changes made to the driver over time, the 
> timer based interrupt mitigation strategy might just be completely 
> useless and actively harmful now. It certainly messes with things like 
> TSQ and BQL in a nasty way.
> 
> I suspect that the best and easiest way to deal with this issue is to 
> simply rip out all that timer nonsense, rely on tx IRQs entirely and 
> just let NAPI do its thing.

If you want an interrupt for every packet, you can turn off coalescing
by setting tx-frames to 1 and tx-usecs 0.  Currently, the driver does
not turn off the timer even if tx-usecs is set to zero, but that is
trivially fixed.  With such a fix and that setting, the result is a 30x
increase in the number of interrupts in a tx-only test.

And tx-frames 25 tx-usecs 0 is of course also bad for throughput since
cleanups will not happen when fewer than 25 frames are transmitted.