netdev - Re: [PATCH net] net: stmmac: Use hrtimer for TX coalescing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8a2d04f5-7cd8-4b49-b538-c85e3c1caec9@nbd.name>
Date: Wed, 30 Aug 2023 23:06:18 +0200
From: Felix Fietkau <nbd@....name>
To: Vincent Whitchurch <Vincent.Whitchurch@...s.com>,
 "alexandre.torgue@...com" <alexandre.torgue@...com>,
 "kuba@...nel.org" <kuba@...nel.org>,
 "davem@...emloft.net" <davem@...emloft.net>,
 "joabreu@...opsys.com" <joabreu@...opsys.com>,
 "peppe.cavallaro@...com" <peppe.cavallaro@...com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
 kernel <kernel@...s.com>
Subject: Re: [PATCH net] net: stmmac: Use hrtimer for TX coalescing

On 30.08.23 16:55, Vincent Whitchurch wrote:
> On Fri, 2023-08-25 at 19:38 +0200, Felix Fietkau wrote:
>> On 25.08.23 15:42, Vincent Whitchurch wrote:
>> > Since the timer expiry function schedules napi, and the napi poll
>> > function stmmac_tx_clean() re-arms the timer if it sees that there are
>> > pending tx packets, shouldn't an implementation similar to hip04_eth.c
>> > (which doesn't save/check the last tx timestamp) be sufficient?
>> 
>> To be honest, I didn't look very closely at what the timer does and how 
>> coalescing works. I don't know if delaying the timer processing with 
>> every tx is the right choice, or if it should be armed only once. 
>> However, as you pointed out, the commit that dropped the re-arming was 
>> reverted because of regressions.
>> 
>> My suggestions are intended to preserve the existing behavior as much as 
>> possible (in order to avoid regressions), while also achieving the 
>> benefit of significantly reducing CPU cycles wasted by re-arming the timer.
> 
> I looked at it some more and the continuous postponing behaviour strikes
> me as quite odd.  For example, if you set tx-frames coalescing to 0 then
> cleanups could happen much later than the specified tx-usecs period, in
> the absence of RX traffic.  Also, if we'd have to have a shared
> timestamp between the callers of stmmac_tx_timer_arm() and the hrtimer
> to preserve this continuous postponing behaviour, then we'd need to
> introduce some locking between the timer expiry and those functions, to
> avoid race conditions.

I just spent some time digging through the history of the timer code, 
figuring out the intention behind the continuous postponing behavior.

It's an interrupt mitigation scheme where DMA descriptors are configured 
to only generate a completion event every 25 packets, and the only 
purpose of the timer is to avoid keeping packets in the queue for too 
long after tx activity has stopped.
Based on that design, I believe that the continuous postponing actually 
makes sense and the patches that eliminate it are misguided. When there 
is constant activity, there will be tx completion interrupts that 
trigger cleanup.

That said, I did even more digging and I found out that the timer code 
was added at a time when the driver didn't even disable tx and rx 
interrupts individually, which means that it could not take advantage of 
interrupt mitigation via NAPI scheduling + IRQ disable/enable.

I have a hunch that given the changes made to the driver over time, the 
timer based interrupt mitigation strategy might just be completely 
useless and actively harmful now. It certainly messes with things like 
TSQ and BQL in a nasty way.

I suspect that the best and easiest way to deal with this issue is to 
simply rip out all that timer nonsense, rely on tx IRQs entirely and 
just let NAPI do its thing.

- Felix