Message-ID: <732f3c01-a36f-4c9b-8273-a55aba9094d8@nbd.name>
Date: Wed, 23 Aug 2023 22:18:33 +0200
From: Felix Fietkau <nbd@....name>
To: Vincent Whitchurch <vincent.whitchurch@...s.com>, peppe.cavallaro@...com,
 alexandre.torgue@...com, joabreu@...opsys.com, davem@...emloft.net,
 kuba@...nel.org
Cc: kernel@...s.com, netdev@...r.kernel.org
Subject: Re: [PATCH net] net: stmmac: Use hrtimer for TX coalescing

On 20.11.20 16:02, Vincent Whitchurch wrote:
> This driver uses a normal timer for TX coalescing, which means that
> with the default tx-usecs of 1000 microseconds the cleanups actually
> happen 10 ms or more later with HZ=100.  This leads to very low
> throughput with TCP when bridged to a slow link such as a 4G modem.  Fix
> this by using an hrtimer instead.
> 
> On my ARM platform with HZ=100 and the default TX coalescing settings
> (tx-frames 25 tx-usecs 1000), with "tc qdisc add dev eth0 root netem
> delay 60ms 40ms rate 50Mbit" run on the server, netperf's TCP_STREAM
> improves from ~5.5 Mbps to ~100 Mbps.
> 
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@...s.com>

Based on tests by OpenWrt users, this patch appears to cause a
significant performance regression, because it wastes a lot of CPU
cycles re-arming the hrtimer on every single packet. More info:
https://github.com/openwrt/openwrt/issues/11676#issuecomment-1690492666

My suggestion for fixing this properly would be:
- keep a separate timestamp for last tx packet
- do not modify the timer if it's scheduled already
- in the timer function, check the last tx timestamp and re-arm the 
timer if necessary.

This should significantly reduce the number of wasted CPU cycles, even 
when accounting for the additional overhead of hrtimer vs regular timer.
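
To illustrate the three points above, here is a rough userspace sketch
of the logic. It is not driver code: struct tx_coal, the tx_coal_*
helpers and simulate_burst() are hypothetical names made up for this
example, time is a plain microsecond counter, and arm_count stands in
for the number of times a real hrtimer would have to be programmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-queue TX coalescing state. */
struct tx_coal {
	uint64_t last_tx_us;   /* timestamp of the most recent TX packet */
	uint64_t expires_us;   /* expiry time of the armed timer */
	uint64_t period_us;    /* coalescing delay (tx-usecs) */
	bool armed;            /* is the timer currently scheduled? */
	unsigned int arm_count;   /* timer programming operations */
	unsigned int clean_count; /* TX cleanups actually run */
};

/* TX path: record the timestamp, leave an already scheduled timer alone. */
static void tx_coal_on_xmit(struct tx_coal *c, uint64_t now_us)
{
	c->last_tx_us = now_us;
	if (c->armed)
		return;
	c->armed = true;
	c->expires_us = now_us + c->period_us;
	c->arm_count++;
}

/* Timer function: re-arm if packets arrived after the timer was set. */
static bool tx_coal_on_timer(struct tx_coal *c, uint64_t now_us)
{
	uint64_t deadline = c->last_tx_us + c->period_us;

	if (deadline > now_us) {
		c->expires_us = deadline;
		c->arm_count++;
		return true;	/* re-armed, cleanup deferred */
	}
	c->armed = false;
	c->clean_count++;	/* a real driver would run TX cleanup here */
	return false;
}

/* Fire every timer expiry that is due at or before now_us. */
static void tx_coal_advance(struct tx_coal *c, uint64_t now_us)
{
	while (c->armed && c->expires_us <= now_us)
		tx_coal_on_timer(c, c->expires_us);
}

/* Send 'packets' packets spaced gap_us apart, then let the timer drain. */
static struct tx_coal simulate_burst(unsigned int packets, uint64_t gap_us,
				     uint64_t period_us)
{
	struct tx_coal c = { .period_us = period_us };
	uint64_t now = 0;

	assert(gap_us > 0);
	for (unsigned int i = 0; i < packets; i++) {
		now = (uint64_t)i * gap_us;
		tx_coal_advance(&c, now);
		tx_coal_on_xmit(&c, now);
	}
	tx_coal_advance(&c, now + 2 * period_us);
	return c;
}
```

In this model, simulate_burst(1000, 10, 1000) programs the timer 12
times for 1000 packets, where re-arming on every packet would program
it 1000 times. The trade-off is that the timer can fire early and find
nothing to do except re-arm itself, roughly once per tx-usecs period
while traffic keeps flowing.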

- Felix
