Message-ID: <c1a44dde-376c-4140-8f51-aeac0a49c0da@redhat.com>
Date: Tue, 18 Nov 2025 22:14:56 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>, "David S . Miller"
 <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>
Cc: Simon Horman <horms@...nel.org>, Neal Cardwell <ncardwell@...gle.com>,
 Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org,
 eric.dumazet@...il.com
Subject: Re: [PATCH net-next 2/2] tcp: add net.ipv4.tcp_rtt_threshold sysctl

Hi,

On 11/17/25 2:28 PM, Eric Dumazet wrote:
> This is a follow-up to commit aa251c84636c ("tcp: fix too slow
> tcp_rcvbuf_grow() action"), which brought back the issue that I tried
> to fix in commit 65c5287892e9 ("tcp: fix sk_rcvbuf overshoot")
> 
> We also recently increased tcp_rmem[2] to 32 MB in commit 572be9bf9d0d
> ("tcp: increase tcp_rmem[2] to 32 MB")
> 
> The idea of this patch is to keep tcp_rcvbuf_grow() from growing
> sk->sk_rcvbuf too fast for small-RTT flows. If sk->sk_rcvbuf is too
> big, it can prevent the NIC driver from recycling pages from the page
> pool, and can also cause cache evictions on DDIO-enabled CPUs/NICs,
> as receivers are usually slower than senders.
> 
> Add a net.ipv4.tcp_rtt_threshold sysctl, set by default to 1000 usec
> (1 ms). If the RTT is smaller than the sysctl value, use the
> RTT/tcp_rtt_threshold ratio to control sk_rcvbuf inflation.
> 
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
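
Before the numbers, let me restate my reading of the intended scaling
with a quick user-space sketch (the helper name, types and exact
rounding here are mine, not the actual patch):

#include <stdint.h>
#include <stdio.h>

/* My reading of the changelog above: rcvbuf growth steps are scaled
 * down by RTT/threshold for flows below the threshold. */
static uint32_t scale_rcvbuf_growth(uint32_t growth_bytes, uint32_t rtt_us,
				    uint32_t rtt_threshold_us)
{
	/* Flows with RTT >= threshold keep the full growth. */
	if (rtt_us >= rtt_threshold_us)
		return growth_bytes;

	/* Small-RTT flows grow proportionally slower, e.g. ~10x
	 * slower at 100us with the default 1ms threshold. */
	return (uint32_t)((uint64_t)growth_bytes * rtt_us /
			  rtt_threshold_us);
}

int main(void)
{
	/* 64KB growth step, 100us RTT, default 1000us threshold. */
	printf("%u\n", scale_rcvbuf_growth(65536, 100, 1000));
	return 0;
}

If that reading is right, at ~100us RTT each rcvbuf growth step should
be cut by roughly a factor of 10.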

I gave this series a spin in my test-bed: two fairly old hosts,
back-to-back connected via 100Gbps links, RTT < 100us. I ran bulk
iperf3 TCP transfers, with IRQs and user-space processes pinned.

The average throughput for 30s connections does not change measurably:
~23Gbps per connection. WRT the receiver buffer, over 30 runs prior to
this patch I see:

min 1901769, max 4322922, avg 2900036

On top of this series:

min 1078047, max 3967327, avg 2465665

So I do see smaller buffers on average, but I'm not sure I'm hitting
the reference scenario: notably, even the lowest value here is
considerably higher than the theoretical minimum rcvwin required to
sustain the given B/W.
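
For reference, a rough back-of-the-envelope, ignoring the skb truesize
overhead that sk_rcvbuf also has to cover:

	BDP = rate * RTT = 23 Gbps * 100 us = 2.3e6 bits ~= 287 KB

so even the smallest buffer above (~1.08 MB) is roughly 4x that.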

Should I go for longer (or shorter) connections?

Thanks,

Paolo

