Message-ID: <CANn89iLGXY0qhvNNZWVppq+u0kccD5QCVAEqZ_0GyZGGeWL=Yg@mail.gmail.com>
Date: Tue, 18 Nov 2025 13:22:34 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Simon Horman <horms@...nel.org>, Neal Cardwell <ncardwell@...gle.com>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next 2/2] tcp: add net.ipv4.tcp_rtt_threshold sysctl

On Tue, Nov 18, 2025 at 1:15 PM Paolo Abeni <pabeni@...hat.com> wrote:
>
> Hi,
>
> On 11/17/25 2:28 PM, Eric Dumazet wrote:
> > This is a follow-up to commit aa251c84636c ("tcp: fix too slow
> > tcp_rcvbuf_grow() action"), which reintroduced the issue that I tried
> > to fix in commit 65c5287892e9 ("tcp: fix sk_rcvbuf overshoot").
> >
> > We also recently increased tcp_rmem[2] to 32 MB in commit 572be9bf9d0d
> > ("tcp: increase tcp_rmem[2] to 32 MB")
> >
> > The idea of this patch is to not let tcp_rcvbuf_grow() grow sk->sk_rcvbuf
> > too fast for small-RTT flows. If sk->sk_rcvbuf is too big, it can
> > prevent the NIC driver from recycling pages from the page pool, and can
> > also cause cache evictions on DDIO-enabled cpus/NICs, as receivers
> > are usually slower than senders.
> >
> > Add a net.ipv4.tcp_rtt_threshold sysctl, set by default to 1000 usec (1 ms).
> > If the RTT is smaller than the sysctl value, use the RTT/tcp_rtt_threshold
> > ratio to control sk_rcvbuf inflation.
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>
> I gave this series a spin in my test-bed: 2 quite old hosts connected
> back-to-back via 100Gbps links. RTT is < 100us. I'm doing bulk iperf3 TCP
> transfers, with IRQs and user-space processes pinned.
>
> The average tput for 30s connections does not change measurably: ~23Gbps
> per connection. WRT the receiver buffer, in 30 runs prior to this patch
> I see:
>
> min 1901769, max 4322922, avg 2900036
>
> On top of this series:
>
> min 1078047, max 3967327, avg 2465665.
>
> So I do see smaller buffers on average, but I'm not sure I'm hitting the
> reference scenario (notably, the lowest value here is considerably
> higher than the theoretical minimum rcvwin required to handle the given
> B/W).
>
> Should I go for longer (or shorter) connections?
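
(Rough back-of-the-envelope, to put the numbers above in perspective: at
~23 Gbps and an RTT under 100 usec, the bandwidth-delay product is about
23e9 * 100e-6 / 8 ~= 290 KB, so even the smallest rcvbuf reported above is
still several times the theoretical minimum window.)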

23 Gbps seems small?

I would perhaps use 8 senders, and force all receivers onto one cpu (cpu
4 in the following run):

for i in {1..8}
do
 # -H host selects the remote host, -T ,4 pins the remote netserver to cpu 4,
 # and -l 100 runs each stream for 100 seconds.
 netperf -H host -T,4 -l 100 &
done

This would, I think, show what can happen when receivers cannot keep up.
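
To restate the ratio idea from the commit message in standalone C -- purely a
sketch under assumed names (scaled_rcvbuf_growth() is a hypothetical helper;
the real change presumably lives in or around tcp_rcvbuf_grow()), not the
patch itself:

/*
 * Hypothetical illustration: scale the allowed receive-buffer growth by
 * rtt / tcp_rtt_threshold for flows whose RTT is below the threshold,
 * and leave flows with larger RTTs untouched.
 */
#include <stdint.h>

static uint32_t scaled_rcvbuf_growth(uint32_t desired_growth,
                                     uint32_t rtt_us,
                                     uint32_t rtt_threshold_us)
{
        if (rtt_threshold_us && rtt_us < rtt_threshold_us) {
                uint64_t scaled = (uint64_t)desired_growth * rtt_us /
                                  rtt_threshold_us;
                return scaled ? (uint32_t)scaled : 1;
        }
        return desired_growth;
}

With the default threshold of 1000 usec, a flow with a 100 usec RTT would only
be granted roughly a tenth of the growth it gets today, which is the behaviour
the commit message describes for small-RTT flows.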
