Message-ID: <CANn89iJfnWSn-1hghtJEaZ5u8_+9B7eCTZ07U9GnGh6UxS8rJw@mail.gmail.com>
Date: Wed, 19 Nov 2025 00:01:33 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>, Neal Cardwell <ncardwell@...gle.com>,
Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next 2/2] tcp: add net.ipv4.tcp_rtt_threshold sysctl
On Tue, Nov 18, 2025 at 1:22 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Nov 18, 2025 at 1:15 PM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > Hi,
> >
> > On 11/17/25 2:28 PM, Eric Dumazet wrote:
> > > This is a follow-up to commit aa251c84636c ("tcp: fix too slow
> > > tcp_rcvbuf_grow() action"), which brought back the issue that I tried
> > > to fix in commit 65c5287892e9 ("tcp: fix sk_rcvbuf overshoot").
> > >
> > > We also recently increased tcp_rmem[2] to 32 MB in commit 572be9bf9d0d
> > > ("tcp: increase tcp_rmem[2] to 32 MB")
> > >
> > > The idea of this patch is to not let tcp_rcvbuf_grow() grow sk->sk_rcvbuf
> > > too fast for small-RTT flows. If sk->sk_rcvbuf is too big, this can
> > > force the NIC driver to not recycle pages from the page pool, and can
> > > also cause cache evictions on DDIO-enabled CPUs/NICs, as receivers
> > > are usually slower than senders.
> > >
> > > Add a net.ipv4.tcp_rtt_threshold sysctl, set by default to 1000 usec (1 ms).
> > > If the RTT is smaller than the sysctl value, use the RTT/tcp_rtt_threshold
> > > ratio to control sk_rcvbuf inflation.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
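
(Purely as an illustration of the intended tuning, with the caveat that both
the sysctl name and the microsecond unit are the ones proposed in this series
and may change in v2:)

# read the proposed default (1000 usec = 1 ms)
sysctl net.ipv4.tcp_rtt_threshold
# example: restrict the damping to flows whose RTT is below 500 usec
sysctl -w net.ipv4.tcp_rtt_threshold=500
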
> >
> > I gave this series a spin in my test bed: 2 quite old hosts connected
> > back-to-back via 100Gbps links. RTT is < 100us. I am doing bulk iperf3
> > TCP transfers, with IRQs and user-space processes pinned.
> >
> > The average tput for 30s connections does not change measurably: ~23Gbps
> > per connection. WRT the receiver buffer, in 30 runs prior to this patch
> > I see:
> >
> > min 1901769, max 4322922, avg 2900036
> >
> > On top of this series:
> >
> > min 1078047, max 3967327, avg 2465665.
> >
> > So I do see smaller buffers on average, but I'm not sure I'm hitting the
> > reference scenario (notably the lowest value here is considerably
> > higher than the theoretical minimum rcvwin required to handle the given
> > B/W).
> >
> > Should I go for longer (or shorter) connections?
>
> 23 Gbps seems small?
>
> I would perhaps use 8 senders, and force all receivers onto one CPU (CPU
> 4 in the following run):
>
> for i in {1..8}
> do
>   netperf -H host -T,4 -l 100 &
> done
>
> This would, I think, show what can happen when receivers cannot keep up.
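
(Side note, and only a suggestion on my side: to see how large the receive
buffer actually gets on the receiving host during such a run, I would sample
the "rb" field in ss's skmem output, which reflects sk_rcvbuf; the sender
address below is just a placeholder.)

# on the receiving host, while the transfers are running
watch -n 1 "ss -tmi dst 192.0.2.1"
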
I will add a Tested: section with some numbers in V2, and switch to the
tcp_rcvbuf_low_rtt name, as Neal suggested.