lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iLeRbVWxBOXRRmLNqiF8Lz3-iykCzkVE=_JR7GEycyhSw@mail.gmail.com>
Date: Tue, 18 Nov 2025 08:10:10 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Neal Cardwell <ncardwell@...gle.com>, Rick Jones <jonesrick@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next 2/2] tcp: add net.ipv4.tcp_rtt_threshold sysctl

On Tue, Nov 18, 2025 at 6:04 AM Neal Cardwell <ncardwell@...gle.com> wrote:
>
> On Mon, Nov 17, 2025 at 8:28 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > This is a follow up of commit aa251c84636c ("tcp: fix too slow
> > tcp_rcvbuf_grow() action") which brought again the issue that I tried
> > to fix in commit 65c5287892e9 ("tcp: fix sk_rcvbuf overshoot")
> >
> > We also recently increased tcp_rmem[2] to 32 MB in commit 572be9bf9d0d
> > ("tcp: increase tcp_rmem[2] to 32 MB")
> >
> > Idea of this patch is to not let tcp_rcvbuf_grow() grow sk->sk_rcvbuf
> > too fast for small RTT flows. If sk->sk_rcvbuf is too big, this can
> > force NIC driver to not recycle pages from the page pool, and also
> > can cause cache evictions for DDIO enabled cpus/NIC, as receivers
> > are usually slower than senders.
> >
> > Add net.ipv4.tcp_rtt_threshold sysctl, set by default to 1000 usec (1 ms)
> > If RTT if smaller than the sysctl value, use the RTT/tcp_rtt_threshold
> > ratio to control sk_rcvbuf inflation.
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > ---
> >  Documentation/networking/ip-sysctl.rst         | 10 ++++++++++
> >  .../net_cachelines/netns_ipv4_sysctl.rst       |  1 +
> >  include/net/netns/ipv4.h                       |  1 +
> >  net/core/net_namespace.c                       |  2 ++
> >  net/ipv4/sysctl_net_ipv4.c                     |  9 +++++++++
> >  net/ipv4/tcp_input.c                           | 18 ++++++++++++++----
> >  net/ipv4/tcp_ipv4.c                            |  1 +
> >  7 files changed, 38 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> > index 2bae61be18593a8111a83d9f034517e4646eb653..ce2a223e17a61b40fc35b2528c8ee4cf8f750993 100644
> > --- a/Documentation/networking/ip-sysctl.rst
> > +++ b/Documentation/networking/ip-sysctl.rst
> > @@ -673,6 +673,16 @@ tcp_moderate_rcvbuf - BOOLEAN
> >
> >         Default: 1 (enabled)
> >
> > +tcp_rtt_threshold - INTEGER
> > +       rcvbuf autotuning can over estimate final socket rcvbuf, which
> > +       can lead to cache trashing for high throughput flows.
> > +
> > +       For small RTT flows (below tcp_rtt_threshold usecs), we can relax
> > +       rcvbuf growth: Few additional ms to reach the final (and smaller)
> > +       rcvbuf is a good tradeoff.
> > +
> > +       Default : 1000 (1 ms)
>
> Thanks, Eric! The logic of this code looks good to me.
>
> For the name of the sysctl, perhaps we can pick something more
> specific than "tcp_rtt_threshold", to clarify to the reader what the
> RTT threshold is used for?

I forgot Rick Jones suggested   tcp_autotune_rtt_threshold .

>
> And if the name is more specific about what the threshold is for, then
> in the future if we want RTT thresholds for other behavior (e.g.,
> tuning the tcp_tso_rtt_log code or other future RTT-based
> mechanisms?), then it will be easier to add those future RTT
> thresholds without the names seeming confusing and error-prone?
>
> With the existing "tcp_moderate_rcvbuf" sysctl in mind, how about a
> name like "tcp_rcvbuf_low_rtt"?

I am not good at names, maybe we should reconcile RIck suggestion with yours ?


>
> Then the description in ip-sysctl.rst could read as something like:
> "We can relax rcvbuf growth for low RTT flows (with RTT below
> tcp_rcvbuf_low_rtt usecs):".
>
> WDYT?
> neal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ