lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADVnQyk4=w-Qbw24na3_09SRfbP3w+W9trzhd2vOLfeVui8-BA@mail.gmail.com>
Date: Tue, 18 Nov 2025 09:04:14 -0500
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next 2/2] tcp: add net.ipv4.tcp_rtt_threshold sysctl

On Mon, Nov 17, 2025 at 8:28 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> This is a follow up of commit aa251c84636c ("tcp: fix too slow
> tcp_rcvbuf_grow() action") which brought again the issue that I tried
> to fix in commit 65c5287892e9 ("tcp: fix sk_rcvbuf overshoot")
>
> We also recently increased tcp_rmem[2] to 32 MB in commit 572be9bf9d0d
> ("tcp: increase tcp_rmem[2] to 32 MB")
>
> Idea of this patch is to not let tcp_rcvbuf_grow() grow sk->sk_rcvbuf
> too fast for small RTT flows. If sk->sk_rcvbuf is too big, this can
> force NIC driver to not recycle pages from the page pool, and also
> can cause cache evictions for DDIO enabled cpus/NIC, as receivers
> are usually slower than senders.
>
> Add net.ipv4.tcp_rtt_threshold sysctl, set by default to 1000 usec (1 ms)
> If RTT if smaller than the sysctl value, use the RTT/tcp_rtt_threshold
> ratio to control sk_rcvbuf inflation.
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
>  Documentation/networking/ip-sysctl.rst         | 10 ++++++++++
>  .../net_cachelines/netns_ipv4_sysctl.rst       |  1 +
>  include/net/netns/ipv4.h                       |  1 +
>  net/core/net_namespace.c                       |  2 ++
>  net/ipv4/sysctl_net_ipv4.c                     |  9 +++++++++
>  net/ipv4/tcp_input.c                           | 18 ++++++++++++++----
>  net/ipv4/tcp_ipv4.c                            |  1 +
>  7 files changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index 2bae61be18593a8111a83d9f034517e4646eb653..ce2a223e17a61b40fc35b2528c8ee4cf8f750993 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -673,6 +673,16 @@ tcp_moderate_rcvbuf - BOOLEAN
>
>         Default: 1 (enabled)
>
> +tcp_rtt_threshold - INTEGER
> +       rcvbuf autotuning can over estimate final socket rcvbuf, which
> +       can lead to cache trashing for high throughput flows.
> +
> +       For small RTT flows (below tcp_rtt_threshold usecs), we can relax
> +       rcvbuf growth: Few additional ms to reach the final (and smaller)
> +       rcvbuf is a good tradeoff.
> +
> +       Default : 1000 (1 ms)

Thanks, Eric! The logic of this code looks good to me.

For the name of the sysctl, perhaps we can pick something more
specific than "tcp_rtt_threshold", to clarify to the reader what the
RTT threshold is used for?

And if the name is more specific about what the threshold is for, then
in the future if we want RTT thresholds for other behavior (e.g.,
tuning the tcp_tso_rtt_log code or other future RTT-based
mechanisms?), then it will be easier to add those future RTT
thresholds without the names seeming confusing and error-prone?

With the existing "tcp_moderate_rcvbuf" sysctl in mind, how about a
name like "tcp_rcvbuf_low_rtt"?

Then the description in ip-sysctl.rst could read as something like:
"We can relax rcvbuf growth for low RTT flows (with RTT below
tcp_rcvbuf_low_rtt usecs):".

WDYT?
neal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ