lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANn89iJsqvNFgAoUfPcB==PiVeMHNBFB1uyWkF4Egs0KBW-ENg@mail.gmail.com>
Date: Thu, 6 Nov 2025 08:09:48 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Neal Cardwell <ncardwell@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org, 
	eric.dumazet@...il.com
Subject: Re: [PATCH net-next] tcp: add net.ipv4.tcp_comp_sack_rtt_percent

On Thu, Nov 6, 2025 at 6:57 AM Jesper Dangaard Brouer <hawk@...nel.org> wrote:
>
> On 05/11/2025 10.38, Eric Dumazet wrote:
> > TCP SACK compression has been added in 2018 in commit
> > 5d9f4262b7ea ("tcp: add SACK compression").
> >
> > It is working great for WAN flows (with large RTT).
> > Wifi in particular gets a significant boost _when_ ACK are suppressed.
> >
> > Add a new sysctl so that we can tune the very conservative 5 % value
> > that has been used so far in this formula, so that small RTT flows
> > can benefit from this feature.
> >
> > delay = min ( 5 % of RTT, 1 ms)
> >
> > This patch adds new tcp_comp_sack_rtt_percent sysctl
> > to ease experiments and tuning.
> >
> > Given that we cap the delay to 1ms (tcp_comp_sack_delay_ns sysctl),
> > set the default value to 100.
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > ---
> >   Documentation/networking/ip-sysctl.rst | 13 +++++++++++--
> >   include/net/netns/ipv4.h               |  1 +
> >   net/ipv4/sysctl_net_ipv4.c             |  9 +++++++++
> >   net/ipv4/tcp_input.c                   | 26 ++++++++++++++++++--------
> >   net/ipv4/tcp_ipv4.c                    |  1 +
> >   5 files changed, 40 insertions(+), 10 deletions(-)
> >
> > diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> > index 7cd35bfd39e68c5b2650eb9d0fbb76e34aed3f2b..ebc11f593305bf87e7d4ad4d50ef085b22aef7da 100644
> > --- a/Documentation/networking/ip-sysctl.rst
> > +++ b/Documentation/networking/ip-sysctl.rst
> > @@ -854,9 +854,18 @@ tcp_sack - BOOLEAN
> >
> >       Default: 1 (enabled)
> >
> > +tcp_comp_sack_rtt_percent - INTEGER
> > +     Percentage of SRTT used for the compressed SACK feature.
> > +     See tcp_comp_sack_nr, tcp_comp_sack_delay_ns, tcp_comp_sack_slack_ns.
> > +
> > +     Possible values : 1 - 1000
>
> If this is a percentage, why does it allow 1000 as max?

To allow a 10x max. If someone needs more in the future, we might change it.

> > +     /* delay = (rtt >> 3) * NSEC_PER_USEC * comp_sack_rtt_percent / 100
> > +      * ->
> > +      * delay = rtt * 1.25 * comp_sack_rtt_percent
> > +      */
>
> Why explain this with shifts.  I have to use extra time to remember that
> shift ">> 3" is the same as div "/" 8.  And that ">>" 2 is the same as
> div "/4".  For the code, I think the compiler will convert /4 to >>2
> anyway.  I don't feel strongly about this, so I'll let it be up to you
> if you want to adjust this or not.

Look elsewhere in TCP, we always convert rtt with (rtt >> 3), because
it is kept scaled by 3.

I personally prefer this to rtt/8, but I understand if you are from
another team :)

>
>
> > +     delay = (u64)(rtt + (rtt >> 2)) *
> > +             READ_ONCE(net->ipv4.sysctl_tcp_comp_sack_rtt_percent);
> > +
> > +     delay = min(delay, READ_ONCE(net->ipv4.sysctl_tcp_comp_sack_delay_ns));
> > +
> >       sock_hold(sk);
> >       hrtimer_start_range_ns(&tp->compressed_ack_timer, ns_to_ktime(delay),
> > -                            READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_slack_ns),
> > +                            READ_ONCE(net->ipv4.sysctl_tcp_comp_sack_slack_ns),
> >                              HRTIMER_MODE_REL_PINNED_SOFT);
> >   }
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index b7526a7888cbe296c0f4ba6350772741cfe1765b..a4411cd0229cb7fc5903d206e549d0889d177937 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -3596,6 +3596,7 @@ static int __net_init tcp_sk_init(struct net *net)
> >       net->ipv4.sysctl_tcp_comp_sack_delay_ns = NSEC_PER_MSEC;
> >       net->ipv4.sysctl_tcp_comp_sack_slack_ns = 100 * NSEC_PER_USEC;
> >       net->ipv4.sysctl_tcp_comp_sack_nr = 44;
> > +     net->ipv4.sysctl_tcp_comp_sack_rtt_percent = 100;
> >       net->ipv4.sysctl_tcp_backlog_ack_defer = 1;
> >       net->ipv4.sysctl_tcp_fastopen = TFO_CLIENT_ENABLE;
> >       net->ipv4.sysctl_tcp_fastopen_blackhole_timeout = 0;
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ