[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADVnQymZ1tFnEA1Q=vtECs0=Db7zHQ8=+WCQtnhHFVbEOzjVnQ@mail.gmail.com>
Date: Wed, 5 Nov 2025 09:41:35 -0500
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next] tcp: add net.ipv4.tcp_comp_sack_rtt_percent
On Wed, Nov 5, 2025 at 4:38 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> TCP SACK compression has been added in 2018 in commit
> 5d9f4262b7ea ("tcp: add SACK compression").
>
> It is working great for WAN flows (with large RTT).
> Wifi in particular gets a significant boost _when_ ACK are suppressed.
>
> Add a new sysctl so that we can tune the very conservative 5 % value
> that has been used so far in this formula, so that small RTT flows
> can benefit from this feature.
>
> delay = min ( 5 % of RTT, 1 ms)
>
> This patch adds new tcp_comp_sack_rtt_percent sysctl
> to ease experiments and tuning.
>
> Given that we cap the delay to 1ms (tcp_comp_sack_delay_ns sysctl),
> set the default value to 100.
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
> Documentation/networking/ip-sysctl.rst | 13 +++++++++++--
> include/net/netns/ipv4.h | 1 +
> net/ipv4/sysctl_net_ipv4.c | 9 +++++++++
> net/ipv4/tcp_input.c | 26 ++++++++++++++++++--------
> net/ipv4/tcp_ipv4.c | 1 +
> 5 files changed, 40 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index 7cd35bfd39e68c5b2650eb9d0fbb76e34aed3f2b..ebc11f593305bf87e7d4ad4d50ef085b22aef7da 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -854,9 +854,18 @@ tcp_sack - BOOLEAN
>
> Default: 1 (enabled)
>
> +tcp_comp_sack_rtt_percent - INTEGER
> + Percentage of SRTT used for the compressed SACK feature.
> + See tcp_comp_sack_nr, tcp_comp_sack_delay_ns, tcp_comp_sack_slack_ns.
> +
> + Possible values : 1 - 1000
> +
> + Default : 100 %
Overall the patch looks great to me, but for the default value I would
suggest 33% rather than 100%.
AFAICT, basically, in data center environments with RTT < 1ms, if
tcp_comp_sack_rtt_percent is 100% and we allow the data receiver to
wait a whole SRTT before it sends an ACK, then the data sender is
likely to have fully used up its cwnd at the point that the receiver
finally sends the ACK at the end of the SRTT. That means that for the
entire time that the ACK is traveling from the data receiver to the
data sender, the data sender has no permission (from congestion
control) to send. So the "pipe" (data sender -> data receiver network
path) will have an idle bubble for a long time while the data sender
is waiting for the ACK. In these cases, the system would lose
pipelining and would end up in a "stop and wait" mode.
The rationale for 33% is basically to try to facilitate pipelining,
where there are always at least 3 ACKs and 3 GSO/TSO skbs per SRTT, so
that the path can maintain a budget for 3 full-sized GSO/TSO skbs "in
flight" at all times:
+ 1 skb in the qdisc waiting to be sent by the NIC next
+ 1 skb being sent by the NIC (being serialized by the NIC out onto the wire)
+ 1 skb being received and aggregated by the receiver machine's
aggregation mechanism (some combination of LRO, GRO, and sack
compression)
Note that this is basically the same magic number (3) and the same
rationales as:
(a) tcp_tso_should_defer() ensuring that we defer sending data for no
longer than cwnd/tcp_tso_win_divisor (where tcp_tso_win_divisor = 3),
and
(b) bbr_quantization_budget() ensuring that cwnd is at least 3 GSO/TSO
skbs to maintain pipelining and full throughput at low RTTs
It is also similar in spirit to your 2014 patch that limits GSO skbs
to half the cwnd, again to help maintain pipelining:
tcp: limit GSO packets to half cwnd
d649a7a81f3b5bacb1d60abd7529894d8234a666
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d649a7a81f3b5bacb1d60abd7529894d8234a666
WDYT?
neal
Powered by blists - more mailing lists