[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250217232905.3162187-1-kuba@kernel.org>
Date: Mon, 17 Feb 2025 15:29:05 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: edumazet@...gle.com
Cc: netdev@...r.kernel.org,
davem@...emloft.net,
pabeni@...hat.com,
andrew+netdev@...n.ch,
horms@...nel.org,
Jakub Kicinski <kuba@...nel.org>,
ncardwell@...gle.com,
kuniyu@...zon.com,
hli@...flix.com,
quic_stranche@...cinc.com,
quic_subashab@...cinc.com
Subject: [PATCH net] tcp: adjust rcvq_space after updating scaling ratio
Since commit under Fixes we set the window clamp in accordance
to newly measured rcvbuf scaling_ratio. If the scaling_ratio
decreased significantly we may put ourselves in a situation
where windows become smaller than rcvq_space, preventing
tcp_rcv_space_adjust() from increasing rcvbuf.
The significant decrease of scaling_ratio is far more likely
since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"),
which increased the "default" scaling ratio from ~30% to 50%.
Hitting the bad condition depends a lot on TCP tuning, and
drivers at play. One of Meta's workloads hits it reliably
under following conditions:
- default rcvbuf of 125k
- sender MTU 1500, receiver MTU 5000
- driver settles on scaling_ratio of 78 for the config above.
Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss
(10 * 5k = 50k). Once we find out the true scaling ratio and
MSS we clamp the windows to 38k. Triggering the condition also
depends on the message sequence of this workload. I can't repro
the problem with simple iperf or TCP_RR-style tests.
Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
Signed-off-by: Jakub Kicinski <kuba@...nel.org>
---
CC: ncardwell@...gle.com
CC: kuniyu@...zon.com
CC: hli@...flix.com
CC: quic_stranche@...cinc.com
CC: quic_subashab@...cinc.com
---
net/ipv4/tcp_input.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 074406890552..6467716820ff 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -243,9 +243,15 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
do_div(val, skb->truesize);
tcp_sk(sk)->scaling_ratio = val ? val : 1;
- if (old_ratio != tcp_sk(sk)->scaling_ratio)
- WRITE_ONCE(tcp_sk(sk)->window_clamp,
- tcp_win_from_space(sk, sk->sk_rcvbuf));
+ if (old_ratio != tcp_sk(sk)->scaling_ratio) {
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ val = tcp_win_from_space(sk, sk->sk_rcvbuf);
+ tcp_set_window_clamp(sk, val);
+
+ if (tp->window_clamp < tp->rcvq_space.space)
+ tp->rcvq_space.space = tp->window_clamp;
+ }
}
icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
tcp_sk(sk)->advmss);
--
2.48.1
Powered by blists - more mailing lists