[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240726204105.1466841-1-quic_subashab@quicinc.com>
Date: Fri, 26 Jul 2024 13:41:05 -0700
From: Subash Abhinov Kasiviswanathan <quic_subashab@...cinc.com>
To: edumazet@...gle.com, soheil@...gle.com, ncardwell@...gle.com,
yyd@...gle.com, ycheng@...gle.com, davem@...emloft.net,
kuba@...nel.org, netdev@...r.kernel.org, dsahern@...nel.org,
pabeni@...hat.com
Cc: Subash Abhinov Kasiviswanathan <quic_subashab@...cinc.com>,
Sean Tranchetti <quic_stranche@...cinc.com>
Subject: [PATCH net v2] tcp: Adjust clamping window for applications specifying SO_RCVBUF
tp->scaling_ratio is not updated based on skb->len/skb->truesize once
SO_RCVBUF is set leading to the maximum window scaling to be 25% of
rcvbuf after
commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
and 50% of rcvbuf after
commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
50% tries to emulate the behavior of older kernels using
sysctl_tcp_adv_win_scale with default value.
Systems which were using a different values of sysctl_tcp_adv_win_scale
in older kernels ended up seeing reduced download speeds in certain
cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
While the sysctl scheme is no longer acceptable, the value of 50% is
a bit conservative when the skb->len/skb->truesize ratio is later
determined to be ~0.66.
Applications not specifying SO_RCVBUF update the window scaling and
the receiver buffer every time data is copied to userspace. This
computation is now used for applications setting SO_RCVBUF to update
the maximum window scaling while ensuring that the receive buffer
is within the application specified limit.
Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Signed-off-by: Sean Tranchetti <quic_stranche@...cinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@...cinc.com>
---
v1 -> v2
Update the condition for SO_RCVBUF window_clamp updates to always
monitor the current rcvbuf value as suggested by Eric.
net/ipv4/tcp_input.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 454362e359da..e2b9583ed96a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
* <prev RTT . ><current RTT .. ><next RTT .... >
*/
- if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
- !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
+ if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
u64 rcvwin, grow;
int rcvbuf;
@@ -771,12 +770,22 @@ void tcp_rcv_space_adjust(struct sock *sk)
rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
- if (rcvbuf > sk->sk_rcvbuf) {
- WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
+ if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
+ if (rcvbuf > sk->sk_rcvbuf) {
+ WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
- /* Make the window clamp follow along. */
- WRITE_ONCE(tp->window_clamp,
- tcp_win_from_space(sk, rcvbuf));
+ /* Make the window clamp follow along. */
+ WRITE_ONCE(tp->window_clamp,
+ tcp_win_from_space(sk, rcvbuf));
+ }
+ } else {
+ /* Make the window clamp follow along while being bounded
+ * by SO_RCVBUF.
+ */
+ int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));
+
+ if (clamp > tp->window_clamp)
+ WRITE_ONCE(tp->window_clamp, clamp);
}
}
tp->rcvq_space.space = copied;
--
2.34.1
Powered by blists - more mailing lists