[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADVnQynKT7QEhm1WksrNQv3BbYhTd=wWaxueybPBQDPtXbJu-A@mail.gmail.com>
Date: Mon, 29 Jul 2024 10:51:25 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Subash Abhinov Kasiviswanathan <quic_subashab@...cinc.com>
Cc: edumazet@...gle.com, soheil@...gle.com, yyd@...gle.com, ycheng@...gle.com,
davem@...emloft.net, kuba@...nel.org, netdev@...r.kernel.org,
dsahern@...nel.org, pabeni@...hat.com,
Sean Tranchetti <quic_stranche@...cinc.com>
Subject: Re: [PATCH net v2] tcp: Adjust clamping window for applications
specifying SO_RCVBUF
On Fri, Jul 26, 2024 at 4:41 PM Subash Abhinov Kasiviswanathan
<quic_subashab@...cinc.com> wrote:
>
> tp->scaling_ratio is not updated based on skb->len/skb->truesize once
> SO_RCVBUF is set leading to the maximum window scaling to be 25% of
> rcvbuf after
> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> and 50% of rcvbuf after
> commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
> 50% tries to emulate the behavior of older kernels using
> sysctl_tcp_adv_win_scale with default value.
>
> Systems which were using a different values of sysctl_tcp_adv_win_scale
> in older kernels ended up seeing reduced download speeds in certain
> cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
> While the sysctl scheme is no longer acceptable, the value of 50% is
> a bit conservative when the skb->len/skb->truesize ratio is later
> determined to be ~0.66.
>
> Applications not specifying SO_RCVBUF update the window scaling and
> the receiver buffer every time data is copied to userspace. This
> computation is now used for applications setting SO_RCVBUF to update
> the maximum window scaling while ensuring that the receive buffer
> is within the application specified limit.
>
> Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> Signed-off-by: Sean Tranchetti <quic_stranche@...cinc.com>
> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@...cinc.com>
> ---
> v1 -> v2
> Update the condition for SO_RCVBUF window_clamp updates to always
> monitor the current rcvbuf value as suggested by Eric.
>
> net/ipv4/tcp_input.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 454362e359da..e2b9583ed96a 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
> * <prev RTT . ><current RTT .. ><next RTT .... >
> */
>
> - if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
> - !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
> u64 rcvwin, grow;
> int rcvbuf;
>
> @@ -771,12 +770,22 @@ void tcp_rcv_space_adjust(struct sock *sk)
>
> rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
> READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
> - if (rcvbuf > sk->sk_rcvbuf) {
> - WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
> + if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> + if (rcvbuf > sk->sk_rcvbuf) {
> + WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
>
> - /* Make the window clamp follow along. */
> - WRITE_ONCE(tp->window_clamp,
> - tcp_win_from_space(sk, rcvbuf));
> + /* Make the window clamp follow along. */
> + WRITE_ONCE(tp->window_clamp,
> + tcp_win_from_space(sk, rcvbuf));
> + }
> + } else {
> + /* Make the window clamp follow along while being bounded
> + * by SO_RCVBUF.
> + */
> + int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));
> +
> + if (clamp > tp->window_clamp)
> + WRITE_ONCE(tp->window_clamp, clamp);
> }
> }
> tp->rcvq_space.space = copied;
> --
Is this the correct place to put this new code to update
tp->window_clamp? AFAICT it's not the correct place.
If a system administrator has disabled receive buffer autotuning by
setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or if (copied <=
tp->rcvq_space.space), then TCP connections will not reach this new
code, and the window_clamp will not be adjusted, and the receive
window will still be too low.
Even if a system administrator has disabled receive buffer autotuning
by setting `sysctl net.ipv4.tcp_moderate_rcvbuf=0`, or even if (copied
<= tp->rcvq_space.space), AFAICT we still want the correct receive
window value for whatever sk->sk_rcvbuf we have, based on the correct
tp->scaling_ratio.
So AFAICT the correct place to put this kind of logic is in
tcp_measure_rcv_mss(). If we compute a new scaling_ratio and it's
different than tp->scaling_ratio, then it seems we should compute a
new window_clamp value using sk->sk_rcvbuf, and if the new
window_clamp value is different then we should WRITE_ONCE that value
into tp->window_clamp.
That way we can have the correct tp->window_clamp, no matter the value
of net.ipv4.tcp_moderate_rcvbuf, and even if (copied <=
tp->rcvq_space.space).
How does that sound?
neal
Powered by blists - more mailing lists