lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJ5eGCGgF+_4VxXXV_oMv8Bi-Ugq+MG6=bs+74FR63GUQ@mail.gmail.com>
Date: Fri, 26 Jul 2024 11:40:39 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Subash Abhinov Kasiviswanathan <quic_subashab@...cinc.com>
Cc: soheil@...gle.com, ncardwell@...gle.com, yyd@...gle.com, ycheng@...gle.com, 
	davem@...emloft.net, kuba@...nel.org, netdev@...r.kernel.org, 
	Sean Tranchetti <quic_stranche@...cinc.com>
Subject: Re: [PATCH net] tcp: Adjust clamping window for applications
 specifying SO_RCVBUF

On Thu, Jul 25, 2024 at 11:55 PM Subash Abhinov Kasiviswanathan
<quic_subashab@...cinc.com> wrote:
>
> tp->scaling_ratio is not updated based on skb->len/skb->truesize once
> SO_RCVBUF is set leading to the maximum window scaling to be 25% of
> rcvbuf after
> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> and 50% of rcvbuf after
> commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
> 50% tries to emulate the behavior of older kernels using
> sysctl_tcp_adv_win_scale with default value.
>
> Systems which were using a different values of sysctl_tcp_adv_win_scale
> in older kernels ended up seeing reduced download speeds in certain
> cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
> While the sysctl scheme is no longer acceptable, the value of 50% is
> a bit conservative when the skb->len/skb->truesize ratio is later
> determined to be ~0.66.
>
> Applications not specifying SO_RCVBUF update the window scaling and
> the receiver buffer every time data is copied to userspace. This
> computation is now used for applications setting SO_RCVBUF to update
> the maximum window scaling while ensuring that the receive buffer
> is within the application specified limit.
>
> Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> Signed-off-by: Sean Tranchetti <quic_stranche@...cinc.com>
> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@...cinc.com>
> ---
>  net/ipv4/tcp_input.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 454362e359da..c8fb029a15a4 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
>          * <prev RTT . ><current RTT .. ><next RTT .... >
>          */
>
> -       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
> -           !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> +       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
>                 u64 rcvwin, grow;
>                 int rcvbuf;
>
> @@ -771,12 +770,24 @@ void tcp_rcv_space_adjust(struct sock *sk)
>
>                 rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
>                                READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
> -               if (rcvbuf > sk->sk_rcvbuf) {
> -                       WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
> +               if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> +                       if (rcvbuf > sk->sk_rcvbuf) {
> +                               WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
>
> -                       /* Make the window clamp follow along.  */
> -                       WRITE_ONCE(tp->window_clamp,
> -                                  tcp_win_from_space(sk, rcvbuf));
> +                               /* Make the window clamp follow along.  */
> +                               WRITE_ONCE(tp->window_clamp,
> +                                          tcp_win_from_space(sk, rcvbuf));
> +                       }
> +               } else {
> +                       /* Make the window clamp follow along while being bounded
> +                        * by SO_RCVBUF.
> +                        */
> +                       if (rcvbuf <= sk->sk_rcvbuf) {

I do not really understand this part.
I am guessing this test will often be false and your problem won't be fixed.
You do not handle all  sysctl_tcp_adv_win_scale values (positive and negative)

I would instead not use "if (rcvbuf <= sk->sk_rcvbuf) {"

and instead :

else {
      int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));

      if (clamp > tp->window_clamp)
            WRITE_ONCE(tp->window_clamp, clamp);
}

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ