Date: Mon, 20 May 2024 17:12:30 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: "Subash Abhinov Kasiviswanathan (KS)" <quic_subashab@...cinc.com>
Cc: soheil@...gle.com, ncardwell@...gle.com, yyd@...gle.com, ycheng@...gle.com, 
	quic_stranche@...cinc.com, davem@...emloft.net, kuba@...nel.org, 
	netdev@...r.kernel.org
Subject: Re: Potential impact of commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")

On Sun, May 19, 2024 at 4:14 AM Subash Abhinov Kasiviswanathan (KS)
<quic_subashab@...cinc.com> wrote:
>
>
>
> On 5/17/2024 1:08 AM, Subash Abhinov Kasiviswanathan (KS) wrote:
> >
> >
> > On 5/16/2024 12:49 PM, Subash Abhinov Kasiviswanathan (KS) wrote:
> >> On 5/16/2024 2:31 AM, Eric Dumazet wrote:
> >>> On Thu, May 16, 2024 at 9:57 AM Eric Dumazet <edumazet@...gle.com>
> >>> wrote:
> >>>>
> >>>> On Thu, May 16, 2024 at 9:16 AM Subash Abhinov Kasiviswanathan (KS)
> >>>> <quic_subashab@...cinc.com> wrote:
> >>>>>
> >>>>> On 5/15/2024 11:36 PM, Eric Dumazet wrote:
> >>>>>> On Thu, May 16, 2024 at 4:32 AM Subash Abhinov Kasiviswanathan (KS)
> >>>>>> <quic_subashab@...cinc.com> wrote:
> >>>>>>>
> >>>>>>> On 5/15/2024 1:10 AM, Eric Dumazet wrote:
> >>>>>>>> On Wed, May 15, 2024 at 6:47 AM Subash Abhinov Kasiviswanathan (KS)
> >>>>>>>> <quic_subashab@...cinc.com> wrote:
> >>>>>>>>>
> >>>>>>>>> We recently noticed that a device running a 6.6.17 kernel (A)
> >>>>>>>>> was having
> >>>>>>>>> a slower single stream download speed compared to a device running
> >>>>>>>>> 6.1.57 kernel (B). The test here is over mobile radio with
> >>>>>>>>> iperf3 with
> >>>>>>>>> window size 4M from a third party server.
> >>>>>>>>
> >>>>>>
> >>> This is not fixable easily, because tp->window_clamp has been
> >>> historically abused.
> >>>
> >>> TCP_WINDOW_CLAMP socket option should have used a separate tcp socket
> >>> field
> >>> to remember tp->window_clamp has been set (fixed) to a user value.
> >>>
> >>> Make sure you have this followup patch, dealing with applications
> >>> still needing to make TCP slow.
> >>>
> >>> commit 697a6c8cec03c2299f850fa50322641a8bf6b915
> >>> Author: Hechao Li <hli@...flix.com>
> >>> Date:   Tue Apr 9 09:43:55 2024 -0700
> >>>
> >>>      tcp: increase the default TCP scaling ratio
> >>>> What happens if you leave autotuning enabled?
> >> I'll try this test and also the test with 4M SO_RCVBUF on the device
> >> configuration where the download issue was observed and report back
> >> with the findings.
> > With autotuning, the receiver window scaled to ~9M. The download speed
> > matched whatever I got with setting SO_RCVBUF 16M on A earlier (which
> > aligns with previous observation as the window scaled to ~8M without the
> > commit).
> >
> > With 4M SO_RCVBUF, the receiver window scaled to ~4M. Download speed
> > increased significantly but didn't match the download speed of B with 4M
> > SO_RCVBUF. Per commit description, the commit matches the behavior as if
> > tcp_adv_win_scale was set to 1.
> >
> > Download speed of B is higher than A for 4M SO_RCVBUF as receiver window
> > of B grew to ~6M. This is because B had tcp_adv_win_scale set to 2.
> Would the following change, which re-enables the use of the sysctl
> tcp_adv_win_scale to set the initial scaling ratio, be acceptable?
> The default value of tcp_adv_win_scale is 1, which corresponds to the
> existing 50% ratio.
>
> I verified with this patch on A that setting SO_RCVBUF 4M in iperf3 with
> tcp_adv_win_scale = 1 (default) scales receiver window to ~4M while
> tcp_adv_win_scale = 2 scales receiver window to ~6M (which matches the
> behavior from B).

What problem are you trying to solve that commit 697a6c8cec03c229
did not solve?

>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 618f991cb336..1bca7d2e47c8 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1460,14 +1460,23 @@ static inline int tcp_space_from_win(const struct sock *sk, int win)
>          return __tcp_space_from_win(tcp_sk(sk)->scaling_ratio, win);
>   }
>
> -/* Assume a 50% default for skb->len/skb->truesize ratio.
> - * This may be adjusted later in tcp_measure_rcv_mss().
> - */
> -#define TCP_DEFAULT_SCALING_RATIO (1 << (TCP_RMEM_TO_WIN_SCALE - 1))
> -
>   static inline void tcp_scaling_ratio_init(struct sock *sk)
>   {
> -       tcp_sk(sk)->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
> +       int win_scale = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale);
> +
> +       if (win_scale <= 0) {
> +               if (win_scale < -TCP_RMEM_TO_WIN_SCALE)
> +                       win_scale = -TCP_RMEM_TO_WIN_SCALE;
> +
> +               tcp_sk(sk)->scaling_ratio =
> +                       1 << (TCP_RMEM_TO_WIN_SCALE + win_scale);
> +       } else {
> +               if (win_scale > TCP_RMEM_TO_WIN_SCALE)
> +                       win_scale = TCP_RMEM_TO_WIN_SCALE;
> +
> +               tcp_sk(sk)->scaling_ratio = U8_MAX -
> +                       (1 << (TCP_RMEM_TO_WIN_SCALE - win_scale));
> +       }
>   }
>
>   /* Note: caller must be prepared to deal with negative returns */
