Message-ID: <CANn89iJQRM=j4gXo4NEZkHO=eQaqewS5S0kAs9JLpuOD_4UWyg@mail.gmail.com>
Date: Thu, 16 May 2024 10:31:21 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: "Subash Abhinov Kasiviswanathan (KS)" <quic_subashab@...cinc.com>
Cc: soheil@...gle.com, ncardwell@...gle.com, yyd@...gle.com, ycheng@...gle.com, 
	quic_stranche@...cinc.com, davem@...emloft.net, kuba@...nel.org, 
	netdev@...r.kernel.org
Subject: Re: Potential impact of commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")

On Thu, May 16, 2024 at 9:57 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Thu, May 16, 2024 at 9:16 AM Subash Abhinov Kasiviswanathan (KS)
> <quic_subashab@...cinc.com> wrote:
> >
> > On 5/15/2024 11:36 PM, Eric Dumazet wrote:
> > > On Thu, May 16, 2024 at 4:32 AM Subash Abhinov Kasiviswanathan (KS)
> > > <quic_subashab@...cinc.com> wrote:
> > >>
> > >> On 5/15/2024 1:10 AM, Eric Dumazet wrote:
> > >>> On Wed, May 15, 2024 at 6:47 AM Subash Abhinov Kasiviswanathan (KS)
> > >>> <quic_subashab@...cinc.com> wrote:
> > >>>>
> > >> We recently noticed that a device running a 6.6.17 kernel (A) had a
> > >> slower single-stream download speed than a device running a 6.1.57
> > >> kernel (B). The test here is over mobile radio, using iperf3 with a
> > >> 4M window size against a third-party server.
> > >>>
> > >
> > > DRS is historically sensitive to initial conditions.
> > >
> > > tcp_rmem[1] seems too big here for DRS to kick in smoothly.
> > >
> > > I would perhaps use 0.5 MB; this will also use less memory for
> > > local (small RTT) connections.
> > I tried 0.5MB for tcp_rmem[1] and I see the same behavior: the receiver
> > window does not scale beyond half of what is specified in iperf3 and
> > does not match the download speed of B.
>
>
> What do you mean by "specified by iperf3"?
>
> We cannot guarantee any stable performance for applications setting SO_RCVBUF.
>
> This is because the memory overhead differs from one kernel version to the next.

The issue here is that SO_RCVBUF is set before TCP has had a chance to receive
any packets.
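
To be explicit about the pattern I mean, here is a minimal userspace
sketch (values arbitrary, error handling omitted):

#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int rcvbuf = 4 * 1024 * 1024;	/* e.g. what "iperf3 -w 4M" requests */

	/* Setting SO_RCVBUF locks the buffer size and disables receive
	 * autotuning (DRS): the kernel must derive the window clamp from
	 * this value alone, before it has seen a single skb and its real
	 * len/truesize ratio.
	 */
	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

	/* connect()/accept() would only happen after this point. */
	return 0;
}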

Sampling the actual skb->len/skb->truesize ratio is not possible at that point.

Therefore the default value is conservative, and might not be good for
your case.
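
To make that concrete: the trace quoted below (~2.5K of truesize for
~1.5K packets) corresponds to a real payload/truesize ratio of roughly
1500/2500 = 60%, but before any packet has arrived TCP has to assume a
safer ratio, so only a fraction of the SO_RCVBUF bytes can be advertised
as receive window.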

This is not easily fixable, because tp->window_clamp has historically been
abused.

The TCP_WINDOW_CLAMP socket option should have used a separate TCP socket field
to remember that tp->window_clamp was set (fixed) to a user value.
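
That is, for a sketch of the option in question (clamp value arbitrary):

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int clamp = 256 * 1024;	/* arbitrary, for illustration only */

	/* TCP_WINDOW_CLAMP writes tp->window_clamp directly. Autotuning
	 * updates the same field, so the stack cannot later distinguish
	 * "pinned by the user" from "computed by the kernel".
	 */
	setsockopt(fd, IPPROTO_TCP, TCP_WINDOW_CLAMP, &clamp, sizeof(clamp));
	return 0;
}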

Make sure you have this follow-up patch, which deals with applications
that still need to make TCP slow.

commit 697a6c8cec03c2299f850fa50322641a8bf6b915
Author: Hechao Li <hli@...flix.com>
Date:   Tue Apr 9 09:43:55 2024 -0700

    tcp: increase the default TCP scaling ratio

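
If I remember the numbers correctly, that patch raises the default
tp->scaling_ratio from about 25% to about 50% of the buffer (matching
the old tcp_adv_win_scale=1 behavior), so that e.g. a 4 MB SO_RCVBUF
starts with roughly a 2 MB window clamp instead of roughly 1 MB, before
the ratio is refined from real traffic.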



>
> >
> > >>
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c?h=v6.6.17#n385
> > >
> > > Hmm... rmnet_map_deaggregate() looks very strange.
> > >
> > > I also do not understand why this NIC driver uses gro_cells, which was
> > > designed for virtual drivers like tunnels.
> > >
> > > The ca32fb034c19e00c changelog is sparse;
> > > it does not explain why standard GRO could not be used directly.
> > >
> > rmnet doesn't directly interface with HW. It is a virtual driver that
> > attaches over real hardware drivers like MHI (PCIe), QMI_WWAN (USB), and
> > IPA to expose networking across different mobile APNs.
> >
> > As rmnet didn't have its own NAPI struct, I added support for GRO using
> > gro_cells. I tried disabling GRO and didn't see a difference in download
> > speeds or in the receiver window either.
> >
> > >>
> > >>   From netif_receive_skb_entry tracing, I see that the truesize is around
> > >> ~2.5K for ~1.5K packets.
> > >
> > > This is a bit strange; it does not match:
> > >
> > >> ESTAB       4324072 0      192.0.0.2:42278                223.62.236.10:5215
> > >>        ino:129232 sk:3218 fwmark:0xc0078 <->
> > >>            skmem:(r4511016,
> > >
> > > -> 4324072 bytes of payload, using 4511016 bytes of memory
> > I probably need to dig into this further. If the memory usage here were
> > in line with the actual size-to-truesize ratio, would it cause the
> > receiver window to grow?
> >
> > Only explicitly increasing the window size to 16M in iperf3 matches the
> > download speed of B, which suggests that the sender is unable to scale
> > throughput in the 4M case due to the limited receiver window advertised
> > by A at the RTT of this specific configuration.
>
> What happens if you leave autotuning enabled?
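
(For reference, as far as I know iperf3 only calls
setsockopt(SO_RCVBUF/SO_SNDBUF) when -w is given, so a run without -w
leaves DRS in charge, e.g.:

iperf3 -c <server> -p 5215 -t 30        # no -w: autotuning stays enabled
iperf3 -c <server> -p 5215 -w 4M -t 30  # -w 4M: buffer locked, autotuning off
)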
