Message-ID: <CANn89i+QM1D=+fXQVeKv0vCO-+r0idGYBzmhKnj59Vp8FEhdxA@mail.gmail.com>
Date: Thu, 16 May 2024 07:36:26 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: "Subash Abhinov Kasiviswanathan (KS)" <quic_subashab@...cinc.com>
Cc: soheil@...gle.com, ncardwell@...gle.com, yyd@...gle.com, ycheng@...gle.com, 
	quic_stranche@...cinc.com, davem@...emloft.net, kuba@...nel.org, 
	netdev@...r.kernel.org
Subject: Re: Potential impact of commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")

On Thu, May 16, 2024 at 4:32 AM Subash Abhinov Kasiviswanathan (KS)
<quic_subashab@...cinc.com> wrote:
>
> On 5/15/2024 1:10 AM, Eric Dumazet wrote:
> > On Wed, May 15, 2024 at 6:47 AM Subash Abhinov Kasiviswanathan (KS)
> > <quic_subashab@...cinc.com> wrote:
> >>
> >> We recently noticed that a device running a 6.6.17 kernel (A) was having
> >> a slower single stream download speed compared to a device running
> >> 6.1.57 kernel (B). The test here is over mobile radio with iperf3 with
> >> window size 4M from a third party server.
> >
> > Hi Subash
> >
> > I think you gave many details, but please give us more of them :
>
> Hi Eric
>
> Thanks for getting back. Hope the information below is useful.
>
> >
> > 1) What driver is used on the receiver side.
> rmnet
>
> > 2) MTU
> 1372
>
> > 3) cat /proc/sys/net/ipv4/tcp_rmem
> 4096 6291456 16777216


DRS is historically sensitive to initial conditions.

tcp_rmem[1] seems too big here for DRS to kick in smoothly.

I would use 0.5 MB perhaps; this will also use less memory for
local (small rtt) connections.
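
As an illustration of the suggestion above, here is a minimal Python sketch that takes the tcp_rmem triple reported earlier in the thread and lowers only the middle (default) value. It assumes "0.5 MB" means 512 KiB (524288 bytes); the helper names parse_tcp_rmem and with_default are hypothetical, and the sketch only prints a candidate sysctl line rather than changing any setting.

```python
def parse_tcp_rmem(text):
    """Parse the 'min default max' triple as printed by /proc/sys/net/ipv4/tcp_rmem."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError("expected three values: min default max")
    return tuple(int(f) for f in fields)


def with_default(rmem, new_default):
    """Return a copy of the triple with only the default (middle) value replaced."""
    lo, _, hi = rmem
    if not lo <= new_default <= hi:
        raise ValueError("default must lie between min and max")
    return (lo, new_default, hi)


# Values reported in this thread for the device under test.
current = parse_tcp_rmem("4096 6291456 16777216")

# Assumption: "0.5 MB" is taken to mean 512 KiB.
suggested = with_default(current, 512 * 1024)

print("net.ipv4.tcp_rmem = " + " ".join(str(v) for v in suggested))
```

The min and max values are left untouched, so autotuning can still grow the buffer up to tcp_rmem[2] on long-RTT paths.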

>
> >
> > Ideally, you could snapshot "ss -temoi dst <otherpeer>" on receive
> > side while the transfer is ongoing,
> > and possibly while stopping the receiver thread (kill -STOP `pidof iperf`)
> >
> 192.0.0.2 is the device side address. I've listed the output of "ss
> -temoi dst 223.62.236.10" mid transfer and one around the end of transfer.
>
> I believe iperf3 makes a control connection prior to triggering the data
> connection so it will list two flows.  The transfer between
> 192.0.0.2:42278 <-> 223.62.236.10:5215 is the main data connection in
> this case.
>
> //mid transfer
> State       Recv-Q Send-Q Local Address:Port                 Peer
> Address:Port
>
> ESTAB       0      0      192.0.0.2:42278                223.62.236.10:5215
>      ino:129232 sk:3218 fwmark:0xc0078 <->
>           skmem:(r0,rb8388608,t0,tb8388608,f0,w0,o0,bl0,d1) ts sack
> cubic wscale:7,6 rto:236 rtt:34.249/16.545 ato:40 mss:1320 rcvmss:1320
> advmss:1320 cwnd:10 ssthresh:1400 bytes_acked:38
> bytes_received:211495680 segs_out:46198 segs_in:160290 data_segs_out:1
> data_segs_in:160287 send 3.1Mbps lastsnd:3996 pacing_rate 6.2Mbps
> delivery_rate 452.4Kbps app_limited busy:24ms rcv_rtt:26.542
> rcv_space:3058440 minrtt:23.34
> ESTAB       0      0      192.0.0.2:42270                223.62.236.10:5215
>      ino:128718 sk:4273 fwmark:0xc0078 <->
>           skmem:(r0,rb6291456,t0,tb2097152,f0,w0,o0,bl0,d0) ts sack
> cubic wscale:10,9 rto:528 rtt:144.931/93.4 ato:40 mss:1320 rcvmss:536
> advmss:1320 cwnd:10 ssthresh:1400 bytes_acked:223 bytes_received:4
> segs_out:9 segs_in:8 data_segs_out:3 data_segs_in:4 send 728.6Kbps
> lastsnd:6064 lastrcv:3948 lastack:3948 pacing_rate 1.5Mbps delivery_rate
> 351.8Kbps app_limited busy:156ms rcv_space:13200 minrtt:30.021
>
> //close to end of transfer
> State       Recv-Q Send-Q Local Address:Port                 Peer
> Address:Port
>
> ESTAB       4324072 0      192.0.0.2:42278                223.62.236.10:5215
>       ino:129232 sk:3218 fwmark:0xc0078 <->
>           skmem:(r4511016,rb8388608,t0,tb8388608,f2776,w0,o0,bl0,d1) ts
> sack cubic wscale:7,6 rto:236 rtt:34.249/16.545 ato:40 mss:1320
> rcvmss:1320 advmss:1320 cwnd:10 ssthresh:1400 bytes_acked:38
> bytes_received:608252040 segs_out:133117 segs_in:460963 data_segs_out:1
> data_segs_in:460960 send 3.1Mbps lastsnd:10104 pacing_rate 6.2Mbps
> delivery_rate 452.4Kbps app_limited busy:24ms rcv_rtt:25.111
> rcv_space:3871560 minrtt:23.34
> ESTAB       0      294    192.0.0.2:42270                223.62.236.10:5215
>      timer:(on,412ms,0) ino:128718 sk:4273 fwmark:0xc0078 <->
>           skmem:(r0,rb6291456,t0,tb2097152,f2010,w2086,o0,bl0,d0) ts
> sack cubic wscale:10,9 rto:512 rtt:129.796/94.265 ato:40 mss:1320
> rcvmss:536 advmss:1320 cwnd:10 ssthresh:1400 bytes_acked:224
> bytes_received:5 segs_out:12 segs_in:9 data_segs_out:5 data_segs_in:5
> send 813.6Kbps lastsnd:48 lastrcv:52 lastack:52 pacing_rate 1.6Mbps
> delivery_rate 442.8Kbps app_limited busy:228ms unacked:1 rcv_space:13200
> notsent:290 minrtt:23.848
>
> > TCP is sensitive to the skb->len/skb->truesize ratio.
> > Some drivers are known to provide 'bad skbs' in this regard.
> >
> > Commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") is
> > simply a step for dynamic
> > probing of skb->len/skb->truesize ratio, and give incentive for better
> > memory use.
> >
> > Ultimately, TCP RWIN derives from effective memory usage.
> >
> > Sending a too big RWIN can cause excessive memory usage or packet drops.
> > If you say RWIN was 6MB+ before the patch, this looks like a bug to me,
> > because tcp_rmem[2] = 6MB by default. There is no way a driver can
> > pack 6MB of TCP payload in 6MB of memory (no skb/headers overhead ???)
> > This would only work well in lossless networks, and if receiving
> > application drains TCP receive queue fast enough.
> >
> > Please take a look at these relevant patches.
> > Note they are not perfect patches, because usbnet can still provide
> > 'bad skbs', forcing TCP to send small RWIN.
> rmnet is not updating the truesize directly in the receive path. There
> is no cloning and there is an explicit copy of the data content to a
> freshly allocated skb similar to your commits shared below.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c?h=v6.6.17#n385

Hmm... rmnet_map_deaggregate() looks very strange.

I also do not understand why this NIC driver uses gro_cells, which was
designed for virtual drivers like tunnels.

ca32fb034c19e00c changelog is sparse,
it does not explain why standard GRO could not be directly used.

>
>  From netif_receive_skb_entry tracing, I see that the truesize is around
> ~2.5K for ~1.5K packets.

This is a bit strange; it does not match:

> ESTAB       4324072 0      192.0.0.2:42278                223.62.236.10:5215
>       ino:129232 sk:3218 fwmark:0xc0078 <->
>           skmem:(r4511016,

-> 4324072 bytes of payload, using 4511016 bytes of memory
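
The mismatch can be made concrete with a small Python sketch. It reads the first skmem field (r, the memory allocated for received packets) from the ss line quoted above, divides Recv-Q by it, and compares the result with the ratio implied by the traced figures. The function name payload_to_memory_ratio is hypothetical, and the traced figures are the rough "~1.5K payload in ~2.5K truesize" values from the report, not exact measurements.

```python
import re


def payload_to_memory_ratio(recv_q, skmem):
    """Ratio of queued payload bytes (Recv-Q) to receive memory charged (skmem r field)."""
    match = re.search(r"skmem:\(r(\d+),", skmem)
    if match is None:
        raise ValueError("no skmem r field found")
    rmem_alloc = int(match.group(1))
    if rmem_alloc == 0:
        raise ValueError("no receive memory is currently charged")
    return recv_q / rmem_alloc


# Figures from the "close to end of transfer" snapshot in this thread.
observed = payload_to_memory_ratio(
    4324072,
    "skmem:(r4511016,rb8388608,t0,tb8388608,f2776,w0,o0,bl0,d1)",
)

# Rough figures from the tracing report: ~1.5K packets with ~2.5K truesize.
traced = 1500 / 2500

print(f"observed len/truesize ratio: {observed:.3f}")
print(f"ratio implied by tracing:    {traced:.3f}")
```

The socket accounting shows roughly 0.96 bytes of payload per byte of memory, whereas the traced truesize would give about 0.6, which is the discrepancy noted above.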



>
> >
> > d50729f1d60bca822ef6d9c1a5fb28d486bd7593 net: usb: smsc95xx: stop
> > lying about skb->truesize
> > 05417aa9c0c038da2464a0c504b9d4f99814a23b net: usb: sr9700: stop lying
> > about skb->truesize
> > 1b3b2d9e772b99ea3d0f1f2252bf7a1c94b88be6 net: usb: smsc75xx: stop
> > lying about skb->truesize
> > 9aad6e45c4e7d16b2bb7c3794154b828fb4384b4 usb: aqc111: stop lying about
> > skb->truesize
> > 4ce62d5b2f7aecd4900e7d6115588ad7f9acccca net: usb: ax88179_178a: stop
> > lying about skb->truesize
>
> I reviewed many of the tcpdumps from other tests internally and I
> consistently see the receiver window size scale to roughly half of what
> is specified in iperf3 regardless of whatever radio configurations or
> MTU. There was no download speed issue reported for any of these cases.
>
> I believe this particular download test is failing as the RTT is likely
> higher in this network than the other cases.
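
The RTT hypothesis above rests on the usual relation that a single flow cannot exceed roughly one receive window per round trip. A minimal Python sketch of that arithmetic follows; the window and RTT values are purely illustrative assumptions (a 2 MiB window, i.e. about half of the 4M requested in iperf3, and RTTs of 25 ms and 100 ms), not measurements from this thread, and window_limited_mbps is a hypothetical helper name.

```python
def window_limited_mbps(window_bytes, rtt_ms):
    """Upper bound on single-flow throughput: one window of data per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_second = window_bytes * 8 / (rtt_ms / 1000.0)
    return bits_per_second / 1e6


# Illustrative assumption: advertised window of about half the 4M request.
window = 2 * 1024 * 1024

for rtt_ms in (25, 100):  # hypothetical short and long paths
    print(f"rtt {rtt_ms} ms: at most {window_limited_mbps(window, rtt_ms):.1f} Mbps")
```

With the same window, a fourfold increase in RTT cuts the ceiling to a quarter, so a window that is harmless on a short path can become the bottleneck on a longer one.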
