Message-ID: <60b04b0a-a50e-4d4a-a2bf-ea420f428b9c@quicinc.com>
Date: Wed, 15 May 2024 20:32:27 -0600
From: "Subash Abhinov Kasiviswanathan (KS)" <quic_subashab@...cinc.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: <soheil@...gle.com>, <ncardwell@...gle.com>, <yyd@...gle.com>,
<ycheng@...gle.com>, <quic_stranche@...cinc.com>,
<davem@...emloft.net>, <kuba@...nel.org>, <netdev@...r.kernel.org>
Subject: Re: Potential impact of commit dfa2f0483360 ("tcp: get rid of
sysctl_tcp_adv_win_scale")
On 5/15/2024 1:10 AM, Eric Dumazet wrote:
> On Wed, May 15, 2024 at 6:47 AM Subash Abhinov Kasiviswanathan (KS)
> <quic_subashab@...cinc.com> wrote:
>>
>> We recently noticed that a device running a 6.6.17 kernel (A) had a
>> slower single-stream download speed than a device running a 6.1.57
>> kernel (B). The test here is iperf3 over a mobile radio link, with a
>> 4M window size, from a third-party server.
>
> Hi Subash
>
> I think you gave many details, but please give us more of them:
Hi Eric
Thanks for getting back. Hope the information below is useful.
>
> 1) What driver is used on the receiver side.
rmnet
> 2) MTU
1372
> 3) cat /proc/sys/net/ipv4/tcp_rmem
4096 6291456 16777216
>
> Ideally, you could snapshot "ss -temoi dst <otherpeer>" on receive
> side while the transfer is ongoing,
> and possibly while stopping the receiver thread (kill -STOP `pidof iperf`)
>
192.0.0.2 is the device-side address. I've listed the output of "ss
-temoi dst 223.62.236.10" once mid-transfer and once near the end of
the transfer. iperf3 opens a control connection before the data
connection, so two flows are listed. The flow
192.0.0.2:42278 <-> 223.62.236.10:5215 is the main data connection in
this case.
//mid transfer
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.0.0.2:42278 223.62.236.10:5215
ino:129232 sk:3218 fwmark:0xc0078 <->
skmem:(r0,rb8388608,t0,tb8388608,f0,w0,o0,bl0,d1) ts sack
cubic wscale:7,6 rto:236 rtt:34.249/16.545 ato:40 mss:1320 rcvmss:1320
advmss:1320 cwnd:10 ssthresh:1400 bytes_acked:38
bytes_received:211495680 segs_out:46198 segs_in:160290 data_segs_out:1
data_segs_in:160287 send 3.1Mbps lastsnd:3996 pacing_rate 6.2Mbps
delivery_rate 452.4Kbps app_limited busy:24ms rcv_rtt:26.542
rcv_space:3058440 minrtt:23.34
ESTAB 0 0 192.0.0.2:42270 223.62.236.10:5215
ino:128718 sk:4273 fwmark:0xc0078 <->
skmem:(r0,rb6291456,t0,tb2097152,f0,w0,o0,bl0,d0) ts sack
cubic wscale:10,9 rto:528 rtt:144.931/93.4 ato:40 mss:1320 rcvmss:536
advmss:1320 cwnd:10 ssthresh:1400 bytes_acked:223 bytes_received:4
segs_out:9 segs_in:8 data_segs_out:3 data_segs_in:4 send 728.6Kbps
lastsnd:6064 lastrcv:3948 lastack:3948 pacing_rate 1.5Mbps delivery_rate
351.8Kbps app_limited busy:156ms rcv_space:13200 minrtt:30.021
//close to end of transfer
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 4324072 0 192.0.0.2:42278 223.62.236.10:5215
ino:129232 sk:3218 fwmark:0xc0078 <->
skmem:(r4511016,rb8388608,t0,tb8388608,f2776,w0,o0,bl0,d1) ts
sack cubic wscale:7,6 rto:236 rtt:34.249/16.545 ato:40 mss:1320
rcvmss:1320 advmss:1320 cwnd:10 ssthresh:1400 bytes_acked:38
bytes_received:608252040 segs_out:133117 segs_in:460963 data_segs_out:1
data_segs_in:460960 send 3.1Mbps lastsnd:10104 pacing_rate 6.2Mbps
delivery_rate 452.4Kbps app_limited busy:24ms rcv_rtt:25.111
rcv_space:3871560 minrtt:23.34
ESTAB 0 294 192.0.0.2:42270 223.62.236.10:5215
timer:(on,412ms,0) ino:128718 sk:4273 fwmark:0xc0078 <->
skmem:(r0,rb6291456,t0,tb2097152,f2010,w2086,o0,bl0,d0) ts
sack cubic wscale:10,9 rto:512 rtt:129.796/94.265 ato:40 mss:1320
rcvmss:536 advmss:1320 cwnd:10 ssthresh:1400 bytes_acked:224
bytes_received:5 segs_out:12 segs_in:9 data_segs_out:5 data_segs_in:5
send 813.6Kbps lastsnd:48 lastrcv:52 lastack:52 pacing_rate 1.6Mbps
delivery_rate 442.8Kbps app_limited busy:228ms unacked:1 rcv_space:13200
notsent:290 minrtt:23.848
> TCP is sensitive to the skb->len/skb->truesize ratio.
> Some drivers are known to provide 'bad skbs' in this regard.
>
> Commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") is
> simply a step toward dynamic probing of the skb->len/skb->truesize
> ratio, and gives an incentive for better memory use.
>
> Ultimately, TCP RWIN derives from effective memory usage.
>
> Sending a too-big RWIN can cause excessive memory usage or packet drops.
> If you say RWIN was 6MB+ before the patch, this looks like a bug to me,
> because tcp_rmem[2] = 6MB by default. There is no way a driver can
> pack 6MB of TCP payload in 6MB of memory (no skb/headers overhead ???)
> This would only work well on lossless networks, and only if the
> receiving application drains the TCP receive queue fast enough.
>
> Please take a look at these relevant patches.
> Note they are not perfect patches, because usbnet can still provide
> 'bad skbs', forcing TCP to send a small RWIN.
rmnet does not update the truesize directly in the receive path. There
is no cloning; the data content is explicitly copied into a freshly
allocated skb, similar to the commits you shared below.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c?h=v6.6.17#n385
From netif_receive_skb_entry tracing, I see that the truesize is around
2.5K for ~1.5K-byte packets.
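For reference, a simplified sketch of that copy-into-a-fresh-skb pattern
(illustrative only, not the actual rmnet deaggregation code, and the
function name is made up):

#include <linux/skbuff.h>

/* Copy one deaggregated packet out of the large aggregated RX buffer
 * into its own freshly allocated skb, so skb->truesize accounts only
 * for this packet's allocation.
 */
static struct sk_buff *rx_copy_packet(const void *data, unsigned int len)
{
	struct sk_buff *skb;

	/* alloc_skb() rounds the data area up to a kmalloc size class, so
	 * a ~1.5K packet lands in a 2K buffer plus struct sk_buff
	 * overhead, which is roughly the ~2.5K truesize seen in the
	 * tracing above.
	 */
	skb = alloc_skb(len, GFP_ATOMIC);
	if (!skb)
		return NULL;

	/* Plain copy; the aggregated buffer is never cloned. */
	skb_put_data(skb, data, len);

	return skb;
}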
>
> d50729f1d60bca822ef6d9c1a5fb28d486bd7593 net: usb: smsc95xx: stop
> lying about skb->truesize
> 05417aa9c0c038da2464a0c504b9d4f99814a23b net: usb: sr9700: stop lying
> about skb->truesize
> 1b3b2d9e772b99ea3d0f1f2252bf7a1c94b88be6 net: usb: smsc75xx: stop
> lying about skb->truesize
> 9aad6e45c4e7d16b2bb7c3794154b828fb4384b4 usb: aqc111: stop lying about
> skb->truesize
> 4ce62d5b2f7aecd4900e7d6115588ad7f9acccca net: usb: ax88179_178a: stop
> lying about skb->truesize
I reviewed many of the tcpdumps from other internal tests and I
consistently see the receive window scale to roughly half of what is
specified to iperf3, regardless of the radio configuration or MTU. No
download speed issue was reported for any of those cases. I believe
this particular download test suffers because the RTT is likely higher
on this network than in the other cases.
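As a back-of-envelope model only (my own approximation of the behaviour
after dfa2f0483360, not the kernel's actual tcp_win_from_space() code),
I've been reasoning about the window ceiling as rcvbuf scaled by the
observed payload/truesize ratio:

#include <linux/types.h>

/* Assumed approximation: the advertised window ceiling tracks rcvbuf
 * scaled by the measured skb->len/skb->truesize ratio.
 */
static u64 approx_rwin_ceiling(u64 rcvbuf, u64 payload, u64 truesize)
{
	/* e.g. rcvbuf = 8388608 (rb in the skmem output above) with a
	 * payload of ~1.3K in a ~2.5K truesize gives a ceiling well
	 * below the full rcvbuf.
	 */
	return rcvbuf * payload / truesize;
}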