Message-ID: <CADVnQyngH=CW_RLXQHiLbii6-zDeCKYP05zP=cta95KVoaF4ng@mail.gmail.com>
Date: Wed, 9 Mar 2022 11:25:28 -0500
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
netdev <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
Soheil Hassas Yeganeh <soheil@...gle.com>,
Yuchung Cheng <ycheng@...gle.com>, Kevin Yang <yyd@...gle.com>
Subject: Re: [PATCH net-next] tcp: adjust TSO packet sizes based on min_rtt
On Tue, Mar 8, 2022 at 8:58 PM Eric Dumazet <eric.dumazet@...il.com> wrote:
>
> From: Eric Dumazet <edumazet@...gle.com>
>
> Back when tcp_tso_autosize() and TCP pacing were introduced,
> our focus was really to reduce burst sizes for long-distance
> flows.
>
> The simple heuristic of using sk_pacing_rate/1024 has worked
> well, but it can lead to overly small packets for hosts in the
> same rack/cluster when thousands of flows compete for the bottleneck.
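>
> For reference, the old sizing boils down to roughly this (a minimal
> user-space sketch, assuming the ~1ms pacing horizon implied by
> sk_pacing_rate >> 10; not the exact kernel code):
>
>	#include <stdint.h>
>
>	/* Old heuristic: one TSO burst carries ~1ms worth of pacing rate. */
>	static uint32_t old_tso_segs(uint64_t pacing_rate, uint32_t mss,
>				     uint32_t min_tso_segs)
>	{
>		uint64_t bytes = pacing_rate >> 10;	/* sk_pacing_rate / 1024 */
>		uint32_t segs = bytes / mss;
>
>		/* Never drop below the floor of min_tso_segs segments. */
>		return segs > min_tso_segs ? segs : min_tso_segs;
>	}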
>
> Neal Cardwell had the idea of making the TSO burst size
> a function of both sk_pacing_rate and tcp_min_rtt().
>
> Indeed, for local flows, sending bigger bursts is better
> for reducing CPU costs, as occasional losses can be repaired
> quite fast.
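>
> A rough sketch of the combined sizing (hypothetical helper and
> parameter names, not the exact patch code):
>
>	#include <stdint.h>
>
>	/* Flows with a small min_rtt may add up to a full gso_max_size
>	 * on top of the old pacing-rate budget.
>	 */
>	static uint64_t tso_bytes(uint64_t pacing_rate, uint32_t min_rtt_us,
>				  uint32_t tso_rtt_log, uint32_t gso_max_size)
>	{
>		uint64_t bytes = pacing_rate >> 10;	/* old ~1ms budget */
>		uint32_t r = min_rtt_us >> tso_rtt_log;	/* RTT "distance" */
>
>		if (r < 64)				/* keep the shift defined */
>			bytes += (uint64_t)gso_max_size >> r;
>
>		return bytes < gso_max_size ? bytes : gso_max_size;
>	}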
>
> This patch is based on Neal Cardwell's implementation
> done more than two years ago.
> bbr adjusts max_pacing_rate based on measured bandwidth,
> while cubic would overestimate max_pacing_rate.
>
> /proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune this
> new feature in logarithmic steps, or to disable it.
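>
> To illustrate the logarithmic steps (numbers follow the sketch
> above; 9 and 0 are the values used in the tests below):
>
>	/* distance = min_rtt_usec >> tcp_tso_rtt_log; each increment of
>	 * the sysctl doubles the RTT range covered by one step.
>	 * With tcp_tso_rtt_log = 9 (512 usec per step):
>	 *   min_rtt =    50 us -> distance =  0 -> bonus = gso_max_size
>	 *   min_rtt =   600 us -> distance =  1 -> bonus = gso_max_size / 2
>	 *   min_rtt =  2048 us -> distance =  4 -> bonus = gso_max_size / 16
>	 *   min_rtt = 10000 us -> distance = 19 -> bonus ~= 0
>	 */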
>
> Tested:
>
> 100Gbit NIC, two hosts in the same rack, 4K MTU.
> 600 flows rate-limited to 20000000 bytes per second.
>
> Before patch: (TSO sizes would be limited to 20000000/1024 ~= 19531
> bytes per burst, i.e. 19531/4096 -> 4 full-size 4K segments per TSO)
>
> ~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log
> ~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
> 96005
>
> Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':
>
> 65,945.29 msec task-clock # 2.845 CPUs utilized
> 1,314,632 context-switches # 19935.279 M/sec
> 5,292 cpu-migrations # 80.249 M/sec
> 940,641 page-faults # 14264.023 M/sec
> 201,117,030,926 cycles # 3049769.216 GHz (83.45%)
> 17,699,435,405 stalled-cycles-frontend # 8.80% frontend cycles idle (83.48%)
> 136,584,015,071 stalled-cycles-backend # 67.91% backend cycles idle (83.44%)
> 53,809,530,436 instructions # 0.27 insn per cycle
> # 2.54 stalled cycles per insn (83.36%)
> 9,062,315,523 branches # 137422329.563 M/sec (83.22%)
> 153,008,621 branch-misses # 1.69% of all branches (83.32%)
>
> 23.182970846 seconds time elapsed
>
> TcpInSegs 15648792 0.0
> TcpOutSegs 58659110 0.0 # Average of 3.7 4K segments per TSO packet
> TcpExtTCPDelivered 58654791 0.0
> TcpExtTCPDeliveredCE 19 0.0
>
> After patch:
>
> ~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log
> ~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
> 96046
>
> Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':
>
> 48,982.58 msec task-clock # 2.104 CPUs utilized
> 186,014 context-switches # 3797.599 M/sec
> 3,109 cpu-migrations # 63.472 M/sec
> 941,180 page-faults # 19214.814 M/sec
> 153,459,763,868 cycles # 3132982.807 GHz (83.56%)
> 12,069,861,356 stalled-cycles-frontend # 7.87% frontend cycles idle (83.32%)
> 120,485,917,953 stalled-cycles-backend # 78.51% backend cycles idle (83.24%)
> 36,803,672,106 instructions # 0.24 insn per cycle
> # 3.27 stalled cycles per insn (83.18%)
> 5,947,266,275 branches # 121417383.427 M/sec (83.64%)
> 87,984,616 branch-misses # 1.48% of all branches (83.43%)
>
> 23.281200256 seconds time elapsed
>
> TcpInSegs 1434706 0.0
> TcpOutSegs 58883378 0.0 # Average of 41 4K segments per TSO packet
> TcpExtTCPDelivered 58878971 0.0
> TcpExtTCPDeliveredCE 9664 0.0
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
Thanks, Eric!
Reviewed-by: Neal Cardwell <ncardwell@...gle.com>
neal