Date: Tue, 8 Mar 2022 11:53:38 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: David Laight <David.Laight@...lab.com>
Cc: Jakub Kicinski <kuba@...nel.org>, netdev <netdev@...r.kernel.org>,
	Willem de Bruijn <willemb@...gle.com>, Neal Cardwell <ncardwell@...gle.com>,
	Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [RFC net-next] tcp: allow larger TSO to be built under overload

On Tue, Mar 8, 2022 at 1:08 AM David Laight <David.Laight@...lab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 08 March 2022 03:50
> ...
> > 	/* Goal is to send at least one packet per ms,
> > 	 * not one big TSO packet every 100 ms.
> > 	 * This preserves ACK clocking and is consistent
> > 	 * with tcp_tso_should_defer() heuristic.
> > 	 */
> > -	segs = max_t(u32, bytes / mss_now, min_tso_segs);
> > -
> > -	return segs;
> > +	return max_t(u32, bytes / mss_now, min_tso_segs);
> > }
>
> Which is the common side of that max_t() ?
> If it is min_tso_segs it might be worth avoiding the
> divide by coding as:
>
> 	return bytes > mss_now * min_tso_segs ? bytes / mss_now : min_tso_segs;
>

I think the common case is when the divide must happen.
Not sure if this really matters with current cpus.

Jakub, Neal, I am going to send a patch for net-next.

In conjunction with BIG TCP, this gives a considerable boost of performance.
Before:

otrv5:/home/google/edumazet# nstat -n;./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96005
TcpInSegs                       15649381           0.0
TcpOutSegs                      58659574           0.0  # Average of 3.74 4K segments per TSO packet
TcpExtTCPDelivered              58655240           0.0
TcpExtTCPDeliveredCE            21                 0.0

After:

otrv5:/home/google/edumazet# nstat -n;./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96046
TcpInSegs                       1445864            0.0
TcpOutSegs                      58885065           0.0  # Average of 40.72 4K segments per TSO packet
TcpExtTCPDelivered              58880873           0.0
TcpExtTCPDeliveredCE            28                 0.0

-> 1,445,864 ACK packets instead of 15,649,381

And about 25 % of cpu cycles saved, according to perf stat:

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         66,895.00 msec task-clock                #    2.886 CPUs utilized
         1,312,687      context-switches          #    19623.389 M/sec
             5,645      cpu-migrations            #    84.387 M/sec
           942,412      page-faults               #    14088.139 M/sec
   203,672,224,410      cycles                    #    3044700.936 GHz                 (83.40%)
    18,933,350,691      stalled-cycles-frontend   #    9.30% frontend cycles idle      (83.46%)
   138,500,001,318      stalled-cycles-backend    #    68.00% backend cycles idle      (83.38%)
    53,694,300,814      instructions              #    0.26  insn per cycle
                                                  #    2.58  stalled cycles per insn   (83.30%)
     9,100,155,390      branches                  #    136038439.770 M/sec             (83.26%)
       152,331,123      branch-misses             #    1.67% of all branches           (83.47%)

      23.180309488 seconds time elapsed

-->

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         48,964.30 msec task-clock                #    2.103 CPUs utilized
           184,903      context-switches          #    3776.305 M/sec
             3,057      cpu-migrations            #    62.434 M/sec
           940,615      page-faults               #    19210.338 M/sec
   152,390,738,065      cycles                    #    3112301.652 GHz                 (83.61%)
    11,603,675,527      stalled-cycles-frontend   #    7.61% frontend cycles idle      (83.49%)
   120,240,493,440      stalled-cycles-backend    #    78.90% backend cycles idle      (83.30%)
    37,106,498,492      instructions              #    0.24  insn per cycle
                                                  #    3.24  stalled cycles per insn   (83.47%)
     5,968,256,846      branches                  #    121890712.483 M/sec             (83.25%)
        88,743,145      branch-misses             #    1.49% of all branches           (83.24%)

      23.284583305 seconds time elapsed