Date: Fri, 15 Sep 2023 21:20:20 -0600
From: David Ahern <dsahern@...nel.org>
To: Coco Li <lixiaoyan@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Eric Dumazet <edumazet@...gle.com>, Neal Cardwell <ncardwell@...gle.com>,
 Mubashir Adnan Qureshi <mubashirq@...gle.com>,
 Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, Chao Wu <wwchao@...gle.com>,
 Wei Wang <weiwan@...gle.com>
Subject: Re: [PATCH v1 net-next 0/5] Analyze and Reorganize core Networking
 Structs to optimize cacheline consumption

On 9/15/23 7:06 PM, Coco Li wrote:
> Currently, variable-heavy structs in the networking stack are organized
> chronologically, logically and sometimes by cache line access.
> 
> This patch series attempts to reorganize the core networking stack
> variables to minimize cacheline consumption during the phase of data
> transfer. Specifically, we looked at the TCP/IP stack and the fast
> path definition in TCP.
> 
> For documentation purposes, we also added new files for each core data
> structure we considered, although not all ended up being modified due
> to the number of existing cache lines they span in the fast path. In
> the documentation, we recorded all variables we identified on the
> fast path and the reasons. We also hope that in the future when
> variables are added/modified, the document can be referred to and
> updated accordingly to reflect the latest variable organization.
> 
> Tested:
> Our tests were run with neper tcp_rr using TCP traffic. The tests use $cpu
> threads and a variable number of flows (see below).
> 
> Tests were run on 6.5-rc1
> 
> Efficiency is computed as cpu seconds / throughput (one tcp_rr round trip).
> The following result shows Efficiency delta before and after the patch
> series is applied.
> 
> On AMD platforms with 100Gb/s NIC and 256MB L3 cache:
> IPv4
> Flows   with patches      clean kernel      Percent reduction
> 30k     0.0001736538065   0.0002741191042   -36.65%
> 20k     0.0001583661752   0.0002712559158   -41.62%
> 10k     0.0001639148817   0.0002951800751   -44.47%
> 5k      0.0001859683866   0.0003320642536   -44.00%
> 1k      0.0002035190546   0.0003152056382   -35.43%
> 
> IPv6
> Flows   with patches      clean kernel      Percent reduction
> 30k     0.000202535503    0.0003275329163   -38.16%
> 20k     0.0002020654777   0.0003411304786   -40.77%
> 10k     0.0002122427035   0.0003803674705   -44.20%
> 5k      0.0002348776729   0.0004030403953   -41.72%
> 1k      0.0002237384583   0.0002813646157   -20.48%
> 
> On Intel platforms with 200Gb/s NIC and 105MB L3 cache:
> IPv6
> Flows   with patches      clean kernel      Percent reduction
> 30k     0.0006296537873   0.0006370427753   -1.16%
> 20k     0.0003451029365   0.0003628016076   -4.88%
> 10k     0.0003187646958   0.0003346835645   -4.76%
> 5k      0.0002954676348   0.000311807592    -5.24%
> 1k      0.0001909169342   0.0001848069709   3.31%
> 
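
To make the percent column in the quoted tables concrete, here is a minimal sketch that recomputes it from the two efficiency columns. It assumes the column is the relative change of the patched value against the clean-kernel value (negative means fewer cpu seconds per round trip). The function name percent_change is illustrative and does not come from the patch series.

```python
def percent_change(with_patches: float, clean: float) -> float:
    """Relative change of cpu seconds per tcp_rr round trip, in percent.

    Assumption: the tables report (patched - clean) / clean, so a negative
    value means the patched kernel spends less CPU per round trip.
    """
    return (with_patches - clean) / clean * 100.0


# Rows copied from the quoted AMD IPv4 table: (flows, with patches, clean kernel)
AMD_IPV4 = [
    ("30k", 0.0001736538065, 0.0002741191042),
    ("20k", 0.0001583661752, 0.0002712559158),
    ("10k", 0.0001639148817, 0.0002951800751),
    ("5k", 0.0001859683866, 0.0003320642536),
    ("1k", 0.0002035190546, 0.0003152056382),
]

if __name__ == "__main__":
    for flows, patched, clean in AMD_IPV4:
        print(f"{flows}\t{percent_change(patched, clean):+.2f}%")
```

The same function applied to the Intel 1k row gives a positive value, which is the one case in the quoted results where the patched kernel used more CPU per round trip.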

This is awesome. How much of the work leveraged tools versus manually going
through the code to reorganize the structs? For example, was perf c2c of
use?
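
As a rough illustration of what this kind of reorganization buys, independent of whichever tools were actually used for the series, the sketch below lays out a made-up struct with Python's ctypes and counts the 64-byte cache lines touched by its hot fields before and after grouping them. The struct, the field names, the hot-field list, and the 64-byte line size are all assumptions for illustration. For real kernel structs a layout tool such as pahole would be the usual way to see cache line boundaries.

```python
import ctypes

CACHELINE = 64  # bytes; assumed cache line size, typical on x86-64


class Before(ctypes.Structure):
    # Hypothetical layout: hot fields interleaved with cold data.
    _fields_ = [
        ("rx_bytes", ctypes.c_uint64),
        ("cold_a", ctypes.c_uint8 * 120),
        ("tx_bytes", ctypes.c_uint64),
        ("cold_b", ctypes.c_uint8 * 120),
        ("snd_nxt", ctypes.c_uint32),
    ]


class After(ctypes.Structure):
    # Same fields, with the hot ones grouped at the start of the struct.
    _fields_ = [
        ("rx_bytes", ctypes.c_uint64),
        ("tx_bytes", ctypes.c_uint64),
        ("snd_nxt", ctypes.c_uint32),
        ("cold_a", ctypes.c_uint8 * 120),
        ("cold_b", ctypes.c_uint8 * 120),
    ]


# Hypothetical set of fields read or written on the fast path.
HOT_FIELDS = ("rx_bytes", "tx_bytes", "snd_nxt")


def cachelines_touched(struct, fields):
    """Return the set of cache line indexes covered by the given fields."""
    lines = set()
    for name in fields:
        desc = getattr(struct, name)  # ctypes field descriptor
        first = desc.offset // CACHELINE
        last = (desc.offset + desc.size - 1) // CACHELINE
        lines.update(range(first, last + 1))
    return lines


if __name__ == "__main__":
    print("before:", sorted(cachelines_touched(Before, HOT_FIELDS)))
    print("after: ", sorted(cachelines_touched(After, HOT_FIELDS)))
```

In this toy layout the hot fields go from spanning three cache lines to one, assuming the struct itself starts on a cache line boundary.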

