Message-ID: <a0f77ac5-369b-adb9-506c-429ec4e3fc86@kernel.org>
Date: Fri, 15 Sep 2023 21:20:20 -0600
From: David Ahern <dsahern@...nel.org>
To: Coco Li <lixiaoyan@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Eric Dumazet <edumazet@...gle.com>, Neal Cardwell <ncardwell@...gle.com>,
Mubashir Adnan Qureshi <mubashirq@...gle.com>,
Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, Chao Wu <wwchao@...gle.com>,
Wei Wang <weiwan@...gle.com>
Subject: Re: [PATCH v1 net-next 0/5] Analyze and Reorganize core Networking
Structs to optimize cacheline consumption
On 9/15/23 7:06 PM, Coco Li wrote:
> Currently, variable-heavy structs in the networking stack are organized
> chronologically, logically, and sometimes by cache line access.
>
> This patch series attempts to reorganize the core networking stack
> variables to minimize cacheline consumption during the phase of data
> transfer. Specifically, we looked at the TCP/IP stack and the fast
> path definition in TCP.
>
> For documentation purposes, we also added new files for each core data
> structure we considered, although not all ended up being modified due
> to the number of cache lines they already span in the fast path. In
> the documentation, we recorded all variables we identified on the
> fast path and the reasons. We also hope that in the future when
> variables are added/modified, the document can be referred to and
> updated accordingly to reflect the latest variable organization.
>
> Tested:
> Our tests were run with neper tcp_rr using TCP traffic. The tests use
> $cpu threads and a variable number of flows (see below).
>
> Tests were run on 6.5-rc1
>
> Efficiency is computed as cpu seconds / throughput (one tcp_rr round trip).
> The following result shows Efficiency delta before and after the patch
> series is applied.
>
> On AMD platforms with 100Gb/s NIC and 256MB L3 cache:
> IPv4
> Flows   with patches      clean kernel      Percent reduction
> 30k     0.0001736538065   0.0002741191042   -36.65%
> 20k     0.0001583661752   0.0002712559158   -41.62%
> 10k     0.0001639148817   0.0002951800751   -44.47%
> 5k      0.0001859683866   0.0003320642536   -44.00%
> 1k      0.0002035190546   0.0003152056382   -35.43%
>
> IPv6
> Flows   with patches      clean kernel      Percent reduction
> 30k     0.000202535503    0.0003275329163   -38.16%
> 20k     0.0002020654777   0.0003411304786   -40.77%
> 10k     0.0002122427035   0.0003803674705   -44.20%
> 5k      0.0002348776729   0.0004030403953   -41.72%
> 1k      0.0002237384583   0.0002813646157   -20.48%
>
> On Intel platforms with 200Gb/s NIC and 105MB L3 cache:
> IPv6
> Flows   with patches      clean kernel      Percent reduction
> 30k     0.0006296537873   0.0006370427753   -1.16%
> 20k     0.0003451029365   0.0003628016076   -4.88%
> 10k     0.0003187646958   0.0003346835645   -4.76%
> 5k      0.0002954676348   0.000311807592    -5.24%
> 1k      0.0001909169342   0.0001848069709   3.31%
>
This is awesome. How much of the work leveraged tooling vs. manually going
through the code to reorganize the structs? e.g., was perf c2c of use?