Message-ID: <ZyU0hkjeIteoThQ7@swahl-home.5wahls.com>
Date: Fri, 1 Nov 2024 15:05:26 -0500
From: Steve Wahl <steve.wahl@....com>
To: samir <samir@...ux.ibm.com>
Cc: Steve Wahl <steve.wahl@....com>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org, Russ Anderson <rja@....com>,
        Dimitri Sivanich <sivanich@....com>, vishalc@...ux.ibm.com,
        sshegde@...ux.ibm.com, srikar@...ux.ibm.com
Subject: Re: [PATCH] sched/topology: improve topology_span_sane speed

On Tue, Oct 29, 2024 at 11:04:52PM +0530, samir wrote:
> 
> I have verified this patch on PowerPC. Below are the results for the
> "time ppc64_cpu --smt=off/4" command over 5 iterations (min, max,
> average, and std dev).
> 
> ——————Without patch——————
> ————uname -a————
> 6.12.0-rc5
> 
> ————lscpu————
> lscpu
> Architecture:             ppc64le
>   Byte Order:             Little Endian
> CPU(s):                   360
>   On-line CPU(s) list:    0-359
> NUMA:
>   NUMA node(s):           4
>   NUMA node0 CPU(s):      0-95
>   NUMA node1 CPU(s):      96-191
>   NUMA node2 CPU(s):      192-271
>   NUMA node3 CPU(s):      272-359
> 
> Without Patch:
> Metric      SMT Off (s)   SMT 4 (s)
> Min             68.63       37.64
> Max             74.92       39.39
> Average         70.92       38.48
> Std Dev          2.22        0.63
> 
> 
> ——————With patch——————
> ————uname -a————
> 6.12.0-rc5-dirty
> 
> ————lscpu————
> lscpu
> Architecture:             ppc64le
>   Byte Order:             Little Endian
> CPU(s):                   360
>   On-line CPU(s) list:    0-359
> NUMA:
>   NUMA node(s):           4
>   NUMA node0 CPU(s):      0-95
>   NUMA node1 CPU(s):      96-191
>   NUMA node2 CPU(s):      192-271
>   NUMA node3 CPU(s):      272-359
> 
> With Patch:
> Metric      SMT Off (s)   SMT 4 (s)
> Min             68.748      33.442
> Max             72.954      38.042
> Average         70.309      36.206
> Std Dev          1.41        1.66
> 
> From the results there is no significant improvement overall; however,
> with the patch applied, the SMT=4 case shows a lower average time
> (36.21s vs. 38.48s) along with a higher standard deviation (1.66s vs.
> 0.63s) compared to the unpatched result.

Samir,

I found your results interesting, so I tried to compare with our
systems, and I get similar results: at around 300 processors this patch
makes little difference, while at higher CPU counts the
topology_span_sane() change has more influence.

I don't have PPC system access, but I tried to recreate similar
results on our x86_64 systems.  I took an 8-socket, 60-core/socket,
2-thread/core system (960 CPUs) and limited it to 20 physical
cores/socket (320 CPUs) for comparison.

I'm using scripts from Intel's System Health Check,
"Set-Half-Of-The-Cores-Offline.sh" and "Set-All-Cores-Online.sh", but
similar results could be obtained with anything that manipulates
/sys/devices/system/cpu/cpu*/online.
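
For reference, a minimal sketch of that kind of sysfs manipulation is
below.  It simply offlines the upper half of the CPU ID range, which is
an assumption on my part and not necessarily the same selection Intel's
script makes; it needs to run as root.

#!/bin/sh
# Rough stand-in for "Set-Half-Of-The-Cores-Offline.sh": offline the
# upper half of the CPU IDs by writing 0 to each CPU's sysfs "online"
# file.  (cpu0 has no "online" file and always stays online.  Which
# CPUs Intel's script actually picks is an assumption here.)
NCPUS=$(nproc --all)
HALF=$((NCPUS / 2))
for cpu in $(seq "$HALF" $((NCPUS - 1))); do
    echo 0 > "/sys/devices/system/cpu/cpu$cpu/online"
done

Writing 1 back to the same files brings the CPUs online again, which is
all "Set-All-Cores-Online.sh" needs to do in spirit.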

I also found that the first offlining attempt after a reboot goes much
faster, so I threw out the first result after reboot and then measured
5 iterations.  (The reason for this probably needs exploration, but it
happens for me on both patched and unpatched versions.)
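
For what it's worth, a rough harness along these lines (assuming GNU
time is available at /usr/bin/time and the two scripts sit in the
current directory) could look like:

#!/bin/sh
# Discard the first offline/online cycle after boot, then time 5
# recorded cycles and summarize them.  The script names are the ones
# mentioned above; their location here is assumed.
./Set-Half-Of-The-Cores-Offline.sh; ./Set-All-Cores-Online.sh  # warm-up, not recorded
rm -f offline.times online.times
for i in 1 2 3 4 5; do
    /usr/bin/time -f "%e" -o offline.times -a ./Set-Half-Of-The-Cores-Offline.sh
    /usr/bin/time -f "%e" -o online.times  -a ./Set-All-Cores-Online.sh
done
# min/max/avg and sample std dev over the 5 recorded offline runs
awk '{ s += $1; ss += $1 * $1
       if (NR == 1 || $1 < min) min = $1
       if ($1 > max) max = $1 }
     END { a = s / NR
           printf "min %.2f max %.2f avg %.3f std.dev. %.7f\n",
                  min, max, a, sqrt((ss - s * s / NR) / (NR - 1)) }' offline.times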

All times in seconds.  

With 20 cores / socket (320 CPUs counting hyperthreads):

Without patch:
		Half-Offline	All-Online
min		21.47		30.76
max		22.35		31.31
avg		22.04		31.124
std.dev.	0.3419795	0.2175545

With patch:
		Half-Offline	All-Online
min		20.43		28.23
max		21.93		29.76
avg		20.786		28.874
std.dev.	0.6435293	0.6366553

Not a huge difference at this level.

At 60 cores / socket (960 CPUs counting hyperthreads):

Without patch:
                Half-Offline    All-Online
min		275.34		321.47
max		288.05		331.89
avg		282.964		326.884
std.dev.	5.8835813	4.0268945

With patch:
                Half-Offline    All-Online
min		208.9		247.17
max		219.49		251.48
avg		212.392		249.394
std.dev.	4.1717586	1.6904526

Here it starts to make a difference, and the gap widens as the number
of CPUs goes up.

I should note that I made my measurements with v2 of the patch,
recently posted.  Version 2 does remove a memory allocation, which
might have improved things.

Thanks,

--> Steve Wahl

-- 
Steve Wahl, Hewlett Packard Enterprise
