Message-ID: <ZyU0hkjeIteoThQ7@swahl-home.5wahls.com>
Date: Fri, 1 Nov 2024 15:05:26 -0500
From: Steve Wahl <steve.wahl@....com>
To: samir <samir@...ux.ibm.com>
Cc: Steve Wahl <steve.wahl@....com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org, Russ Anderson <rja@....com>,
Dimitri Sivanich <sivanich@....com>, vishalc@...ux.ibm.com,
sshegde@...ux.ibm.com, srikar@...ux.ibm.com
Subject: Re: [PATCH] sched/topology: improve topology_span_sane speed
On Tue, Oct 29, 2024 at 11:04:52PM +0530, samir wrote:
>
> I have verified this patch on PowerPC. Below are the results for
> "time ppc64_cpu --smt=off" and "time ppc64_cpu --smt=4", with 5
> iterations of each command (min, max, average, and std dev).
>
> ------ Without patch ------
> ---- uname -a ----
> 6.12.0-rc5
>
> ---- lscpu ----
> lscpu
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 360
> On-line CPU(s) list: 0-359
> NUMA:
> NUMA node(s): 4
> NUMA node0 CPU(s): 0-95
> NUMA node1 CPU(s): 96-191
> NUMA node2 CPU(s): 192-271
> NUMA node3 CPU(s): 272-359
>
> Without Patch:
> Metric      SMT Off (s)   SMT 4 (s)
> Min             68.63       37.64
> Max             74.92       39.39
> Average         70.92       38.48
> Std Dev          2.22        0.63
>
>
> ------ With patch ------
> ---- uname -a ----
> 6.12.0-rc5-dirty
>
> ---- lscpu ----
> lscpu
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 360
> On-line CPU(s) list: 0-359
> NUMA:
> NUMA node(s): 4
> NUMA node0 CPU(s): 0-95
> NUMA node1 CPU(s): 96-191
> NUMA node2 CPU(s): 192-271
> NUMA node3 CPU(s): 272-359
>
> With Patch:
> Metric      SMT Off (s)   SMT 4 (s)
> Min             68.748      33.442
> Max             72.954      38.042
> Average         70.309      36.206
> Std Dev          1.41        1.66
>
> From the results, there is no significant improvement overall.
> However, with the patch applied, the SMT=4 case shows a lower
> average time (36.21s vs. 38.48s) and a higher standard deviation
> (1.66s vs. 0.63s) compared to the unpatched kernel.
Samir,
I found your results interesting, so I tried to compare with our
systems, and I got similar results. At around 300 processors, this
patch makes little difference. At higher CPU counts, the
topology_span_sane() change has more influence.
I don't have access to a PPC system, but I tried to recreate similar
results on our x86_64 systems. I took an 8-socket, 60-core/socket,
2-thread/core system (960 CPUs) and limited it to 20 physical
cores/socket (320 CPUs) for comparison.
I'm using scripts from Intel's System Health Check,
"Set-Half-Of-The-Cores-Offline.sh" and "Set-All-Cores-Online.sh", but
similar results could be obtained with anything that manipulates
/sys/devices/system/cpu/cpu*/online.
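For reference, the mechanism those scripts rely on is just writing 0
or 1 to each CPU's sysfs online file. A minimal untested sketch
(assumes the usual sysfs layout; the cpu[1-9]* glob skips cpu0, which
typically can't be offlined):

    # Offline every hot-pluggable CPU except cpu0, then bring them back.
    for f in /sys/devices/system/cpu/cpu[1-9]*/online; do
        echo 0 > "$f"
    done
    for f in /sys/devices/system/cpu/cpu[1-9]*/online; do
        echo 1 > "$f"
    done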
I also found that the first offlining attempt after a reboot goes much
faster, so I threw out the first result after reboot and then measured
5 iterations. (The reason for this probably needs exploration, but it
happens for me on both patched and unpatched versions.)
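In script form, the measurement loop was roughly the following.
(Sketch only; offline.sh and online.sh here are stand-ins for the
Intel scripts, and GNU time's -f "%e" prints elapsed wall-clock
seconds.)

    # Discard the first, faster pass after a reboot, then time 5 passes.
    ./offline.sh >/dev/null 2>&1
    ./online.sh  >/dev/null 2>&1
    for i in 1 2 3 4 5; do
        /usr/bin/time -f "offline: %e s" ./offline.sh
        /usr/bin/time -f "online:  %e s" ./online.sh
    done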
All times in seconds.
With 20 cores / socket (320 CPUs counting hyperthreads):
Without patch:
              Half-Offline   All-Online
min                  21.47        30.76
max                  22.35        31.31
avg                  22.04        31.124
std.dev.         0.3419795    0.2175545
With patch:
              Half-Offline   All-Online
min                  20.43        28.23
max                  21.93        29.76
avg                 20.786       28.874
std.dev.         0.6435293    0.6366553
Not a huge difference at this level.
At 60 cores / socket (960 CPUs counting hyperthreads):
Without patch:
              Half-Offline   All-Online
min                 275.34       321.47
max                 288.05       331.89
avg                282.964      326.884
std.dev.         5.8835813    4.0268945
With patch:
              Half-Offline   All-Online
min                  208.9       247.17
max                 219.49       251.48
avg                212.392      249.394
std.dev.         4.1717586    1.6904526
Here it starts to make a difference, and the gap widens as the number
of CPUs goes up.
I should note that I made my measurements with v2 of the patch,
recently posted. Version 2 does remove a memory allocation, which
might have improved things.
Thanks,
--> Steve Wahl
--
Steve Wahl, Hewlett Packard Enterprise