Message-ID: <4bb4c12b-4f8d-1069-6fe8-fe256292a763@amd.com>
Date: Tue, 11 Jul 2023 08:32:27 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Tejun Heo <tj@...nel.org>
Cc: Sandeep Dhavale <dhavale@...gle.com>, jiangshanlai@...il.com,
torvalds@...ux-foundation.org, peterz@...radead.org,
linux-kernel@...r.kernel.org, kernel-team@...a.com,
joshdon@...gle.com, brho@...gle.com, briannorris@...omium.org,
nhuck@...gle.com, agk@...hat.com, snitzer@...nel.org,
void@...ifault.com, kernel-team@...roid.com
Subject: Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods
Hello Tejun,
On 7/6/2023 12:09 AM, Tejun Heo wrote:
> Hello,
>
> On Wed, Jul 05, 2023 at 12:34:48PM +0530, K Prateek Nayak wrote:
>> - Apart from tbench and netperf, the rest of the benchmarks show no
>> difference out of the box.
>
> Just looking at the data, it's a bit difficult for me to judge. I suppose
> most of the differences are due to run-to-run variance? It'd be really
> useful if the data included the standard deviation (whether historical or
> directly from multiple runs).
I'll make sure to include the standard deviation from now on.
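For reference, this is roughly how I plan to aggregate the per-run
numbers (a minimal sketch; the "results.txt" file name and the
one-value-per-line format are just placeholders for illustration):

  #!/usr/bin/env python3
  # Aggregate per-run throughput values into mean and standard
  # deviation. Assumes "results.txt" holds one value per line
  # (hypothetical format, for illustration only).
  import statistics

  with open("results.txt") as f:
      runs = [float(line) for line in f if line.strip()]

  mean = statistics.mean(runs)
  stdev = statistics.stdev(runs) if len(runs) > 1 else 0.0
  print(f"mean: {mean:.2f}  stdev: {stdev:.2f}  ({len(runs)} runs)")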
>
>> - SPECjbb2015 Multi-JVM sees a small uplift to max-jOPS with certain
>> affinity scopes.
>>
>> - tbench and netperf seem to be unhappy throughout. None of the affinity
>> scopes seem to bring back the performance. I'll dig more into this.
>
> Yeah, that seems pretty consistent.
>
>> ~~~~~~~~~~
>> ~ stream ~
>> ~~~~~~~~~~
>>
>> o NPS1
>>
>> - 10 Runs:
>>
>> Test: base affinity_scopes
>> Copy: 245676.59 (0.00 pct) 333646.71 (35.80 pct)
>> Scale: 206545.41 (0.00 pct) 205706.04 (-0.40 pct)
>> Add: 213506.82 (0.00 pct) 236739.07 (10.88 pct)
>> Triad: 217679.43 (0.00 pct) 249263.46 (14.50 pct)
>>
>> - 100 Runs:
>>
>> Test: base affinity_scopes
>> Copy: 318060.91 (0.00 pct) 326025.89 (2.50 pct)
>> Scale: 213943.40 (0.00 pct) 207647.37 (-2.94 pct)
>> Add: 237892.53 (0.00 pct) 232164.59 (-2.40 pct)
>> Triad: 245672.84 (0.00 pct) 246333.21 (0.26 pct)
>>
>> o NPS2
>>
>> - 10 Runs:
>>
>> Test: base affinity_scopes
>> Copy: 296632.20 (0.00 pct) 291153.63 (-1.84 pct)
>> Scale: 206193.90 (0.00 pct) 216368.42 (4.93 pct)
>> Add: 240363.50 (0.00 pct) 245954.23 (2.32 pct)
>> Triad: 242748.60 (0.00 pct) 238606.20 (-1.70 pct)
>>
>> - 100 Runs:
>>
>> Test: base affinity_scopes
>> Copy: 322535.79 (0.00 pct) 315020.03 (-2.33 pct)
>> Scale: 217723.56 (0.00 pct) 220172.32 (1.12 pct)
>> Add: 248117.72 (0.00 pct) 250557.17 (0.98 pct)
>> Triad: 257768.66 (0.00 pct) 248264.00 (-3.68 pct)
>>
>> o NPS4
>>
>> - 10 Runs:
>>
>> Test: base affinity_scopes
>> Copy: 274067.54 (0.00 pct) 302804.77 (10.48 pct)
>> Scale: 224944.53 (0.00 pct) 230112.39 (2.29 pct)
>> Add: 229318.09 (0.00 pct) 241939.54 (5.50 pct)
>> Triad: 230175.89 (0.00 pct) 253613.85 (10.18 pct)
>>
>> - 100 Runs:
>>
>> Test: base affinity_scopes
>> Copy: 338922.96 (0.00 pct) 348183.65 (2.73 pct)
>> Scale: 240262.45 (0.00 pct) 245939.67 (2.36 pct)
>> Add: 256968.24 (0.00 pct) 260657.01 (1.43 pct)
>> Triad: 262644.16 (0.00 pct) 262286.46 (-0.13 pct)
>
> The differences seem more consistent and pronounced for this benchmark too.
> Is this just expected variance for this benchmark?
Yes, stream's changes are mostly due to run-to-run variance.
>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> ~ Benchmarks run with multiple affinity scopes ~
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> o NPS1
>>
>> - tbench
>>
>> Clients: base cpu cache numa system
>> 1 450.40 (0.00 pct) 459.44 (2.00 pct) 457.12 (1.49 pct) 456.36 (1.32 pct) 456.75 (1.40 pct)
>> 2 872.50 (0.00 pct) 869.68 (-0.32 pct) 890.59 (2.07 pct) 878.87 (0.73 pct) 890.14 (2.02 pct)
>> 4 1630.13 (0.00 pct) 1621.24 (-0.54 pct) 1634.74 (0.28 pct) 1628.62 (-0.09 pct) 1646.57 (1.00 pct)
>> 8 3139.90 (0.00 pct) 3044.58 (-3.03 pct) 3099.49 (-1.28 pct) 3081.43 (-1.86 pct) 3151.16 (0.35 pct)
>> 16 6113.51 (0.00 pct) 5555.17 (-9.13 pct) 5465.09 (-10.60 pct) 5661.31 (-7.39 pct) 5742.58 (-6.06 pct)
>> 32 11024.64 (0.00 pct) 9574.62 (-13.15 pct) 9282.62 (-15.80 pct) 9542.00 (-13.44 pct) 9916.66 (-10.05 pct)
>> 64 19081.96 (0.00 pct) 15656.53 (-17.95 pct) 15176.12 (-20.46 pct) 16527.77 (-13.38 pct) 15097.97 (-20.87 pct)
>> 128 30956.07 (0.00 pct) 28277.80 (-8.65 pct) 27662.76 (-10.63 pct) 27817.94 (-10.13 pct) 28925.78 (-6.55 pct)
>> 256 42829.46 (0.00 pct) 38646.48 (-9.76 pct) 38355.27 (-10.44 pct) 37073.24 (-13.43 pct) 34391.01 (-19.70 pct)
>> 512 42395.69 (0.00 pct) 36931.34 (-12.88 pct) 39259.49 (-7.39 pct) 36571.62 (-13.73 pct) 36245.55 (-14.50 pct)
>> 1024 41973.51 (0.00 pct) 38817.07 (-7.52 pct) 38733.15 (-7.72 pct) 38864.45 (-7.40 pct) 35728.70 (-14.87 pct)
>>
>> - netperf
>>
>> base cpu cache numa system
>> 1-clients: 100910.82 (0.00 pct) 103440.72 (2.50 pct) 102592.36 (1.66 pct) 103199.49 (2.26 pct) 103561.90 (2.62 pct)
>> 2-clients: 99777.76 (0.00 pct) 100414.00 (0.63 pct) 100305.89 (0.52 pct) 99890.90 (0.11 pct) 101512.46 (1.73 pct)
>> 4-clients: 97676.17 (0.00 pct) 96624.28 (-1.07 pct) 95966.77 (-1.75 pct) 97105.22 (-0.58 pct) 97972.11 (0.30 pct)
>> 8-clients: 95413.11 (0.00 pct) 89926.72 (-5.75 pct) 89977.14 (-5.69 pct) 91020.10 (-4.60 pct) 92458.94 (-3.09 pct)
>> 16-clients: 88961.66 (0.00 pct) 81295.02 (-8.61 pct) 79144.83 (-11.03 pct) 80216.42 (-9.83 pct) 85439.68 (-3.95 pct)
>> 32-clients: 82199.83 (0.00 pct) 77914.00 (-5.21 pct) 75055.66 (-8.69 pct) 76813.94 (-6.55 pct) 80768.87 (-1.74 pct)
>> 64-clients: 66094.87 (0.00 pct) 64419.91 (-2.53 pct) 63718.37 (-3.59 pct) 60370.40 (-8.66 pct) 66179.58 (0.12 pct)
>> 128-clients: 43833.63 (0.00 pct) 42936.08 (-2.04 pct) 44554.69 (1.64 pct) 42666.82 (-2.66 pct) 45543.69 (3.90 pct)
>> 256-clients: 38917.58 (0.00 pct) 24807.28 (-36.25 pct) 20517.01 (-47.28 pct) 21651.40 (-44.36 pct) 23778.87 (-38.89 pct)
>>
>> - SPECjbb2015 Multi-JVM
>>
>> max-jOPS critical-jOPS
>> base: 0.00% 0.00%
>> smt: -1.11% -1.84%
>> cpu: 2.86% -1.35%
>> cache: 2.86% -1.66%
>> numa: 1.43% -1.49%
>> system: 0.08% -0.41%
>>
>>
>> I'll dig deeper into the tbench and netperf regressions. I'm not sure
>> why the regression is observed across all the affinity scopes. I'll look
>> at the IBS profiles and see if anything obvious pops up. Meanwhile, if
>> there is any specific data you would like me to collect or a benchmark
>> you would like me to run, let me know.
>
> Yeah, that's a bit surprising given that in terms of affinity behavior
> "numa" should be identical to base. The only meaningful differences that I
> can think of are when the work item is assigned to its worker and maybe how
> the pwq max_active limit is applied. Hmm... can you monitor the number of
> kworker kthreads while running the benchmark? No need to do the whole
> matrix; just comparing base against numa should be enough.
Sure. I'll get back to you with the data soon.
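In case it helps frame the data, this is roughly how I plan to sample
the kworker count (a sketch that counts kthreads whose comm starts with
"kworker" by scanning /proc; the 1-second sampling interval is
arbitrary):

  #!/usr/bin/env python3
  # Periodically count tasks whose comm starts with "kworker".
  import os
  import time

  def kworker_count():
      count = 0
      for pid in os.listdir("/proc"):
          if not pid.isdigit():
              continue
          try:
              with open(f"/proc/{pid}/comm") as f:
                  if f.read().startswith("kworker"):
                      count += 1
          except OSError:  # task exited between listdir and open
              continue
      return count

  while True:
      print(f"{time.time():.0f} kworkers: {kworker_count()}")
      time.sleep(1)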
>
> Thanks.
>
--
Thanks and Regards,
Prateek