Message-ID: <9191110a-daf9-0520-a47a-801fa3f744d8@amd.com>
Date: Thu, 17 Mar 2022 16:09:29 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: linux-kernel@...r.kernel.org, Tim Chen <tim.c.chen@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Mel Gorman <mgorman@...e.de>,
Viresh Kumar <viresh.kumar@...aro.org>,
Barry Song <21cnbao@...il.com>,
Barry Song <song.bao.hua@...ilicon.com>,
Yicong Yang <yangyicong@...ilicon.com>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Len Brown <len.brown@...el.com>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Aubrey Li <aubrey.li@...el.com>
Subject: Re: [PATCH v2][RFC] sched/fair: Change SIS_PROP to search idle CPU
based on sum of util_avg
Hello Chenyu,
Thank you for looking into the results.
On 3/16/2022 5:24 PM, Chen Yu wrote:
> [..snip..]
> Just wondering, what was the kernel version when you tested v1?
> https://lore.kernel.org/lkml/4ca9ba48-20f0-84d5-6a38-11f9d4c7a028@amd.com/
> It seems that there is a slight performance difference between the old baseline
> and the current 5.17-rc5 tip sched/core.
I'll make a point of including the HEAD commit from now on to
remove this ambiguity.
- While testing v1, the sched-tip was at:
commit: 3624ba7b5e2a ("sched/numa-balancing: Move some document to make it consistent with the code")
- While testing v2, the sched-tip was at:
commit: a0a7e453b502 ("sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers")
>> [..snip..]
>>
>> ~~~~~~~~
>> schbench
>> ~~~~~~~~
>>
>> NPS 1
>>
>> #workers: sched-tip v2_sis_prop
>> 1: 13.00 (0.00 pct) 14.50 (-11.53 pct)
>> 2: 31.50 (0.00 pct) 35.00 (-11.11 pct)
> It seems that in the old result:
> NPS Mode - NPS1
> #workers: sched-tip util-avg
> 1: 13.00 (0.00 pct) 14.50 (-11.53 pct)
> 2: 31.50 (0.00 pct) 34.00 (-7.93 pct)
> we still saw some regressions, although in the v1 patch
> there is no logic change when the utilization is below 85%.
> I think this might be run-to-run deviation when the load is low.
> For example, in the v2 test of schbench, 3 cycles of testing were
> launched:
> case load baseline(std%) compare%( std%)
> normal 1 mthread group 1.00 ( 17.92) +19.23 ( 23.67)
> The standard deviation ratio is 23%, which seems to be relatively
> large. But considering that the v2 patch has changed how aggressively
> we search for an idle CPU, even at low utilization, this result
> needs to be evaluated.
We also see a lot of run-to-run variation with schbench. For the two-worker
case, the following is the data from 10 runs in NPS1 mode:
- sched-tip data
Min : 23.00
Max : 40.00
Median : 31.50
AMean : 30.50
GMean : 29.87
HMean : 29.25
AMean Stddev : 6.55
GMean Stddev : 6.59
HMean Stddev : 6.68
AMean CoefVar : 21.49 pct
GMean CoefVar : 22.05 pct
HMean CoefVar : 22.85 pct
- v2_sis_prop data
Min : 22.00
Max : 41.00
Median : 35.00
AMean : 33.50
GMean : 32.84
HMean : 32.13
AMean Stddev : 6.64
GMean Stddev : 6.67
HMean Stddev : 6.79
AMean CoefVar : 19.81 pct
GMean CoefVar : 20.32 pct
HMean CoefVar : 21.14 pct
The figures reported previously were the medians of these runs.
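For reference, summary figures like the ones above can be derived from the
raw per-run values roughly as follows. This is only a sketch using Python's
standard statistics module. The sample values are made up for illustration
(they are not the schbench runs), and the use of the sample standard
deviation is an assumption. The coefficient of variation is taken as
stddev / mean * 100, which is consistent with the figures above
(6.55 / 30.50 is about 21.5 pct).

```python
import statistics


def summarize(samples):
    """Return summary statistics for a list of positive per-run results."""
    amean = statistics.mean(samples)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    stddev = statistics.stdev(samples)
    return {
        "min": min(samples),
        "max": max(samples),
        "median": statistics.median(samples),
        "amean": amean,
        "gmean": statistics.geometric_mean(samples),
        "hmean": statistics.harmonic_mean(samples),
        "stddev": stddev,
        # Coefficient of variation, in percent of the arithmetic mean.
        "coefvar_pct": stddev / amean * 100.0,
    }


# Hypothetical per-run values, NOT the actual benchmark data.
runs = [10.0, 12.0, 14.0, 16.0]
summary = summarize(runs)
for name, value in summary.items():
    print(f"{name:12s}: {value:.2f}")
```

With a coefficient of variation around 20 pct, a difference of roughly 11 pct
between two medians is well within the run-to-run noise.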
> [..snip..]
>> ~~~~~~
>> tbench
>> ~~~~~~
>>
>> NPS 1
>>
>> Clients: sched-tip v2_sis_prop
>> 1 477.85 (0.00 pct) 470.68 (-1.50 pct)
>> 2 924.07 (0.00 pct) 910.82 (-1.43 pct)
>> 4 1778.95 (0.00 pct) 1743.64 (-1.98 pct)
>> 8 3244.81 (0.00 pct) 3200.35 (-1.37 pct)
>> 16 5837.06 (0.00 pct) 5808.36 (-0.49 pct)
>> 32 9339.33 (0.00 pct) 8648.03 (-7.40 pct)
>> 64 14761.19 (0.00 pct) 15803.13 (7.05 pct)
>> 128 27806.11 (0.00 pct) 27510.69 (-1.06 pct)
>> 256 35262.03 (0.00 pct) 34135.78 (-3.19 pct)
> The result from v1 patch:
> NPS Mode - NPS1
> Clients: sched-tip util-avg
> 256 26726.29 (0.00 pct) 52502.83 (96.44 pct)
>> 512 52459.78 (0.00 pct) 51630.53 (-1.58 pct)
>> 1024 52480.67 (0.00 pct) 52439.37 (-0.07 pct)
>>
>> NPS 2
>>
>> Clients: sched-tip v2_sis_prop
>> 1 478.98 (0.00 pct) 472.98 (-1.25 pct)
>> 2 930.52 (0.00 pct) 914.48 (-1.72 pct)
>> 4 1743.26 (0.00 pct) 1711.16 (-1.84 pct)
>> 8 3297.07 (0.00 pct) 3161.12 (-4.12 pct)
>> 16 5779.10 (0.00 pct) 5738.38 (-0.70 pct)
>> 32 10708.42 (0.00 pct) 10748.26 (0.37 pct)
>> 64 16965.21 (0.00 pct) 16894.42 (-0.41 pct)
>> 128 29152.49 (0.00 pct) 28287.31 (-2.96 pct)
>> 256 27408.75 (0.00 pct) 33680.59 (22.88 pct)
> The result from v1 patch:
> 256 27654.49 (0.00 pct) 47126.18 (70.41 pct)
>> 512 51453.64 (0.00 pct) 47546.87 (-7.59 pct)
>> 1024 52156.85 (0.00 pct) 51233.28 (-1.77 pct)
>>
>> NPS 4
>>
>> Clients: sched-tip v2_sis_prop
>> 1 480.29 (0.00 pct) 473.75 (-1.36 pct)
>> 2 940.23 (0.00 pct) 915.60 (-2.61 pct)
>> 4 1760.21 (0.00 pct) 1687.99 (-4.10 pct)
>> 8 3269.75 (0.00 pct) 3154.02 (-3.53 pct)
>> 16 5503.71 (0.00 pct) 5485.01 (-0.33 pct)
>> 32 10633.93 (0.00 pct) 10276.21 (-3.36 pct)
>> 64 16304.44 (0.00 pct) 15351.17 (-5.84 pct)
>> 128 26893.95 (0.00 pct) 25337.08 (-5.78 pct)
>> 256 24469.94 (0.00 pct) 32178.33 (31.50 pct)
> The result from v1 patch:
> 256 25997.38 (0.00 pct) 47735.83 (83.61 pct)
>
> In the above three cases, v2 shows a smaller improvement than
> v1. In both v1 and v2, the improvement mainly comes from choosing
> the previous running CPU as the target when the system is busy. But
> v2 is more likely to choose the previous CPU than v1, because its
> threshold of 50% is lower than the 85% from v1. Then why does v2 show less
> improvement than v1? It seems that the v2 patch only changes the logic of
> SIS_PROP for single idle CPU searching, but does not touch the idle core
> searching. Meanwhile, v1 limits both idle CPU and idle core searching, and
> this might explain the extra benefit from the v1 patch IMO.
Yes, this might be the case.
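For anyone re-deriving the pct columns in the tables above: they appear to be
the relative change against the sched-tip baseline, with the sign flipped for
lower-is-better metrics such as schbench latency, so that a negative value
always denotes a regression. The sketch below is inferred from the posted
numbers rather than taken from the reporting scripts, and the function name
and the lower_is_better parameter are hypothetical.

```python
def pct_change(baseline, value, lower_is_better=False):
    """Relative change versus baseline in percent; negative means regression."""
    if lower_is_better:
        # Latency-style metric: a larger value is worse.
        return (baseline - value) / baseline * 100.0
    # Throughput-style metric: a larger value is better.
    return (value - baseline) / baseline * 100.0


# tbench, NPS1, 1 client (throughput, higher is better)
print(round(pct_change(477.85, 470.68), 2))
# tbench, NPS2, 256 clients
print(round(pct_change(27408.75, 33680.59), 2))
# schbench, NPS1, 2 workers (latency, lower is better)
print(round(pct_change(31.50, 35.00, lower_is_better=True), 2))
```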
>> [..snip..]
>> ~~~~~~~~~~~~
>> ycsb-mongodb
>> ~~~~~~~~~~~~
>>
>> NPS1
>>
>> sched-tip: 304934.67 (var: 0.88)
>> v2_sis_prop: 301560.0 (var: 2.0) (-1.1%)
>>
>> NPS2
>>
>> sched-tip: 303757.0 (var: 1.0)
>> v2_sis_prop: 302283.0 (var: 0.58) (-0.48%)
>>
>> NPS4
>>
>> sched-tip: 308106.67 (var: 2.88)
>> v2_sis_prop: 302302.67 (var: 1.12) (-1.88%)
>>
> May I know the average CPU utilization of this benchmark?
I don't have this data at hand. I'll get back to you soon with the data.
> [..snip..]
> I see. But we might have to make this per-LLC search generic, for both smaller
> and bigger LLC sizes. The current exponential descent function could increase the
> number of CPUs to be searched when the system is not busy. I'll think about it
> and do some investigation.
It would indeed be great to have this work well for all LLC sizes.
Thank you for looking into it :)
--
Thanks and Regards,
Prateek