Message-ID: <adfb65ab-8621-b6c0-bb77-39c9be3486b7@amd.com>
Date: Mon, 16 May 2022 16:22:34 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Mel Gorman <mgorman@...e.de>,
Yicong Yang <yangyicong@...ilicon.com>,
Tim Chen <tim.c.chen@...el.com>,
Chen Yu <yu.chen.surf@...il.com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Barry Song <21cnbao@...il.com>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Len Brown <len.brown@...el.com>,
Ben Segall <bsegall@...gle.com>,
Aubrey Li <aubrey.li@...el.com>,
Abel Wu <wuyun.abel@...edance.com>,
Zhang Rui <rui.zhang@...el.com>, linux-kernel@...r.kernel.org,
Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: [PATCH v3] sched/fair: Introduce SIS_UTIL to search idle CPU
based on sum of util_avg
Hello Chenyu,
Thank you for taking a look at the results.
On 5/14/2022 4:25 PM, Chen Yu wrote:
> [..snip..]
> May I know if in all NPS mode, all LLC domains have 16 CPUs?
Yes. The number of CPUs in LLC domain is always 16 irrespective of the NPS mode.
>> Following is the NUMA configuration for each NPS mode on the system:
>>
>> NPS1: Each socket is a NUMA node.
>> Total 2 NUMA nodes in the dual socket machine.
>>
>> Node 0: 0-63, 128-191
>> Node 1: 64-127, 192-255
>>
>> NPS2: Each socket is further logically divided into 2 NUMA regions.
>> Total 4 NUMA nodes exist over 2 sockets.
>>
>> Node 0: 0-31, 128-159
>> Node 1: 32-63, 160-191
>> Node 2: 64-95, 192-223
>> Node 3: 96-127, 224-255
>>
>> NPS4: Each socket is logically divided into 4 NUMA regions.
>> Total 8 NUMA nodes exist over 2 sockets.
>>
>> Node 0: 0-15, 128-143
>> Node 1: 16-31, 144-159
>> Node 2: 32-47, 160-175
>> Node 3: 48-63, 176-191
>> Node 4: 64-79, 192-207
>> Node 5: 80-95, 208-223
>> Node 6: 96-111, 224-239
>> Node 7: 112-127, 240-255
>>
>> Kernel versions:
>> - tip: 5.18-rc1 tip sched/core
>> - SIS_UTIL: 5.18-rc1 tip sched/core + this patch
>>
>> When we began testing, tip was at:
>>
>> commit: a658353167bf "sched/fair: Revise comment about lb decision matrix"
>>
>> Following are the results from the benchmark:
>>
>> * - Data points of concern
>>
>> ~~~~~~~~~
>> hackbench
>> ~~~~~~~~~
>>
>> NPS1
>>
>> Test: tip SIS_UTIL
>> 1-groups: 4.64 (0.00 pct) 4.70 (-1.29 pct)
>> 2-groups: 5.38 (0.00 pct) 5.45 (-1.30 pct)
>> 4-groups: 6.15 (0.00 pct) 6.10 (0.81 pct)
>> 8-groups: 7.42 (0.00 pct) 7.42 (0.00 pct)
>> 16-groups: 10.70 (0.00 pct) 11.69 (-9.25 pct) *
>>
>> NPS2
>>
>> Test: tip SIS_UTIL
>> 1-groups: 4.70 (0.00 pct) 4.70 (0.00 pct)
>> 2-groups: 5.45 (0.00 pct) 5.46 (-0.18 pct)
>> 4-groups: 6.13 (0.00 pct) 6.05 (1.30 pct)
>> 8-groups: 7.30 (0.00 pct) 7.05 (3.42 pct)
>> 16-groups: 10.30 (0.00 pct) 10.12 (1.74 pct)
>>
>> NPS4
>>
>> Test: tip SIS_UTIL
>> 1-groups: 4.60 (0.00 pct) 4.75 (-3.26 pct) *
>> 2-groups: 5.41 (0.00 pct) 5.42 (-0.18 pct)
>> 4-groups: 6.12 (0.00 pct) 6.00 (1.96 pct)
>> 8-groups: 7.22 (0.00 pct) 7.10 (1.66 pct)
>> 16-groups: 10.24 (0.00 pct) 10.11 (1.26 pct)
>>
>> ~~~~~~~~
>> schbench
>> ~~~~~~~~
>>
>> NPS 1
>>
>> #workers: tip SIS_UTIL
>> 1: 29.00 (0.00 pct) 21.00 (27.58 pct)
>> 2: 28.00 (0.00 pct) 28.00 (0.00 pct)
>> 4: 31.50 (0.00 pct) 31.00 (1.58 pct)
>> 8: 42.00 (0.00 pct) 39.00 (7.14 pct)
>> 16: 56.50 (0.00 pct) 54.50 (3.53 pct)
>> 32: 94.50 (0.00 pct) 94.00 (0.52 pct)
>> 64: 176.00 (0.00 pct) 175.00 (0.56 pct)
>> 128: 404.00 (0.00 pct) 394.00 (2.47 pct)
>> 256: 869.00 (0.00 pct) 863.00 (0.69 pct)
>> 512: 58432.00 (0.00 pct) 55424.00 (5.14 pct)
>>
>> NPS2
>>
>> #workers: tip SIS_UTIL
>> 1: 26.50 (0.00 pct) 25.00 (5.66 pct)
>> 2: 26.50 (0.00 pct) 25.50 (3.77 pct)
>> 4: 34.50 (0.00 pct) 34.00 (1.44 pct)
>> 8: 45.00 (0.00 pct) 46.00 (-2.22 pct)
>> 16: 56.50 (0.00 pct) 60.50 (-7.07 pct) *
>> 32: 95.50 (0.00 pct) 93.00 (2.61 pct)
>> 64: 179.00 (0.00 pct) 179.00 (0.00 pct)
>> 128: 369.00 (0.00 pct) 376.00 (-1.89 pct)
>> 256: 898.00 (0.00 pct) 903.00 (-0.55 pct)
>> 512: 56256.00 (0.00 pct) 57088.00 (-1.47 pct)
>>
>> NPS4
>>
>> #workers: tip SIS_UTIL
>> 1: 25.00 (0.00 pct) 21.00 (16.00 pct)
>> 2: 28.00 (0.00 pct) 24.00 (14.28 pct)
>> 4: 29.50 (0.00 pct) 29.50 (0.00 pct)
>> 8: 41.00 (0.00 pct) 37.50 (8.53 pct)
>> 16: 65.50 (0.00 pct) 64.00 (2.29 pct)
>> 32: 93.00 (0.00 pct) 94.50 (-1.61 pct)
>> 64: 170.50 (0.00 pct) 175.50 (-2.93 pct)
>> 128: 377.00 (0.00 pct) 368.50 (2.25 pct)
>> 256: 867.00 (0.00 pct) 902.00 (-4.03 pct)
>> 512: 58048.00 (0.00 pct) 55488.00 (4.41 pct)
>>
>> ~~~~~~
>> tbench
>> ~~~~~~
>>
>> NPS 1
>>
>> Clients: tip SIS_UTIL
>> 1 443.31 (0.00 pct) 456.19 (2.90 pct)
>> 2 877.32 (0.00 pct) 875.24 (-0.23 pct)
>> 4 1665.11 (0.00 pct) 1647.31 (-1.06 pct)
>> 8 3016.68 (0.00 pct) 2993.23 (-0.77 pct)
>> 16 5374.30 (0.00 pct) 5246.93 (-2.36 pct)
>> 32 8763.86 (0.00 pct) 7878.18 (-10.10 pct) *
>> 64 15786.93 (0.00 pct) 12958.47 (-17.91 pct) *
>> 128 26826.08 (0.00 pct) 26741.14 (-0.31 pct)
>> 256 24207.35 (0.00 pct) 52041.89 (114.98 pct)
>> 512 51740.58 (0.00 pct) 52084.44 (0.66 pct)
>> 1024 51177.82 (0.00 pct) 53126.29 (3.80 pct)
>>
>> NPS 2
>>
>> Clients: tip SIS_UTIL
>> 1 449.49 (0.00 pct) 447.96 (-0.34 pct)
>> 2 867.28 (0.00 pct) 869.52 (0.25 pct)
>> 4 1643.60 (0.00 pct) 1625.91 (-1.07 pct)
>> 8 3047.35 (0.00 pct) 2952.82 (-3.10 pct)
>> 16 5340.77 (0.00 pct) 5251.41 (-1.67 pct)
>> 32 10536.85 (0.00 pct) 8843.49 (-16.07 pct) *
>> 64 16543.23 (0.00 pct) 14265.35 (-13.76 pct) *
>> 128 26400.40 (0.00 pct) 25595.42 (-3.04 pct)
>> 256 23436.75 (0.00 pct) 47090.03 (100.92 pct)
>> 512 50902.60 (0.00 pct) 50036.58 (-1.70 pct)
>> 1024 50216.10 (0.00 pct) 50639.74 (0.84 pct)
>>
>> NPS 4
>>
>> Clients: tip SIS_UTIL
>> 1 443.82 (0.00 pct) 459.93 (3.62 pct)
>> 2 849.14 (0.00 pct) 882.17 (3.88 pct)
>> 4 1603.26 (0.00 pct) 1629.64 (1.64 pct)
>> 8 2972.37 (0.00 pct) 3003.09 (1.03 pct)
>> 16 5277.13 (0.00 pct) 5234.07 (-0.81 pct)
>> 32 9744.73 (0.00 pct) 9347.90 (-4.07 pct) *
>> 64 15854.80 (0.00 pct) 14180.27 (-10.56 pct) *
>> 128 26116.97 (0.00 pct) 24597.45 (-5.81 pct) *
>> 256 22403.25 (0.00 pct) 47385.09 (111.50 pct)
>> 512 48317.20 (0.00 pct) 49781.02 (3.02 pct)
>> 1024 50445.41 (0.00 pct) 51607.53 (2.30 pct)
>>
>> ~~~~~~
>> Stream
>> ~~~~~~
>>
>> - 10 runs
>>
>> NPS1
>>
>> Test: tip SIS_UTIL
>> Copy: 189113.11 (0.00 pct) 188490.27 (-0.32 pct)
>> Scale: 201190.61 (0.00 pct) 204526.15 (1.65 pct)
>> Add: 232654.21 (0.00 pct) 234948.01 (0.98 pct)
>> Triad: 226583.57 (0.00 pct) 228844.43 (0.99 pct)
>>
>> NPS2
>>
>> Test: tip SIS_UTIL
>> Copy: 155347.14 (0.00 pct) 169386.29 (9.03 pct)
>> Scale: 191701.53 (0.00 pct) 196110.51 (2.29 pct)
>> Add: 210013.97 (0.00 pct) 221088.45 (5.27 pct)
>> Triad: 207602.00 (0.00 pct) 218072.52 (5.04 pct)
>>
>> NPS4
>>
>> Test: tip SIS_UTIL
>> Copy: 136421.15 (0.00 pct) 140894.11 (3.27 pct)
>> Scale: 191217.59 (0.00 pct) 190554.17 (-0.34 pct)
>> Add: 189229.52 (0.00 pct) 190871.88 (0.86 pct)
>> Triad: 188052.99 (0.00 pct) 188417.63 (0.19 pct)
>>
>> - 100 runs
>>
>> NPS1
>>
>> Test: tip SIS_UTIL
>> Copy: 244693.32 (0.00 pct) 232328.05 (-5.05 pct)
>> Scale: 221874.99 (0.00 pct) 216858.39 (-2.26 pct)
>> Add: 268363.89 (0.00 pct) 265449.16 (-1.08 pct)
>> Triad: 260945.24 (0.00 pct) 252240.56 (-3.33 pct)
>>
>> NPS2
>>
>> Test: tip SIS_UTIL
>> Copy: 211262.00 (0.00 pct) 225240.03 (6.61 pct)
>> Scale: 222493.34 (0.00 pct) 219094.65 (-1.52 pct)
>> Add: 280277.17 (0.00 pct) 275677.73 (-1.64 pct)
>> Triad: 265860.49 (0.00 pct) 262584.22 (-1.23 pct)
>>
>> NPS4
>>
>> Test: tip SIS_UTIL
>> Copy: 250171.40 (0.00 pct) 230983.60 (-7.66 pct)
>> Scale: 222293.56 (0.00 pct) 215984.34 (-2.83 pct)
>> Add: 279222.16 (0.00 pct) 270402.64 (-3.15 pct)
>> Triad: 262013.92 (0.00 pct) 254820.60 (-2.74 pct)
>>
>> ~~~~~~~~~~~~
>> ycsb-mongodb
>> ~~~~~~~~~~~~
>>
>> NPS1
>>
>> sched-tip: 303718.33 (var: 1.31)
>> SIS_UTIL: 303529.33 (var: 0.67) (-0.06%)
>>
>> NPS2
>>
>> sched-tip: 304536.33 (var: 2.46)
>> SIS_UTIL: 303730.33 (var: 1.57) (-0.26%)
>>
>> NPS4
>>
>> sched-tip: 301192.33 (var: 1.81)
>> SIS_UTIL: 300101.33 (var: 0.35) (-0.36%)
>>
>> ~~~~~~~~~~~~~~~~~~
>>
>> Notes:
>>
>> - There seems to be some noticeable regression for hackbench
>> with 16 groups in NPS1 mode.
> Did the hackbench use the default fd number(20) in every group? If
> this is the case, then there are 16 * 20 * 2 = 640 threads in the
> system. I thought this should be overloaded, either in SIS_PROP or
> SIS_UTIL, the search depth might be 4 and 0 respectively. And it
> is also very likely the SIS_PROP will not find an idle CPU after
> searching for 4 CPUs. So in theory there should be not much performance
> difference with vs without the patch applied. But if the fd number is set
> to a smaller one, the regression could be explained as you mentioned,
> SIS_PROP search more aggressively.
Yes, I'm using fd number(20). The logs from hackbench run show that it is
indeed running 640 threads with 16 groups:
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 16 groups == 640 threads run
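The thread count in the log follows directly from the arithmetic quoted above (groups * fds * 2). A minimal sketch of that calculation, assuming the 256 CPUs implied by the CPU ranges 0-255 in the NUMA layout; the helper names are hypothetical and not part of hackbench:

```c
#include <assert.h>

/*
 * Threads created by one hackbench run: every group has `fds` senders
 * and `fds` receivers. Hypothetical helper, for illustration only.
 */
static unsigned int hackbench_threads(unsigned int groups, unsigned int fds)
{
	return groups * fds * 2;
}

/* Average number of benchmark threads per CPU on a machine with `cpus` CPUs. */
static double threads_per_cpu(unsigned int threads, unsigned int cpus)
{
	assert(cpus != 0);
	return (double)threads / (double)cpus;
}
```

With 16 groups and 20 fds this gives 640 threads, i.e. 2.5 threads per CPU on a 256-CPU system, which is why the run is expected to be overloaded.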
This is indeed counterintuitive, and I don't have a good explanation
for it other than that SIS_PROP is probably slightly more successful
at finding an idle CPU even in this overloaded environment.
I've run the benchmark in two sets of 3 runs, rebooting in between,
on each kernel version:
- tip

Test:        tip-r0                tip-r1                tip-r2
 1-groups:    4.64 (0.00 pct)       4.90 (-5.60 pct)      4.99 (-7.54 pct)
 2-groups:    5.54 (0.00 pct)       5.56 (-0.36 pct)      5.58 (-0.72 pct)
 4-groups:    6.24 (0.00 pct)       6.18 (0.96 pct)       6.20 (0.64 pct)
 8-groups:    7.54 (0.00 pct)       7.50 (0.53 pct)       7.54 (0.00 pct)
16-groups:   10.85 (0.00 pct)      11.17 (-2.94 pct)     10.91 (-0.55 pct)

Test:        tip-r3                tip-r4                tip-r5
 1-groups:    4.68 (0.00 pct)       4.97 (-6.19 pct)      4.98 (-6.41 pct)
 2-groups:    5.60 (0.00 pct)       5.62 (-0.35 pct)      5.66 (-1.07 pct)
 4-groups:    6.24 (0.00 pct)       6.23 (0.16 pct)       6.24 (0.00 pct)
 8-groups:    7.54 (0.00 pct)       7.50 (0.53 pct)       7.46 (1.06 pct)
16-groups:   10.81 (0.00 pct)      10.84 (-0.27 pct)     10.81 (0.00 pct)

- SIS_UTIL

Test:        SIS_UTIL-r0           SIS_UTIL-r1           SIS_UTIL-r2
 1-groups:    4.68 (0.00 pct)       5.03 (-7.47 pct)      4.96 (-5.98 pct)
 2-groups:    5.45 (0.00 pct)       5.48 (-0.55 pct)      5.50 (-0.91 pct)
 4-groups:    6.10 (0.00 pct)       6.07 (0.49 pct)       6.14 (-0.65 pct)
 8-groups:    7.52 (0.00 pct)       7.51 (0.13 pct)       7.52 (0.00 pct)
16-groups:   11.63 (0.00 pct)      11.48 (1.28 pct)      11.51 (1.03 pct)

Test:        SIS_UTIL-r3           SIS_UTIL-r4           SIS_UTIL-r5
 1-groups:    4.80 (0.00 pct)       5.00 (-4.16 pct)      5.06 (-5.41 pct)
 2-groups:    5.51 (0.00 pct)       5.58 (-1.27 pct)      5.58 (-1.27 pct)
 4-groups:    6.14 (0.00 pct)       6.11 (0.48 pct)       6.06 (1.30 pct)
 8-groups:    7.35 (0.00 pct)       7.38 (-0.40 pct)      7.40 (-0.68 pct)
16-groups:   11.03 (0.00 pct)      11.29 (-2.35 pct)     11.14 (-0.99 pct)
- Comparing the good and bad data points for 16-groups with each
  kernel version:

Test:        tip-good              SIS_UTIL-good
 1-groups:    4.68 (0.00 pct)       4.80 (-2.56 pct)
 2-groups:    5.60 (0.00 pct)       5.51 (1.60 pct)
 4-groups:    6.24 (0.00 pct)       6.14 (1.60 pct)
 8-groups:    7.54 (0.00 pct)       7.35 (2.51 pct)
16-groups:   10.81 (0.00 pct)      11.03 (-2.03 pct)

Test:        tip-good              SIS_UTIL-bad
 1-groups:    4.68 (0.00 pct)       4.68 (0.00 pct)
 2-groups:    5.60 (0.00 pct)       5.45 (2.67 pct)
 4-groups:    6.24 (0.00 pct)       6.10 (2.24 pct)
 8-groups:    7.54 (0.00 pct)       7.52 (0.26 pct)
16-groups:   10.81 (0.00 pct)      11.63 (-7.58 pct)

Test:        tip-bad               SIS_UTIL-good
 1-groups:    4.90 (0.00 pct)       4.80 (2.04 pct)
 2-groups:    5.56 (0.00 pct)       5.51 (0.89 pct)
 4-groups:    6.18 (0.00 pct)       6.14 (0.64 pct)
 8-groups:    7.50 (0.00 pct)       7.35 (2.00 pct)
16-groups:   11.17 (0.00 pct)      11.03 (1.25 pct)

Test:        tip-bad               SIS_UTIL-bad
 1-groups:    4.90 (0.00 pct)       4.68 (4.48 pct)
 2-groups:    5.56 (0.00 pct)       5.45 (1.97 pct)
 4-groups:    6.18 (0.00 pct)       6.10 (1.29 pct)
 8-groups:    7.50 (0.00 pct)       7.52 (-0.26 pct)
16-groups:   11.17 (0.00 pct)      11.63 (-4.11 pct)
Hackbench consistently reports > 11 for the 16-group case with
SIS_UTIL, but only once with SIS_PROP.
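To put the six 16-group runs per kernel side by side, the sketch below averages them and reproduces the percentage convention used in the hackbench tables. It assumes hackbench reports elapsed time, so lower is better and a negative percentage is a regression. The array and function names are hypothetical; the values are copied from the run tables above.

```c
#include <assert.h>
#include <stddef.h>

/* 16-group hackbench results for runs r0..r5, copied from the tables above. */
static const double tip_16[]      = { 10.85, 11.17, 10.91, 10.81, 10.84, 10.81 };
static const double sis_util_16[] = { 11.63, 11.48, 11.51, 11.03, 11.29, 11.14 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Number of samples strictly above `limit`. */
static size_t count_above(const double *v, size_t n, double limit)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (v[i] > limit)
			hits++;
	return hits;
}

/*
 * Percentage change for a lower-is-better metric (assumed convention):
 * positive means `val` improved over `base`, negative means it regressed.
 */
static double pct_lower_is_better(double base, double val)
{
	return (base - val) * 100.0 / base;
}
```

On these numbers the mean is roughly 10.90 for tip and 11.35 for SIS_UTIL, and `count_above(..., 11.0)` gives 1 of 6 for tip against 6 of 6 for SIS_UTIL.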
>> - There seems to be regression in tbench for case with number
>> of workers in range 32-128 (12.5% loaded to 50% loaded)
>> - tbench reaches saturation early when system is fully loaded
>>
>> This probably shows that the strategy in the initial v1 RFC
>> seems to work better with our LLC where number of CPUs per LLC
>> is low compared to systems with unified LLC. Given this is
>> showing great results for unified LLC, maybe SIS_PROP and SIS_UTIL
>> can be enabled based on the size of LLC.
>>
> Yes, SIS_PROP searches more aggressively, but we are attempting to replace
> SIS_PROP with a more accurate policy.
>>> [..snip..]
>>>
>>> [3]
>>> Prateek mentioned that we should scan aggressively in an LLC domain
>>> with 16 CPUs. Because the cost to search for an idle one among 16 CPUs is
>>> negligible. The current patch aims to propose a generic solution and only
>>> considers the util_avg. A follow-up change could enhance the scan policy
>>> to adjust the scan_percent according to the CPU number in LLC.
>> Following are some additional numbers I would like to share comparing SIS_PROP and
>> SIS_UTIL:
>>
> Nice analysis.
>> o Hackbench with 1 group
>>
>> With 1 group, following are the chances of SIS_PROP
>> and SIS_UTIL finding an idle CPU when an idle CPU
>> exists in LLC:
>>
>> +-----------------+---------------------------+---------------------------+--------+
>> | Idle CPU in LLC | SIS_PROP able to find CPU | SIS_UTIL able to find CPU | Count |
>> +-----------------+---------------------------+---------------------------+--------+
>> | 1 | 0 | 0 | 66444 |
>> | 1 | 0 | 1 | 34153 |
>> | 1 | 1 | 0 | 57204 |
>> | 1 | 1 | 1 | 119263 |
>> +-----------------+---------------------------+---------------------------+--------+
>>
> So SIS_PROP searches more, and gets a higher chance to find an idle CPU in an LLC with
> 16 CPUs.
Yes!
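The 1-group table can be reduced to one hit rate per policy. A minimal sketch, with hypothetical struct and function names; the counts are copied from the quoted table, and every row is a case where an idle CPU did exist in the LLC:

```c
#include <assert.h>
#include <stddef.h>

/* One row of the quoted table. */
struct sis_outcome {
	int prop_found;		/* 1 if SIS_PROP was able to find the idle CPU */
	int util_found;		/* 1 if SIS_UTIL was able to find the idle CPU */
	unsigned long count;
};

/* hackbench, 1 group */
static const struct sis_outcome hackbench_1_group[] = {
	{ 0, 0,  66444 },
	{ 0, 1,  34153 },
	{ 1, 0,  57204 },
	{ 1, 1, 119263 },
};

/* Fraction of samples in which the chosen policy found the idle CPU. */
static double hit_rate(const struct sis_outcome *rows, size_t n, int use_util)
{
	unsigned long hits = 0, total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		total += rows[i].count;
		if (use_util ? rows[i].util_found : rows[i].prop_found)
			hits += rows[i].count;
	}
	assert(total > 0);
	return (double)hits / (double)total;
}
```

For this table the rates come out at about 63.7% for SIS_PROP and 55.4% for SIS_UTIL.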
>> SIS_PROP vs no SIS_PROP CPU search stats:
>>
>> Total time without SIS_PROP: 90653653
>> Total time with SIS_PROP: 53558942 (-40.92 pct)
>> Total time saved: 37094711
>>
> What does no SIS_PROP mean? Is it with SIS_PROP disabled and
> SIS_UTIL enabled? Or with both SIS_PROP and SIS_UTIL disabled?
> If it is the latter, is there any performance difference between
> the two?
Sorry for not being clear here. "No SIS_PROP" means we search the
entire LLC for an idle CPU every time. This data aims to show how much
time SIS_PROP saves compared to the case where it is disabled.
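The "Total time saved" and percentage figures in the search stats follow from the two totals. A small sketch of that arithmetic, with hypothetical function names; the unit of the totals is not stated in the thread, so none is assumed here:

```c
#include <assert.h>

/* Search time saved by SIS_PROP relative to always scanning the whole LLC. */
static unsigned long long time_saved(unsigned long long without_prop,
				     unsigned long long with_prop)
{
	assert(without_prop >= with_prop);
	return without_prop - with_prop;
}

/* Relative change in percent; negative means less time spent searching. */
static double time_change_pct(unsigned long long without_prop,
			      unsigned long long with_prop)
{
	assert(without_prop != 0);
	return ((double)with_prop - (double)without_prop) * 100.0 /
	       (double)without_prop;
}
```

For the 1-group hackbench totals (90653653 without, 53558942 with) this gives 37094711 saved, or about -40.92 pct.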
>> Following are number of CPUs SIS_UTIL will search when SIS_PROP limit >= 16 (LLC size):
>>
>> +--------------+-------+
>> | CPU Searched | Count |
>> +--------------+-------+
>> | 0 | 10520 |
>> | 1 | 7770 |
>> | 2 | 11976 |
>> | 3 | 17554 |
>> | 4 | 13932 |
>> | 5 | 15051 |
>> | 6 | 8398 |
>> | 7 | 4544 |
>> | 8 | 3712 |
>> | 9 | 2337 |
>> | 10 | 4541 |
>> | 11 | 1947 |
>> | 12 | 3846 |
>> | 13 | 3645 |
>> | 14 | 2686 |
>> | 15 | 8390 |
>> | 16 | 26157 |
>> +--------------+-------+
>>
>> - SIS_UTIL might be bailing out too early in some of these cases.
>>
> Right.
>> o Hackbench with 16 group
>>
>> the success rate looks as follows:
>>
>> +-----------------+---------------------------+---------------------------+---------+
>> | Idle CPU in LLC | SIS_PROP able to find CPU | SIS_UTIL able to find CPU | Count |
>> +-----------------+---------------------------+---------------------------+---------+
>> | 1 | 0 | 0 | 1313745 |
>> | 1 | 0 | 1 | 694132 |
>> | 1 | 1 | 0 | 2888450 |
>> | 1 | 1 | 1 | 5343065 |
>> +-----------------+---------------------------+---------------------------+---------+
>>
>> SIS_PROP vs no SIS_PROP CPU search stats:
>>
>> Total time without SIS_PROP: 5227299388
>> Total time with SIS_PROP: 3866575188 (-26.03 pct)
>> Total time saved: 1360724200
>>
>> Following are number of CPUs SIS_UTIL will search when SIS_PROP limit >= 16 (LLC size):
>>
>> +--------------+---------+
>> | CPU Searched | Count |
>> +--------------+---------+
>> | 0 | 150351 |
>> | 1 | 105116 |
>> | 2 | 214291 |
>> | 3 | 440053 |
>> | 4 | 914116 |
>> | 5 | 1757984 |
>> | 6 | 2410484 |
>> | 7 | 1867668 |
>> | 8 | 379888 |
>> | 9 | 84055 |
>> | 10 | 55389 |
>> | 11 | 26795 |
>> | 12 | 43113 |
>> | 13 | 24579 |
>> | 14 | 32896 |
>> | 15 | 70059 |
>> | 16 | 150858 |
>> +--------------+---------+
>>
>> - SIS_UTIL might be bailing out too early in most of these cases
>>
> It might be interesting to see what the current sum of util_avg is, and this suggests that,
> even if util_avg is a little high, it might still be worthwhile to search more CPUs.
I agree. Let me know if there is any data you would like me to collect wrt this.
>> o tbench with 256 workers
>>
>> For tbench with 256 threads, SIS_UTIL works great as we have drastically cut down the number
>> of CPUs to search.
>>
>> SIS_PROP vs no SIS_PROP CPU search stats:
>>
>> Total time without SIS_PROP: 64004752959
>> Total time with SIS_PROP: 34695004390 (-45.79 pct)
>> Total time saved: 29309748569
>>
>> Following are number of CPUs SIS_UTIL will search when SIS_PROP limit >= 16 (LLC size):
>>
>> +--------------+----------+
>> | CPU Searched | Count |
>> +--------------+----------+
>> | 0 | 500077 |
>> | 1 | 543865 |
>> | 2 | 4257684 |
>> | 3 | 27457498 |
>> | 4 | 40208673 |
>> | 5 | 3264358 |
>> | 6 | 191631 |
>> | 7 | 24658 |
>> | 8 | 2469 |
>> | 9 | 1374 |
>> | 10 | 2008 |
>> | 11 | 1300 |
>> | 12 | 1226 |
>> | 13 | 1179 |
>> | 14 | 1631 |
>> | 15 | 11678 |
>> | 16 | 7793 |
>> +--------------+----------+
>>
>> - This is where SIS_UTIL shines for tbench case with 256 workers as it is effective
>> at restricting search space well.
>>
>> o Observations
>>
>> SIS_PROP seems to have a higher chance of finding an idle CPU compared to SIS_UTIL
>> in case of hackbench with 16-group. The gap between SIS_PROP and SIS_UTIL is wider
>> with 16 groups than with 1 group.
>> Also SIS_PROP is more aggressive at saving time for 1-group compared to the
>> case with 16-groups.
>>
>> The bailout from SIS_UTIL is fruitful for tbench with 256 workers leading to massive
>> performance gain in a fully loaded system.
>>
>> Note: There might be some inaccuracies for the numbers presented for metrics that
>> directly compare SIS_PROP and SIS_UTIL as both SIS_PROP and SIS_UTIL were enabled
>> when gathering these data points and the results from SIS_PROP were returned from
>> search_idle_cpu().
> Do you mean the 'CPU Searched' calculated by SIS_PROP was collected with both SIS_UTIL
> and SIS_PROP enabled?
Yes, the table
"Number of CPUs SIS_UTIL will search when SIS_PROP limit >= 16 (LLC size)"
was obtained by enabling both the features - SIS_PROP and SIS_UTIL, and
comparing the nr values suggested by SIS_UTIL when SIS_PROP allowed
searching for the entire LLC.
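One way to read these histograms is as the share of samples in which SIS_UTIL suggested scanning fewer CPUs than the full LLC, even though SIS_PROP allowed a full scan. A minimal sketch using the 1-group hackbench table; the array and function names are hypothetical, the counts are copied from the quoted table:

```c
#include <assert.h>
#include <stddef.h>

#define LLC_SIZE 16

/*
 * nr_hist[n] = number of samples in which SIS_UTIL suggested scanning
 * n CPUs (hackbench, 1 group).
 */
static const unsigned long nr_hist_1_group[LLC_SIZE + 1] = {
	10520, 7770, 11976, 17554, 13932, 15051, 8398, 4544, 3712,
	2337, 4541, 1947, 3846, 3645, 2686, 8390, 26157,
};

/* Total number of samples in the histogram. */
static unsigned long hist_total(const unsigned long *hist, size_t len)
{
	unsigned long total = 0;
	size_t n;

	for (n = 0; n < len; n++)
		total += hist[n];
	return total;
}

/* Fraction of samples whose suggested scan depth is below `limit`. */
static double fraction_below(const unsigned long *hist, size_t len, size_t limit)
{
	unsigned long below = 0, total = hist_total(hist, len);
	size_t n;

	assert(total > 0);
	for (n = 0; n < len && n < limit; n++)
		below += hist[n];
	return (double)below / (double)total;
}
```

For the 1-group data, `fraction_below(nr_hist_1_group, LLC_SIZE + 1, LLC_SIZE)` is about 0.82, meaning SIS_UTIL would have stopped short of a full 16-CPU scan in roughly 82% of the samples.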
>> All the numbers for the above analysis were gathered in NPS1 mode.
>>
> I'm thinking of taking nr_llc number into consideration to adjust the search depth,
> something like:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index dd52fc5a034b..39b914599dce 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9302,6 +9302,9 @@ static inline void update_idle_cpu_scan(struct lb_env *env,
>  	llc_util_pct = (sum_util * 100) / (nr_llc * SCHED_CAPACITY_SCALE);
>  	nr_scan = (100 - (llc_util_pct * llc_util_pct / 72)) * nr_llc / 100;
>  	nr_scan = max(nr_scan, 0);
> +	if (nr_llc <= 16 && nr_scan)
> +		nr_scan = nr_llc;
> +
This will behave closer to the initial RFC on systems with smaller LLC.
I can do some preliminary testing with this and get back to you.
>  	WRITE_ONCE(sd_share->nr_idle_scan, nr_scan);
>  }
>
> I'll offline the CPUs to make it 16 CPUs per LLC, and check how hackbench behaves.
Thank you for looking into this.
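To see what the proposed clamp does on a 16-CPU LLC, the arithmetic from the quoted hunk can be lifted into a standalone user-space function. This is only a sketch: the three formula lines are taken from the diff, while the function itself, its signature and the `small_llc_clamp` switch are hypothetical, and the integer types are an assumption. SCHED_CAPACITY_SCALE is 1024 in the kernel.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024

/*
 * Scan depth derived from the summed util_avg of an LLC, following the
 * formula in the quoted hunk. `small_llc_clamp` (hypothetical switch)
 * enables the proposed "nr_llc <= 16" change.
 */
static int nr_idle_scan(unsigned long sum_util, int nr_llc, int small_llc_clamp)
{
	int llc_util_pct, nr_scan;

	assert(nr_llc > 0);
	llc_util_pct = (int)((sum_util * 100) /
			     ((unsigned long)nr_llc * SCHED_CAPACITY_SCALE));
	nr_scan = (100 - (llc_util_pct * llc_util_pct / 72)) * nr_llc / 100;
	if (nr_scan < 0)
		nr_scan = 0;	/* same effect as max(nr_scan, 0) */

	/* Proposed change: a small LLC is scanned fully unless nr_scan is 0. */
	if (small_llc_clamp && nr_llc <= 16 && nr_scan)
		nr_scan = nr_llc;

	return nr_scan;
}
```

With `nr_llc = 16` and the LLC at 50% utilization (`sum_util = 8192`), the unclamped formula gives a depth of 10 while the clamped variant gives 16. At 100% utilization both give 0, so the bailout on a saturated LLC is preserved.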
--
Thanks and Regards,
Prateek