linux-kernel - Re: [PATCH v3] sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ed985eb5-abc7-34f4-7a10-e3a08800b324@amd.com>
Date:   Fri, 13 May 2022 12:07:00 +0530
From:   K Prateek Nayak <kprateek.nayak@....com>
To:     Chen Yu <yu.c.chen@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Mel Gorman <mgorman@...e.de>,
        Yicong Yang <yangyicong@...ilicon.com>,
        Tim Chen <tim.c.chen@...el.com>
Cc:     Chen Yu <yu.chen.surf@...il.com>, Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Barry Song <21cnbao@...il.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        Len Brown <len.brown@...el.com>,
        Ben Segall <bsegall@...gle.com>,
        Aubrey Li <aubrey.li@...el.com>,
        Abel Wu <wuyun.abel@...edance.com>,
        Zhang Rui <rui.zhang@...el.com>, linux-kernel@...r.kernel.org,
        Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: [PATCH v3] sched/fair: Introduce SIS_UTIL to search idle CPU
 based on sum of util_avg

Hello Chenyu,

Sorry for the delay with analysis.

On 4/28/2022 11:54 PM, Chen Yu wrote:
> [Problem Statement]
> select_idle_cpu() might spend too much time searching for an idle CPU,
> when the system is overloaded.
>
> [..snip..]
>
> [Test result]
>
> The following is the benchmark result comparison between
> baseline:vanilla and compare:patched kernel. Positive compare%
> indicates better performance.
>
> netperf.throughput
> each thread: netperf -4 -H 127.0.0.1 -t TCP/UDP_RR -c -C -l 100
> =======
> case            	load    	baseline(std%)	compare%( std%)
> TCP_RR          	28 threads	 1.00 (  0.40)	 +1.14 (  0.37)
> TCP_RR          	56 threads	 1.00 (  0.49)	 +0.62 (  0.31)
> TCP_RR          	84 threads	 1.00 (  0.50)	 +0.26 (  0.55)
> TCP_RR          	112 threads	 1.00 (  0.27)	 +0.29 (  0.28)
> TCP_RR          	140 threads	 1.00 (  0.22)	 +0.14 (  0.23)
> TCP_RR          	168 threads	 1.00 (  0.21)	 +0.40 (  0.19)
> TCP_RR          	196 threads	 1.00 (  0.18)	+183.40 ( 16.43)
> TCP_RR          	224 threads	 1.00 (  0.16)	+188.44 (  9.29)
> UDP_RR          	28 threads	 1.00 (  0.47)	 +1.45 (  0.47)
> UDP_RR          	56 threads	 1.00 (  0.28)	 -0.22 (  0.30)
> UDP_RR          	84 threads	 1.00 (  0.38)	 +1.72 ( 27.10)
> UDP_RR          	112 threads	 1.00 (  0.16)	 +0.01 (  0.18)
> UDP_RR          	140 threads	 1.00 ( 14.10)	 +0.32 ( 11.15)
> UDP_RR          	168 threads	 1.00 ( 12.75)	 +0.91 ( 11.62)
> UDP_RR          	196 threads	 1.00 ( 14.41)	+191.97 ( 19.34)
> UDP_RR          	224 threads	 1.00 ( 15.34)	+194.88 ( 17.06)
>
> Take the 224 threads as an example, the SIS search metrics changes are
> illustrated below:
>
>     vanilla                    patched
>    4544492          +237.5%   15338634        sched_debug.cpu.sis_domain_search.avg
>      38539        +39686.8%   15333634        sched_debug.cpu.sis_failed.avg
>   128300000          -87.9%   15551326        sched_debug.cpu.sis_scanned.avg
>    5842896          +162.7%   15347978        sched_debug.cpu.sis_search.avg
>
> There is -87.9% less CPU scans after patched, which indicates lower overhead.
> Besides, with this patch applied, there is -13% less rq lock contention
> in perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested
> .try_to_wake_up.default_wake_function.woken_wake_function.
> This could help explain the performance improvement - Because this patch allows
> the waking task to remain on the previous CPU, rather than grabbing other CPU's
> lock.
>
> Other benchmarks:
>
> hackbench.throughput
> =========
> case            	load    	baseline(std%)	compare%( std%)
> process-pipe    	1 group 	 1.00 (  0.09)	 -0.54 (  0.82)
> process-pipe    	2 groups 	 1.00 (  0.47)	 +0.89 (  0.61)
> process-pipe    	4 groups 	 1.00 (  0.83)	 +0.90 (  0.15)
> process-pipe    	8 groups 	 1.00 (  0.09)	 +0.31 (  0.07)
> process-sockets 	1 group 	 1.00 (  0.13)	 -0.58 (  0.49)
> process-sockets 	2 groups 	 1.00 (  0.41)	 -0.58 (  0.52)
> process-sockets 	4 groups 	 1.00 (  0.61)	 -0.37 (  0.50)
> process-sockets 	8 groups 	 1.00 (  0.22)	 +1.15 (  0.10)
> threads-pipe    	1 group 	 1.00 (  0.35)	 -0.28 (  0.78)
> threads-pipe    	2 groups 	 1.00 (  0.65)	 +0.03 (  0.96)
> threads-pipe    	4 groups 	 1.00 (  0.43)	 +0.81 (  0.38)
> threads-pipe    	8 groups 	 1.00 (  0.11)	 -1.56 (  0.07)
> threads-sockets 	1 group 	 1.00 (  0.30)	 -0.39 (  0.41)
> threads-sockets 	2 groups 	 1.00 (  0.21)	 -0.23 (  0.27)
> threads-sockets 	4 groups 	 1.00 (  0.23)	 +0.36 (  0.19)
> threads-sockets 	8 groups 	 1.00 (  0.13)	 +1.57 (  0.06)
>
> tbench.throughput
> ======
> case            	load    	baseline(std%)	compare%( std%)
> loopback        	28 threads	 1.00 (  0.15)	 +1.05 (  0.08)
> loopback        	56 threads	 1.00 (  0.09)	 +0.36 (  0.04)
> loopback        	84 threads	 1.00 (  0.12)	 +0.26 (  0.06)
> loopback        	112 threads	 1.00 (  0.12)	 +0.04 (  0.09)
> loopback        	140 threads	 1.00 (  0.04)	 +2.98 (  0.18)
> loopback        	168 threads	 1.00 (  0.10)	 +2.88 (  0.30)
> loopback        	196 threads	 1.00 (  0.06)	 +2.63 (  0.03)
> loopback        	224 threads	 1.00 (  0.08)	 +2.60 (  0.06)
>
> schbench.latency_90%_us
> ========
> case            	load    	baseline	compare%
> normal          	1 mthread	 1.00 	        -1.7%
> normal          	2 mthreads	 1.00           +1.6%
> normal          	4 mthreads	 1.00           +1.4%
> normal          	8 mthreads	 1.00    	+21.0%
>
> Limitations:
> [1]
> This patch is based on the util_avg, which is very sensitive to the CPU
> frequency invariance. The util_avg would decay quite fast when the
> CPU is idle, if the max frequency has been limited by the user.
> Patch [3] should be applied if turbo is disabled manually on Intel
> platforms.
>
> [2]
> There may be unbalanced tasks among CPUs due to CPU affinity. For example,
> suppose the LLC domain is composed of 8 CPUs, and 7 tasks are bound to
> CPU0~CPU6, while CPU7 is idle:
>
>           CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7
> util_avg  1024    1024    1024    1024    1024    1024    1024    0
>
> Since the util_avg ratio is 87.5%( = 7/8 ), which is higher than 85%,
> select_idle_cpu() will not scan, thus CPU7 is undetected.

Following are the results from dual socket Zen3 platform (2 x 64C/128T) running with
various NPS configuration:

Following is the NUMA configuration for each NPS mode on the system:

NPS1: Each socket is a NUMA node.
    Total 2 NUMA nodes in the dual socket machine.

    Node 0: 0-63,   128-191
    Node 1: 64-127, 192-255

NPS2: Each socket is further logically divided into 2 NUMA regions.
    Total 4 NUMA nodes exist over 2 socket.
   
    Node 0: 0-31,   128-159
    Node 1: 32-63,  160-191
    Node 2: 64-95,  192-223
    Node 3: 96-127, 223-255

NPS4: Each socket is logically divided into 4 NUMA regions.
    Total 8 NUMA nodes exist over 2 socket.
   
    Node 0: 0-15,    128-143
    Node 1: 16-31,   144-159
    Node 2: 32-47,   160-175
    Node 3: 48-63,   176-191
    Node 4: 64-79,   192-207
    Node 5: 80-95,   208-223
    Node 6: 96-111,  223-231
    Node 7: 112-127, 232-255

Kernel versions:
- tip:      5.18-rc1 tip sched/core
- SIS_UTIL:    5.18-rc1 tip sched/core + this patch

When we began testing, tip was at:

commit: a658353167bf "sched/fair: Revise comment about lb decision matrix"

Following are the results from the benchmark:

* - Data points of concern

~~~~~~~~~
hackbench
~~~~~~~~~

NPS1

Test:                   tip                     SIS_UTIL
 1-groups:         4.64 (0.00 pct)         4.70 (-1.29 pct)
 2-groups:         5.38 (0.00 pct)         5.45 (-1.30 pct)
 4-groups:         6.15 (0.00 pct)         6.10 (0.81 pct)
 8-groups:         7.42 (0.00 pct)         7.42 (0.00 pct)
16-groups:        10.70 (0.00 pct)        11.69 (-9.25 pct)  *

NPS2

Test:                   tip                     SIS_UTIL
 1-groups:         4.70 (0.00 pct)         4.70 (0.00 pct)
 2-groups:         5.45 (0.00 pct)         5.46 (-0.18 pct)
 4-groups:         6.13 (0.00 pct)         6.05 (1.30 pct)
 8-groups:         7.30 (0.00 pct)         7.05 (3.42 pct)
16-groups:        10.30 (0.00 pct)        10.12 (1.74 pct)

NPS4

Test:                   tip                     SIS_UTIL
 1-groups:         4.60 (0.00 pct)         4.75 (-3.26 pct)  *
 2-groups:         5.41 (0.00 pct)         5.42 (-0.18 pct)
 4-groups:         6.12 (0.00 pct)         6.00 (1.96 pct)
 8-groups:         7.22 (0.00 pct)         7.10 (1.66 pct)
16-groups:        10.24 (0.00 pct)        10.11 (1.26 pct)

~~~~~~~~
schbench
~~~~~~~~

NPS 1

#workers:   tip                     SIS_UTIL
  1:      29.00 (0.00 pct)        21.00 (27.58 pct)
  2:      28.00 (0.00 pct)        28.00 (0.00 pct)
  4:      31.50 (0.00 pct)        31.00 (1.58 pct)
  8:      42.00 (0.00 pct)        39.00 (7.14 pct)
 16:      56.50 (0.00 pct)        54.50 (3.53 pct)
 32:      94.50 (0.00 pct)        94.00 (0.52 pct)
 64:     176.00 (0.00 pct)       175.00 (0.56 pct)
128:     404.00 (0.00 pct)       394.00 (2.47 pct)
256:     869.00 (0.00 pct)       863.00 (0.69 pct)
512:     58432.00 (0.00 pct)     55424.00 (5.14 pct)

NPS2

#workers:      tip                     SIS_UTIL
  1:      26.50 (0.00 pct)        25.00 (5.66 pct)
  2:      26.50 (0.00 pct)        25.50 (3.77 pct)
  4:      34.50 (0.00 pct)        34.00 (1.44 pct)
  8:      45.00 (0.00 pct)        46.00 (-2.22 pct)
 16:      56.50 (0.00 pct)        60.50 (-7.07 pct)        *
 32:      95.50 (0.00 pct)        93.00 (2.61 pct)
 64:     179.00 (0.00 pct)       179.00 (0.00 pct)
128:     369.00 (0.00 pct)       376.00 (-1.89 pct)
256:     898.00 (0.00 pct)       903.00 (-0.55 pct)
512:     56256.00 (0.00 pct)     57088.00 (-1.47 pct)

NPS4

#workers:    tip                     SIS_UTIL
  1:      25.00 (0.00 pct)        21.00 (16.00 pct)
  2:      28.00 (0.00 pct)        24.00 (14.28 pct)
  4:      29.50 (0.00 pct)        29.50 (0.00 pct)
  8:      41.00 (0.00 pct)        37.50 (8.53 pct)
 16:      65.50 (0.00 pct)        64.00 (2.29 pct)
 32:      93.00 (0.00 pct)        94.50 (-1.61 pct)
 64:     170.50 (0.00 pct)       175.50 (-2.93 pct)
128:     377.00 (0.00 pct)       368.50 (2.25 pct)
256:     867.00 (0.00 pct)       902.00 (-4.03 pct)
512:     58048.00 (0.00 pct)     55488.00 (4.41 pct)

~~~~~~
tbench
~~~~~~

NPS 1

Clients:     tip                     SIS_UTIL
    1    443.31 (0.00 pct)       456.19 (2.90 pct)
    2    877.32 (0.00 pct)       875.24 (-0.23 pct)
    4    1665.11 (0.00 pct)      1647.31 (-1.06 pct)
    8    3016.68 (0.00 pct)      2993.23 (-0.77 pct)
   16    5374.30 (0.00 pct)      5246.93 (-2.36 pct)
   32    8763.86 (0.00 pct)      7878.18 (-10.10 pct)     *
   64    15786.93 (0.00 pct)     12958.47 (-17.91 pct)    *
  128    26826.08 (0.00 pct)     26741.14 (-0.31 pct)
  256    24207.35 (0.00 pct)     52041.89 (114.98 pct)
  512    51740.58 (0.00 pct)     52084.44 (0.66 pct)
 1024    51177.82 (0.00 pct)     53126.29 (3.80 pct)

NPS 2

Clients:     tip                     SIS_UTIL
    1    449.49 (0.00 pct)       447.96 (-0.34 pct)
    2    867.28 (0.00 pct)       869.52 (0.25 pct)
    4    1643.60 (0.00 pct)      1625.91 (-1.07 pct)
    8    3047.35 (0.00 pct)      2952.82 (-3.10 pct)
   16    5340.77 (0.00 pct)      5251.41 (-1.67 pct)
   32    10536.85 (0.00 pct)     8843.49 (-16.07 pct)     *
   64    16543.23 (0.00 pct)     14265.35 (-13.76 pct)    *
  128    26400.40 (0.00 pct)     25595.42 (-3.04 pct)
  256    23436.75 (0.00 pct)     47090.03 (100.92 pct)
  512    50902.60 (0.00 pct)     50036.58 (-1.70 pct)
 1024    50216.10 (0.00 pct)     50639.74 (0.84 pct)

NPS 4

Clients:     tip                     SIS_UTIL
    1    443.82 (0.00 pct)       459.93 (3.62 pct)
    2    849.14 (0.00 pct)       882.17 (3.88 pct)
    4    1603.26 (0.00 pct)      1629.64 (1.64 pct)
    8    2972.37 (0.00 pct)      3003.09 (1.03 pct)
   16    5277.13 (0.00 pct)      5234.07 (-0.81 pct)
   32    9744.73 (0.00 pct)      9347.90 (-4.07 pct)      *
   64    15854.80 (0.00 pct)     14180.27 (-10.56 pct)    *
  128    26116.97 (0.00 pct)     24597.45 (-5.81 pct)     *
  256    22403.25 (0.00 pct)     47385.09 (111.50 pct)
  512    48317.20 (0.00 pct)     49781.02 (3.02 pct)
 1024    50445.41 (0.00 pct)     51607.53 (2.30 pct)

~~~~~~
Stream
~~~~~~

- 10 runs

NPS1

              tip                     SIS_UTIL
 Copy:   189113.11 (0.00 pct)    188490.27 (-0.32 pct)
Scale:   201190.61 (0.00 pct)    204526.15 (1.65 pct)
  Add:   232654.21 (0.00 pct)    234948.01 (0.98 pct)
Triad:   226583.57 (0.00 pct)    228844.43 (0.99 pct)

NPS2

Test:         tip                     SIS_UTIL
 Copy:   155347.14 (0.00 pct)    169386.29 (9.03 pct)
Scale:   191701.53 (0.00 pct)    196110.51 (2.29 pct)
  Add:   210013.97 (0.00 pct)    221088.45 (5.27 pct)
Triad:   207602.00 (0.00 pct)    218072.52 (5.04 pct)

NPS4

Test:         tip                     SIS_UTIL
 Copy:   136421.15 (0.00 pct)    140894.11 (3.27 pct)
Scale:   191217.59 (0.00 pct)    190554.17 (-0.34 pct)
  Add:   189229.52 (0.00 pct)    190871.88 (0.86 pct)
Triad:   188052.99 (0.00 pct)    188417.63 (0.19 pct)

- 100 runs

NPS1

Test:       tip                     SIS_UTIL
 Copy:   244693.32 (0.00 pct)    232328.05 (-5.05 pct)
Scale:   221874.99 (0.00 pct)    216858.39 (-2.26 pct)
  Add:   268363.89 (0.00 pct)    265449.16 (-1.08 pct)
Triad:   260945.24 (0.00 pct)    252240.56 (-3.33 pct)

NPS2

Test:       tip                     SIS_UTIL
 Copy:   211262.00 (0.00 pct)    225240.03 (6.61 pct)
Scale:   222493.34 (0.00 pct)    219094.65 (-1.52 pct)
  Add:   280277.17 (0.00 pct)    275677.73 (-1.64 pct)
Triad:   265860.49 (0.00 pct)    262584.22 (-1.23 pct)

NPS4

Test:       tip                     SIS_UTIL
 Copy:   250171.40 (0.00 pct)    230983.60 (-7.66 pct)
Scale:   222293.56 (0.00 pct)    215984.34 (-2.83 pct)
  Add:   279222.16 (0.00 pct)    270402.64 (-3.15 pct)
Triad:   262013.92 (0.00 pct)    254820.60 (-2.74 pct)

~~~~~~~~~~~~
ycsb-mongodb
~~~~~~~~~~~~

NPS1

sched-tip:      303718.33 (var: 1.31)
SIS_UTIL:       303529.33 (var: 0.67)    (-0.06%)

NPS2

sched-tip:      304536.33 (var: 2.46)
SIS_UTIL:       303730.33 (var: 1.57)    (-0.26%)

NPS4

sched-tip:      301192.33 (var: 1.81)
SIS_UTIL:       300101.33 (var: 0.35)   (-0.36%)

~~~~~~~~~~~~~~~~~~

Notes:

- There seems to be some noticeable regression for hackbench
  with 16 groups in NPS1 mode.
- There seems to be regression in tbench for case with number
  of workers in range 32-128 (12.5% loaded to 50% loaded)
- tbench reaches saturation early when system is fully loaded

This probably show that the strategy in the initial v1 RFC
seems to work better with our LLC where number of CPUs per LLC
is low compared to systems with unified LLC. Given this is
showing great results for unified LLC, maybe SIS_PROP and SIS_UTIL
can be enabled based on the the size of LLC.

> [..snip..]
>
> [3]
> Prateek mentioned that we should scan aggressively in an LLC domain
> with 16 CPUs. Because the cost to search for an idle one among 16 CPUs is
> negligible. The current patch aims to propose a generic solution and only
> considers the util_avg. A follow-up change could enhance the scan policy
> to adjust the scan_percent according to the CPU number in LLC.

Following are some additional numbers I would like to share comparing SIS_PROP and
SIS_UTIL:

o Hackbench with 1 group

With 1 group, following are the chances of SIS_PROP
and SIS_UTIL finding an idle CPU when an idle CPU
exists in LLC:

+-----------------+---------------------------+---------------------------+--------+
| Idle CPU in LLC | SIS_PROP able to find CPU | SIS_UTIL able to find CPU | Count  |
+-----------------+---------------------------+---------------------------+--------+
|        1        |             0             |             0             | 66444  |
|        1        |             0             |             1             | 34153  |
|        1        |             1             |             0             | 57204  |
|        1        |             1             |             1             | 119263 |
+-----------------+---------------------------+---------------------------+--------+

SIS_PROP vs no SIS_PROP CPU search stats:

Total time without SIS_PROP: 90653653
Total time with SIS_PROP: 53558942 (-40.92 pct)
Total time saved: 37094711

Following are number of CPUs SIS_UTIL will search when SIS_PROP limit >= 16 (LLC size):

+--------------+-------+
| CPU Searched | Count |
+--------------+-------+
|      0       | 10520 |
|      1       |  7770 |
|      2       | 11976 |
|      3       | 17554 |
|      4       | 13932 |
|      5       | 15051 |
|      6       |  8398 |
|      7       |  4544 |
|      8       |  3712 |
|      9       |  2337 |
|      10      |  4541 |
|      11      |  1947 |
|      12      |  3846 |
|      13      |  3645 |
|      14      |  2686 |
|      15      |  8390 |
|      16      | 26157 |
+--------------+-------+

- SIS_UTIL might be bailing out too early in some of these cases.

o Hackbench with 16 group

the success rate looks as follows:

+-----------------+---------------------------+---------------------------+---------+
| Idle CPU in LLC | SIS_PROP able to find CPU | SIS_UTIL able to find CPU |  Count  |
+-----------------+---------------------------+---------------------------+---------+
|        1        |             0             |             0             | 1313745 |
|        1        |             0             |             1             |  694132 |
|        1        |             1             |             0             | 2888450 |
|        1        |             1             |             1             | 5343065 |
+-----------------+---------------------------+---------------------------+---------+

SIS_PROP vs no SIS_PROP CPU search stats:

Total time without SIS_PROP: 5227299388
Total time with SIS_PROP: 3866575188 (-26.03 pct)
Total time saved: 1360724200

Following are number of CPUs SIS_UTIL will search when SIS_PROP limit >= 16 (LLC size):

+--------------+---------+
| CPU Searched |  Count  |
+--------------+---------+
|      0       |  150351 |
|      1       |  105116 |
|      2       |  214291 |
|      3       |  440053 |
|      4       |  914116 |
|      5       | 1757984 |
|      6       | 2410484 |
|      7       | 1867668 |
|      8       |  379888 |
|      9       |  84055  |
|      10      |  55389  |
|      11      |  26795  |
|      12      |  43113  |
|      13      |  24579  |
|      14      |  32896  |
|      15      |  70059  |
|      16      |  150858 |
+--------------+---------+

- SIS_UTIL might be bailing out too early in most of these cases

o tbench with 256 workers

For tbench with 256 threads, SIS_UTIL works great as we have drastically cut down the number
of CPUs to search.

SIS_PROP vs no SIS_PROP CPU search stats:

Total time without SIS_PROP: 64004752959
Total time with SIS_PROP: 34695004390 (-45.79 pct)
Total time saved: 29309748569

Following are number of CPUs SIS_UTIL will search when SIS_PROP limit >= 16 (LLC size):

+--------------+----------+
| CPU Searched |  Count   |
+--------------+----------+
|      0       |  500077  |
|      1       |  543865  |
|      2       | 4257684  |
|      3       | 27457498 |
|      4       | 40208673 |
|      5       | 3264358  |
|      6       |  191631  |
|      7       |  24658   |
|      8       |   2469   |
|      9       |   1374   |
|      10      |   2008   |
|      11      |   1300   |
|      12      |   1226   |
|      13      |   1179   |
|      14      |   1631   |
|      15      |  11678   |
|      16      |   7793   |
+--------------+----------+

- This is where SIS_UTIL shines for tbench case with 256 workers as it is effective
  at restricting search space well.

o Observations

SIS_PROP seems to have a higher chance of finding an idle CPU compared to SIS_UTIL
in case of hackbench with 16-group. The gap between SIS_PROP and SIS_UTIL is wider
with 16 groups compared to than with 1 group.
Also SIS_PROP is more aggressive at saving time for 1-group compared to the
case with 16-groups.

The bailout from SIS_UTIL is fruitful for tbench with 256 workers leading to massive
performance gain in a fully loaded system.

Note: There might be some inaccuracies for the numbers presented for metrics that
directly compare SIS_PROP and SIS_UTIL as both SIS_PROP and SIS_UTIL were enabled
when gathering these data points and the results from SIS_PROP were returned from
search_idle_cpu(). All the numbers for the above analysis were gathered in NPS1 mode.

> [..snip..]

--
Thanks and Regards,
Prateek