linux-kernel - Re: [PATCH v2 0/3] cpufreq: Allow drivers to receive more information from the governor

Message-ID: <1608307905.26567.46.camel@suse.com>
Date:   Fri, 18 Dec 2020 17:11:45 +0100
From:   Giovanni Gherdovich <ggherdovich@...e.com>
To:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux PM <linux-pm@...r.kernel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Doug Smythies <dsmythies@...us.net>
Subject: Re: [PATCH v2 0/3] cpufreq: Allow drivers to receive more
 information from the governor

On Mon, 2020-12-14 at 21:01 +0100, Rafael J. Wysocki wrote:
> Hi,
> 
> The timing of this is not perfect (sorry about that), but here's a refresh
> of this series.
> 
> The majority of the previous cover letter still applies:
> [...]

Hello,

the series is tested using

-> tbench (packets processing with loopback networking, measures throughput)
-> dbench (filesystem operations, measures average latency)
-> kernbench (kernel compilation, elapsed time)
-> and gitsource (long-running shell script, elapsed time)

These are chosen because none of them is bound by compute and all are
sensitive to freq scaling decisions. The machines are a Cascade Lake based
server, a client Skylake and a Coffee Lake laptop.

What's being compared:

sugov-HWP.desired : the present series;  intel_pstate=passive,  governor=schedutil
sugov-HWP.min     : mainline;            intel_pstate=passive,  governor=schedutil
powersave-HWP     : mainline;            intel_pstate=active,   governor=powersave
perfgov-HWP       : mainline;            intel_pstate=active,   governor=performance
sugov-no-HWP      : HWP disabled;        intel_pstate=passive,  governor=schedutil

Dbench and Kernbench have neutral results, but on Tbench sugov-HWP.desired
loses in both performance and performance-per-watt, while Gitsource shows the
series as faster in raw performance but again worse than the competition in
efficiency.

1. SUMMARY BY BENCHMARK
   1.1. TBENCH
   1.2. DBENCH
   1.3. KERNBENCH
   1.4. GITSOURCE
2. SUMMARY BY USER PROFILE
   2.1. PERFORMANCE USER: what if I switch perfgov -> schedutil?
   2.2. DEFAULT USER: what if I switch powersave -> schedutil?
   2.3. DEVELOPER: what if I switch sugov-HWP.min -> sugov-HWP.desired?
3. RESULTS TABLES
   PERFORMANCE RATIOS
   PERFORMANCE-PER-WATT RATIOS


1. SUMMARY BY BENCHMARK
~~~~~~~~~~~~~~~~~~~~~~~

Tbench: sugov-HWP.desired has the worst performance on all three
    machines. sugov-HWP.min is between 20% and 90% better. The baseline
    sugov-HWP.desired offers a lower throughput, but does it increase
    efficiency? It actually doesn't: on two out of three machines the
    incumbent code (current sugov, or intel_pstate=active) has 10% to 35%
    better efficiency. In other words, the status quo is both faster and more
    efficient than the proposed series on this benchmark.
    The absolute power consumption is lower, but the delivered performance is
    "even more lower", and that's why performance-per-watt shows a net loss.
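
To make the last point concrete, here is a small sketch that backs out the
relative power draw from the Cascade Lake tbench rows in section 3. It assumes
that performance-per-watt is simply throughput divided by average power, which
this report does not spell out; the helper name implied_power_ratio is made up
for illustration.

```python
# Ratios relative to sugov-HWP.desired, copied from the Cascade Lake tbench rows.
perf_ratio = {
    "sugov-HWP.min": 1.89,
    "powersave-HWP": 1.88,
    "perfgov-HWP": 1.89,
    "sugov-no-HWP": 1.17,
}
ppw_ratio = {
    "sugov-HWP.min": 1.36,
    "powersave-HWP": 1.38,
    "perfgov-HWP": 1.33,
    "sugov-no-HWP": 1.04,
}


def implied_power_ratio(perf, ppw):
    # Assumption: perf-per-watt = perf / power, hence power = perf / perf-per-watt.
    return perf / ppw


power_ratio = {
    cfg: round(implied_power_ratio(perf_ratio[cfg], ppw_ratio[cfg]), 2)
    for cfg in perf_ratio
}

for cfg, ratio in power_ratio.items():
    print(f"{cfg:14s} draws about {ratio:.2f}x the power of sugov-HWP.desired")
```

Under that assumption every alternative draws more power than
sugov-HWP.desired (ratio above 1), yet each one gains more in throughput than
it spends in power, which is why the performance-per-watt ratios end up above 1.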

Dbench: generally neutral, in both performance and efficiency. Powersave is
    occasionally behind the pack in performance, 5% to 15%. Its 15% performance
    loss on the Coffee Lake is compensated by an 80% better efficiency. Note
    that on the same Coffee Lake sugov-no-HWP is 20% ahead of the pack
    in efficiency.

Kernbench: neutral, in both performance and efficiency. powersave loses 14%
    to the pack in performance on the Cascade Lake.

Gitsource: this test shows the most compelling case against the
    sugov-HWP.desired series: on the Cascade Lake sugov-HWP.desired is 10%
    faster than sugov-HWP.min (it was expected to be slower!) and 35% less
    efficient (we expected more performance-per-watt, not less).
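
Since gitsource is "lower is better" for elapsed time and "higher is better"
for efficiency, the percentages depend on which side is taken as reference.
The sketch below works through both readings using the two Cascade Lake
gitsource ratios for sugov-HWP.min from section 3; the variable names are
invented for illustration only.

```python
# Cascade Lake gitsource, sugov-HWP.min relative to sugov-HWP.desired (= 1.00).
time_ratio_min = 1.11  # elapsed time, lower is better
ppw_ratio_min = 1.36   # performance-per-watt, higher is better


def pct(fraction):
    # Express a fraction as a percentage rounded to one decimal place.
    return round(100.0 * fraction, 1)


# Reference = sugov-HWP.desired: how much worse or better is sugov-HWP.min?
min_extra_time = pct(time_ratio_min - 1.0)
min_extra_efficiency = pct(ppw_ratio_min - 1.0)

# Reference = sugov-HWP.min: how does sugov-HWP.desired compare?
desired_time_saved = pct(1.0 - 1.0 / time_ratio_min)
desired_efficiency_lost = pct(1.0 - 1.0 / ppw_ratio_min)

print("sugov-HWP.min takes", min_extra_time, "% more time than desired")
print("sugov-HWP.min is", min_extra_efficiency, "% more efficient than desired")
print("sugov-HWP.desired takes", desired_time_saved, "% less time than min")
print("sugov-HWP.desired is", desired_efficiency_lost, "% less efficient than min")
```

Either way the direction is the same: sugov-HWP.desired finishes sooner by
about a tenth and pays for it with a larger loss in performance-per-watt.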


2. SUMMARY BY USER PROFILE
~~~~~~~~~~~~~~~~~~~~~~~~~~

If I was a perfgov-HWP user, I would be 20%-90% faster than with other governors
on tbench and gitsource. This speed gap comes with an unexpected efficiency
bonus on both tests. Since dbench and kernbench have a flat profile across the
board, there is no incentive to try another governor.

If I was a powersave-HWP user, I'd be the slowest of the bunch. The lost
performance is not, in general, balanced by better efficiency. This only
happens on Coffee Lake, which is a CPU for the mobile market and possibly HWP
has efficiency-oriented tuning there. Any flavor of schedutil would be an
improvement.

From a developer perspective, the obstacles to moving from HWP.min to
HWP.desired are tbench, where HWP.desired is worse than having no HWP support
at all, and gitsource, where HWP.desired has the opposite of the advertised
properties (it's actually faster but less efficient).


3. RESULTS TABLES
~~~~~~~~~~~~~~~~~

Tilde (~) means the result is the same as baseline (or, the ratio is close to 1).
The double asterisk (**) is a visual aid and means the result is better than
baseline (higher or lower depending on the case).
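
As an illustration of the notation, the following sketch renders a single
table cell from a ratio and the "better if" direction of its row. The function
name format_cell and the 0.015 tolerance for "close to 1" are assumptions made
for this example; the criterion actually used to decide on a tilde is not
stated here.

```python
def format_cell(ratio, better_if, tolerance=0.015):
    """Render one ratio using the tilde / double-asterisk notation.

    `tolerance` is a hypothetical cutoff for "close to 1", chosen only
    for this sketch.
    """
    if better_if not in ("higher", "lower"):
        raise ValueError("better_if must be 'higher' or 'lower'")
    if abs(ratio - 1.0) <= tolerance:
        return "~"
    # A ratio is an improvement when it moves away from 1 in the "better" direction.
    better = ratio > 1.0 if better_if == "higher" else ratio < 1.0
    return f"{ratio:.2f}" + ("**" if better else "")


# A few cells from the Cascade Lake table, with the baseline column at 1.00.
print(format_cell(1.89, "higher"))  # tbench throughput, sugov-HWP.min
print(format_cell(1.14, "lower"))   # kernbench elapsed time, powersave-HWP
print(format_cell(0.80, "lower"))   # gitsource elapsed time, perfgov-HWP
```

With this reading, a bare number without asterisks (for example 1.14 on a
"lower" row) is a regression with respect to the baseline column.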


| 80x_CASCADELAKE_NUMA: Intel Cascade Lake, 40 cores / 80 threads, NUMA, SATA SSD storage
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|            sugov-HWP.des  sugov-HWP.min  powersave-HWP  perfgov-HWP  sugov-no-HWP   better if
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|                                         PERFORMANCE RATIOS
| tbench         1.00           1.89**         1.88**        1.89**        1.17**       higher
| dbench         1.00           ~              1.06          ~             ~            lower 
| kernbench      1.00           ~              1.14          ~             ~            lower 
| gitsource      1.00           1.11           2.70          0.80**        ~            lower 
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|                                    PERFORMANCE-PER-WATT RATIOS
| tbench         1.00           1.36**         1.38**        1.33**        1.04**       higher
| dbench         1.00           ~              ~             ~             ~            higher
| kernbench      1.00           ~              ~             ~             ~            higher
| gitsource      1.00           1.36**         0.63          1.22**        1.02**       higher


| 8x_COFFEELAKE_UMA: Intel Coffee Lake, 4 cores / 8 threads, UMA, NVMe SSD storage
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|            sugov-HWP.des  sugov-HWP.min  powersave-HWP  perfgov-HWP  sugov-no-HWP   better if
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|                                         PERFORMANCE RATIOS
| tbench         1.00           1.27**         1.30**        1.30**        1.31**       higher
| dbench         1.00           ~              1.15          ~             ~            lower 
| kernbench      1.00           ~              ~             ~             ~            lower 
| gitsource      1.00           ~              2.09          ~             ~            lower 
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|                                    PERFORMANCE-PER-WATT RATIOS
| tbench         1.00           ~              ~             ~             ~            higher
| dbench         1.00           ~              1.82**        ~             1.22**       higher
| kernbench      1.00           ~              ~             ~             ~            higher
| gitsource      1.00           ~              1.56**        ~             1.17**       higher


| 8x_SKYLAKE_UMA: Intel Skylake (client), 4 cores / 8 threads, UMA, SATA SSD storage
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|            sugov-HWP.des  sugov-HWP.min  powersave-HWP  perfgov-HWP  sugov-no-HWP   better if
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|                                         PERFORMANCE RATIOS
| tbench         1.00           1.21**         1.22**        1.20**        1.06**       higher
| dbench         1.00           ~              ~             ~             ~            lower 
| kernbench      1.00           ~              ~             ~             ~            lower 
| gitsource      1.00           ~              1.71          0.96**        ~            lower 
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|                                    PERFORMANCE-PER-WATT RATIOS
| tbench         1.00           1.11**         1.12**        1.10**        1.03**       higher
| dbench         1.00           ~              ~             ~             ~            higher
| kernbench      1.00           ~              ~             ~             ~            higher
| gitsource      1.00           ~              0.75          ~             ~            higher



Giovanni