linux-kernel - Re: [PATCH 4/4] cpufreq: intel_pstate: enable boost for Skylake Xeon

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180728123639.7ckv3ljnei3urn6m@techsingularity.net>
Date:   Sat, 28 Jul 2018 13:36:39 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Francisco Jerez <currojerez@...eup.net>
Cc:     Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        lenb@...nel.org, rjw@...ysocki.net, peterz@...radead.org,
        ggherdovich@...e.cz, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org, juri.lelli@...hat.com,
        viresh.kumar@...aro.org, Chris Wilson <chris@...is-wilson.co.uk>,
        Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>,
        Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
        Eero Tamminen <eero.t.tamminen@...el.com>
Subject: Re: [PATCH 4/4] cpufreq: intel_pstate: enable boost for Skylake Xeon

On Fri, Jul 27, 2018 at 10:34:03PM -0700, Francisco Jerez wrote:
> Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com> writes:
> 
> > Enable HWP boost on Skylake server and workstations.
> >
> 
> Please revert this series, it led to significant energy usage and
> graphics performance regressions [1].  The reasons are roughly the ones
> we discussed by e-mail off-list last April: This causes the intel_pstate
> driver to decrease the EPP to zero when the workload blocks on IO
> frequently enough, which for the regressing benchmarks detailed in [1]
> is a symptom of the workload being heavily IO-bound, which means they
> won't benefit at all from the EPP boost since they aren't significantly
> CPU-bound, and they will suffer a decrease in parallelism due to the
> active CPU core using a larger fraction of the TDP in order to achieve
> the same work, causing the GPU to have a lower power budget available,
> leading to a decrease in system performance.

It slices both ways. With the series, there are large boosts to
performance on other workloads where a slight increase in power usage is
acceptable in exchange for performance. For example,

Single socket skylake running sqlite
                                 v4.17               41ab43c9
Min       Trans     2580.85 (   0.00%)     5401.58 ( 109.29%)
Hmean     Trans     2610.38 (   0.00%)     5518.36 ( 111.40%)
Stddev    Trans       28.08 (   0.00%)      208.90 (-644.02%)
CoeffVar  Trans        1.08 (   0.00%)        3.78 (-251.57%)
Max       Trans     2648.02 (   0.00%)     5992.74 ( 126.31%)
BHmean-50 Trans     2629.78 (   0.00%)     5643.81 ( 114.61%)
BHmean-95 Trans     2620.38 (   0.00%)     5538.32 ( 111.36%)
BHmean-99 Trans     2620.38 (   0.00%)     5538.32 ( 111.36%)

That's over doubling the transactions per second for that workload.

Two-socket skylake running dbench4
                                v4.17               41ab43c9
Amean      1         40.85 (   0.00%)       14.97 (  63.36%)
Amean      2         42.31 (   0.00%)       17.33 (  59.04%)
Amean      4         53.77 (   0.00%)       27.85 (  48.20%)
Amean      8         68.86 (   0.00%)       43.78 (  36.42%)
Amean      16        82.62 (   0.00%)       56.51 (  31.60%)
Amean      32       135.80 (   0.00%)      116.06 (  14.54%)
Amean      64       737.51 (   0.00%)      701.00 (   4.95%)
Amean      512    14996.60 (   0.00%)    14755.05 (   1.61%)

This is reporting the average latency of operations running dbench. The
series over halves the latencies. There are many examples of basic
workloads that benefit heavily from the series and while I accept it may
not be universal, such as the case where the graphics card needs the power
and not the CPU, a straight revert is not the answer. Without the series,
HWP cripplies the CPU.

-- 
Mel Gorman
SUSE Labs