linux-kernel - RE: [RFC/RFT][PATCH] cpufreq: intel_pstate: Work in passive mode with HWP enabled

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <000801d632a9$6a586c90$3f0945b0$@net>
Date:   Mon, 25 May 2020 08:30:21 -0700
From:   "Doug Smythies" <dsmythies@...us.net>
To:     "'Rafael J. Wysocki'" <rjw@...ysocki.net>
Cc:     "'LKML'" <linux-kernel@...r.kernel.org>,
        "'Len Brown'" <len.brown@...el.com>,
        "'Srinivas Pandruvada'" <srinivas.pandruvada@...ux.intel.com>,
        "'Peter Zijlstra'" <peterz@...radead.org>,
        "'Giovanni Gherdovich'" <ggherdovich@...e.cz>,
        "'Linux PM'" <linux-pm@...r.kernel.org>,
        "'Francisco Jerez'" <francisco.jerez.plata@...el.com>
Subject: RE: [RFC/RFT][PATCH] cpufreq: intel_pstate: Work in passive mode with HWP enabled

On 2020.05.21 10:16 Rafael J. Wysocki wrote:

> 
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> 
> Allow intel_pstate to work in the passive mode with HWP enabled and
> make it translate the target frequency supplied by the cpufreq
> governor in use into an EPP value to be written to the HWP request
> MSR (high frequencies are mapped to low EPP values that mean more
> performance-oriented HWP operation) as a hint for the HWP algorithm
> in the processor, so as to prevent it and the CPU scheduler from
> working against each other at least when the schedutil governor is
> in use.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
> 
> This is a prototype not intended for production use (based on linux-next).
> 
> Please test it if you can (on HWP systems, of course) and let me know the
> results.
> 
> The INTEL_CPUFREQ_TRANSITION_DELAY_HWP value has been guessed and it very well
> may turn out to be either too high or too low for the general use, which is one
> reason why getting as much testing coverage as possible is key here.
> 
> If you can play with different INTEL_CPUFREQ_TRANSITION_DELAY_HWP values,
> please do so and let me know the conclusions.

Hi all,

It is way too early for me to reply to this, as my tests are incomplete.
However, I saw the other e-mails and will chime in now.

Note before: I have been having difficulties with this test computer,
results may be suspect.

The default INTEL_CPUFREQ_TRANSITION_DELAY_HWP is definitely too short.
I do not know what an optimal value is, but I have made it the same as
INTEL_CPUFREQ_TRANSITION_LATENCY for now, and because it makes for great
test comparisons.

I have only done the simplest of tests so far, single thread, forced
CPU affinity load ramp up/down.

Load:
fixed work packet per work period, i.e. the amount of work to do is independent of CPU frequency.
347 hertz work / sleep frequency. Why? No reason. Because it is what I used last time.

To reveal any hysteresis (i.e. with conservative governor) load ramps up from none
to 100% and then back down to none at 3 seconds per step (step size is uncalibrated).

Processor:
Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz

Modes:
intel_cpufreq (i_)
intel_pstate (p_)
acpi-cpufreq (a_)

Kernel:

b955f2f4a425 (HEAD -> k57-hwp) cpufreq: intel_pstate: Work in passive mode with HWP enabled
faf93d5c9da1 cpufreq: intel_pstate: Use passive mode by default without HWP
b9bbe6ed63b2 (tag: v5.7-rc6) Linux 5.7-rc6

Graph notes: The ramp up and ramp down are folded back on the graphs. These tests
were launched manually in two terminals, so the timing was not exact.
I'll fix this moving forward. 

Legend - ondemand graph [1] (minor annotation added):

i_onde_stock : intel-cpufreq, ondemand, stock kernel (5.7-rc6), hwp disabled.
i_onde_hwp : intel-cpufreq, ondemand, patched kernel (5.7-rc6), DELAY_HWP 5000.
i_onde_hwp2 : intel-cpufreq, ondemand, patched kernel (5.7-rc6), DELAY_HWP 20000.
a_onde: acpi-cpufreq, stock kernel (5.7-rc6), intel_pstate disabled.

Conclusion:
DELAY_HWP 20000 matches the pre-patch response almost exactly.
DELAY_HWP 5000 goes too far (my opinion), but does look promising.

Legend - intel_pstate - powersave graph [2].

What? Why is there such a graph, unrelated to this patch?
Well, because there is a not yet understood effect.

p_powe_stock : intel_pstate, powersave, stock kernel (5.7-rc6), hwp disabled.
P_powe_hwp : intel_pstate, powersave, patched kernel (5.7-rc6), DELAY_HWP 5000.
P_powe_hwp2 : intel_pstate, powersave, patched kernel (5.7-rc6), DELAY_HWP 20000.

Conclusion: ??

Note: That I merely made a stupid mistake is a real possibility.
Repeat test pending.

Legend - schedutil graph [3]:

i_sche_stock : intel-cpufreq, ondemand, stock kernel (5.7-rc6), hwp disabled.
i_sche_hwp : intel-cpufreq, ondemand, patched kernel (5.7-rc6), DELAY_HWP 5000.
i_sche_hwp2 : intel-cpufreq, ondemand, patched kernel (5.7-rc6), DELAY_HWP 20000.
a_sche: acpi-cpufreq, stock kernel (5.7-rc6), intel_pstate disabled.

Conclusion: ?? I don't know, always had schedutil issues on this computer.
Expected more like typical from my older test server (i7-2600K) [4]:

[1] http://www.smythies.com/~doug/linux/intel_pstate/passive-hwp/passive-hwp-ondemand.png
[2] http://www.smythies.com/~doug/linux/intel_pstate/passive-hwp/passive-hwp-but-active-powersave.png
[3] http://www.smythies.com/~doug/linux/intel_pstate/passive-hwp/passive-hwp-schedutil.png
[4] http://www.smythies.com/~doug/linux/intel_pstate/cpu-freq-scalers/fixed-work-packet-folded-schedutil.png


> 
>  #define INTEL_CPUFREQ_TRANSITION_LATENCY	20000
> +#define INTEL_CPUFREQ_TRANSITION_DELAY_HWP	5000

For now, while trying to prove nothing is busted,
suggest 20000.

> +/**
> + * intel_cpufreq_adjust_hwp_request - Adjust the HWP reuqest register.

request.

> +static void intel_cpufreq_adjust_hwp_request(struct cpudata *cpu, s64 max_freq,
> +					     unsigned int target_freq)
> +{
> +	s64 epp_fp = div_fp(255 * (max_freq - target_freq), max_freq);
> +

I would have thought this:

	s64 epp_fp = div_fp(255 * (max_freq - target_freq), (max_freq - min_freq));

but tested the original only, so far.

... Doug