Message-ID: <004d01d90218$e4631670$ad294350$@telus.net>
Date: Sat, 26 Nov 2022 20:29:51 -0800
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Rafael J. Wysocki'" <rafael@...nel.org>,
"'Kajetan Puchalski'" <kajetan.puchalski@....com>
Cc: <daniel.lezcano@...aro.org>, <lukasz.luba@....com>,
<Dietmar.Eggemann@....com>, <yu.chen.surf@...il.com>,
<linux-pm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [RFC PATCH v3 2/2] cpuidle: teo: Introduce util-awareness
On 2022.11.25 10:27 Rafael wrote:
> On Mon, Oct 31, 2022 at 1:14 PM Kajetan wrote:
... [delete some] ...
>> /*
>> * Find the deepest idle state whose target residency does not exceed
>> * the current sleep length and the deepest idle state not deeper than
>> @@ -454,6 +527,11 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>> if (idx > constraint_idx)
>> idx = constraint_idx;
>>
>> + /* if the CPU is being utilized and C1 is the selected candidate */
>> + /* choose a shallower non-polling state to improve latency */
>
> Again, the kernel coding style for multi-line comments is different
> from the above.
>
>> + if (cpu_data->utilized && idx == 1)
>
> I've changed my mind with respect to adding the idx == 1 check to
> this. If the goal is to reduce latency for the "loaded" CPUs, this
> applies to deeper idle states too.
Even after taking idle state 0 (POLL) out of the demotion path, the energy
cost of reducing the selected idle state by 1 was still high in some cases,
at least on my Intel processor. That was mainly for idle state 2 being
demoted to idle state 1. I don't recall significant differences for idle
state 3 being demoted to idle state 2, but I don't know about other Intel
processors.
So there is a trade-off here: do we accept the higher energy consumption
for no gain in some workflows in return for the gains in other workflows,
or not?
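In case it helps, here is a minimal sketch of the difference I tested.
This is illustrative only, not the code from the patch as posted; the
helper name and the exact guards are my own, and disabled states are
ignored for simplicity:

#include <linux/cpuidle.h>

/*
 * Illustrative sketch: demote the selected candidate by one state when
 * the CPU is considered utilized, but never into a polling state.
 * util-v4 as posted additionally gates the demotion on idx == 1;
 * util-v4-1 drops that gate and demotes any candidate.
 */
static int util_demote_candidate(struct cpuidle_driver *drv, bool utilized,
				 int idx)
{
	if (!utilized || idx <= 0)
		return idx;

	/* Do not demote into a polling state (state 0 on my processor). */
	if (drv->states[idx - 1].flags & CPUIDLE_FLAG_POLLING)
		return idx;

	return idx - 1;
}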
Example 1: Higher energy for no benefit:
Workflow: a medium load at a 211 hertz work/sleep frequency.
This data is for one thread, but I looked at up to 6 threads.
There is no performance metric; the work just has to finish before
the next cycle begins.
CPU frequency scaling driver: intel_pstate
CPU frequency scaling governor: powersave
No HWP.
Kernel 6.1-rc3
teo: ~14.8 watts
util-v4 without the "idx == 1" above: 16.1 watts (+8.8%)
More info:
http://smythies.com/~doug/linux/idle/teo-util/consume/dwell-v4/
Example 2: Lower energy, but no loss in performance:
Workflow: 500 threads, light load per thread,
approximately 10 hertz work/sleep frequency per thread.
CPU frequency scaling driver: intel_cpufreq
CPU frequency scaling governor: schedutil
No HWP.
Kernel 6.1-rc3
teo: ~70 watts
util-v4 without the "idx == 1" above: ~59 watts (-16%)
Execution times were the same.
More info:
http://smythies.com/~doug/linux/idle/teo-util/waiter/
Note: in the legend, util-v4-1 is util-v4 without the "idx == 1" check.
I have also added util-v4-1 to some of the previous results.
For reference, my testing processor:
Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3_ACPI
$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/desc
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:ACPI FFH MWAIT 0x0
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:ACPI FFH MWAIT 0x30
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH MWAIT 0x60
... Doug