Date: Wed, 12 Jun 2024 08:25:22 +0100
From: Lukasz Luba <lukasz.luba@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Kajetan Puchalski <kajetan.puchalski@....com>, rafael@...nel.org,
 daniel.lezcano@...aro.org, Dietmar.Eggemann@....com, dsmythies@...us.net,
 yu.chen.surf@...il.com, linux-pm@...r.kernel.org,
 linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
 Ulf Hansson <ulf.hansson@...aro.org>, Qais Yousef <qyousef@...alina.io>
Subject: Re: [PATCH v6 2/2] cpuidle: teo: Introduce util-awareness

Hi Vincent,

My apologies for the delay, I was on sick leave.

On 5/28/24 15:07, Vincent Guittot wrote:
> On Tue, 28 May 2024 at 11:59, Lukasz Luba <lukasz.luba@....com> wrote:
>>
>> Hi Vincent,
>>
>> On 5/28/24 10:29, Vincent Guittot wrote:
>>> Hi All,
>>>
>>> I'm quite late on this thread but this patchset creates a major
>>> regression for psci cpuidle driver when using the OSI mode (OS
>>> initiated mode).  In such a case, cpuidle driver takes care only of
>>> CPUs power state and the deeper C-states ,which includes cluster and
>>> other power domains, are handled with power domain framework. In such
>>> configuration ,cpuidle has only 2 c-states : WFI and cpu off states
>>> and others states that include the clusters, are managed by genpd and
>>> its governor.
>>>
>>> This patch selects cpuidle c-state N-1 as soon as the utilization is
>>> above CPU capacity / 64 which means at most a level of 16 on the big
>>> core but can be as low as 4 on little cores. These levels are very low
>>> and the main result is that as soon as there is very little activity
>>> on a CPU, cpuidle always selects WFI states whatever the estimated
>>> sleep duration and which prevents any deeper states. Another effect is
>>> that it also keeps the tick firing every 1ms in my case.
>>
>> Thanks for reporting this.
>> Could you add what regression it's causing, please?
>> Performance or higher power?
> 
> It's not a perf but rather a power regression. I don't have a power
> counter so it's difficult to give figures but I found it while running
> a unitary test below on my rb5:
> run 500us every 19457ms on medium core (uclamp_min: 600).

Mid cores are built differently: they have low static power (leakage).
Therefore, for them the residency in a deeper idle state should be
longer than for a Big CPU. When you power off the CPU you lose your
cached data/code. The data needs to be stored in the L3 or
in memory further away. When the CPU is powered on again, it needs
that code & data, so it will transfer it from L3 or from DDR. That
transfer has an energy cost (it's not free). The cost
of fetching data from DDR is very high.
So we have to check whether the energy lost while sleeping in a shallower
idle state is really higher than the energy spent reloading data/code
from outside. That will differ from CPU to CPU.
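
The trade-off above can be written down as a rough back-of-the-envelope
sketch. Everything here is an assumption made for illustration: the
function names, the static power figures and the cache refill energy are
hypothetical, not measured on any real SoC.

```python
def deeper_state_saves_energy(idle_s, shallow_static_w, deep_static_w,
                              refill_energy_j):
    """True if powering off for idle_s seconds costs less energy than
    staying in the shallow state for the same time."""
    shallow_energy = shallow_static_w * idle_s
    # Deep state: leakage left while off, plus the one-off cost of
    # reloading code/data from L3 or DDR after wake-up.
    deep_energy = deep_static_w * idle_s + refill_energy_j
    return deep_energy < shallow_energy


def break_even_idle_s(shallow_static_w, deep_static_w, refill_energy_j):
    """Idle duration above which the deeper state starts to pay off."""
    saved_per_second = shallow_static_w - deep_static_w
    if saved_per_second <= 0:
        return float("inf")  # the deeper state never pays off
    return refill_energy_j / saved_per_second


if __name__ == "__main__":
    # Hypothetical numbers: a low-leakage CPU vs. a leakier one,
    # both paying the same (made-up) 30 uJ refill cost.
    for name, static_w in (("low-leakage", 1.5e-3), ("high-leakage", 6e-3)):
        t = break_even_idle_s(static_w, 0.0, 30e-6)
        print(f"{name}: break-even idle time = {t * 1e3:.1f} ms")
```

With the same refill cost, the CPU with lower leakage ends up with the
longer break-even time, which is the point made above.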

> 
> With this use case, the idle time is more than 18ms (the 500us becomes
> 1ms as we don't run at max capacity) but the tick fires every 1ms
> while the system is fully idle (all 8 cpus are idle) and as cpuidle
> selects WFI, it prevents the full cluster power down. So even if WFI
> is efficient, the power impact should be significant.

I would say it's a matter of picking the right threshold. In this
situation the tick would be a bigger issue IMO.

Because you don't have an energy meter on that board, it's hard to say
whether the power impact is significant.

Let's estimate something for when the system is not much loaded:
a Mid CPU often runs at a low freq of ~300-400MHz and the Energy Model
power for that OPP is ~30mW.
If you are loaded at e.g. 1% at the lowest frequency, then your
avg power would be ~0.3mW, so ~1mW would be at ~3% load for
that frequency. That's the dynamic power if you need to serve some IRQ,
like the tick.
The static power would be ~5% of the total power (for these low-power
cells in a Mid core), i.e. ~5% of this ~30mW, so something like ~1.5mW.
I wouldn't say it's significant; it's a small amount of power which
might be tackled.
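
The arithmetic behind this estimate, as a small sketch. The 30mW OPP
power and the 5% static share are the rough figures from the paragraph
above; the function names and the simple linear scaling with load are
assumptions for illustration only.

```python
OPP_POWER_MW = 30.0     # Energy Model power at the low OPP (rough figure above)
STATIC_FRACTION = 0.05  # assumed static share of total power for a Mid core


def dynamic_power_mw(load_fraction, opp_power_mw=OPP_POWER_MW):
    """Average dynamic power, assuming it scales linearly with load."""
    return load_fraction * opp_power_mw


def static_power_mw(opp_power_mw=OPP_POWER_MW, fraction=STATIC_FRACTION):
    """Static (leakage) power as a fixed share of the OPP power."""
    return fraction * opp_power_mw


if __name__ == "__main__":
    for load in (0.01, 0.03):
        print(f"load {load:.0%}: ~{dynamic_power_mw(load):.1f} mW dynamic")
    print(f"static: ~{static_power_mw():.1f} mW")
```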

This is when the system is not much loaded. When it's loaded, we
might pick a higher OPP for the Mid cluster, but we also quite often
get tasks on those CPUs. WFI is the better choice in such situations.

> 
> For a 5 sec test duration, the system doesn't spend any time in
> cluster power down state with this patch but spent 3.9 sec in cluster
> power down state without

I think this can be achieved by just changing the thresholds.
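
To illustrate what the threshold looks like, here is a minimal sketch
based only on the description quoted earlier in this thread (CPU
capacity / 64, i.e. 16 for a capacity of 1024 and 4 for a capacity of
256). The function names and the idea of tuning the shift value are
assumptions for illustration, not the actual governor code and not a
concrete proposal.

```python
def util_threshold(cpu_capacity, shift=6):
    """Utilization threshold as capacity / 2**shift (shift=6 gives / 64)."""
    return cpu_capacity >> shift


def picks_shallower_state(cpu_util, cpu_capacity, shift=6):
    """True if utilization is above the threshold, i.e. the case in which
    the shallower idle state gets selected per the description above."""
    return cpu_util > util_threshold(cpu_capacity, shift)


if __name__ == "__main__":
    # Hypothetical capacities for a big and a little CPU.
    for capacity in (1024, 256):
        for shift in (6, 5, 4):
            print(f"capacity {capacity}, shift {shift}: "
                  f"threshold {util_threshold(capacity, shift)}")
```

A smaller shift raises the threshold, so a CPU with only very little
activity would no longer cross it.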

Regards,
Lukasz
