Date:   Thu, 4 Apr 2019 16:40:43 +0530
From:   Abhishek <huntbag@...ux.vnet.ibm.com>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Michael Ellerman <mpe@...erman.id.au>, ego@...ux.vnet.ibm.com
Subject: Re: [PATCH 1/2] cpuidle : auto-promotion for cpuidle states



On 04/04/2019 03:51 PM, Daniel Lezcano wrote:
> Hi Abhishek,
>
> thanks for taking the time to test the different scenarios and give us
> the numbers.
>
> On 01/04/2019 07:11, Abhishek wrote:
>>
>> On 03/22/2019 06:56 PM, Daniel Lezcano wrote:
>>> On 22/03/2019 10:45, Rafael J. Wysocki wrote:
>>>> On Fri, Mar 22, 2019 at 8:31 AM Abhishek Goel
>>>> <huntbag@...ux.vnet.ibm.com> wrote:
>>>>> Currently, the cpuidle governors (menu/ladder) determine what idle
>>>>> state an idling CPU should enter based on heuristics that depend on
>>>>> the idle history of that CPU. Given that no predictive heuristic is
>>>>> perfect, there are cases where the governor predicts a shallow idle
>>>>> state, hoping that the CPU will be busy soon. However, if no new
>>>>> workload is scheduled on that CPU in the near future, the CPU will
>>>>> remain in the shallow state.
>>>>>
>>>>> In the case of POWER, this is problematic when the predicted state in
>>>>> the aforementioned scenario is a lite stop state, as such lite states
>>>>> inhibit SMT folding, thereby depriving the other threads in the core
>>>>> of the core resources.
> I can understand that an idle state can prevent the other threads from
> using the core resources. But why doesn't a deeper idle state prevent
> this as well?
>
>
>>>>> To address this, such lite states need to be auto-promoted. The
>>>>> cpuidle core can queue a timer corresponding to the residency value
>>>>> of the next available state, leading to auto-promotion to a deeper
>>>>> idle state as soon as possible.
>>>> Isn't the tick stopping avoidance sufficient for that?
>>> I was about to ask the same :)
>>>
>> Thanks for the review.
>> I performed experiments for three scenarios to collect some data.
>>
>> case 1 : Without this patch and without the tick retained, i.e. on an
>> upstream kernel, it could take even more than a second to get out of
>> stop0_lite.
>>
>> case 2 : With the tick retained (as suggested) -
>>
>> Generally, we have a sched tick every 4ms (CONFIG_HZ = 250). Ideally I
>> expected it to take 8 sched ticks to get out of stop0_lite.
>> Experimentally, the observation was:
>>
>> ===================================
>> min            max            99th percentile
>> 4ms            12ms           4ms
>> ===================================
>> *ms = milliseconds
>>
>> It takes at least one sched tick to get out of stop0_lite.
>>
>> case 3 : With this patch (not stopping the tick, but explicitly queuing
>> a timer)
>>
>> ===============================
>> min            max              99.5th percentile
>> 144us          192us            144us
>> ===============================
>> *us = microseconds
>>
>> In this patch, we queue a timer just before entering the stop0_lite
>> state. The timer fires after (residency of the next available state +
>> 2 * exit latency of the next available state).
> So, for context: we have a similar issue, but from the power management
> point of view, where a CPU can stay in a shallow state for a long
> period, thus consuming a lot of energy.
>
> The window was reduced by preventing the tick from being stopped when a
> shallow state is selected. Unfortunately, if the tick is stopped and we
> exit/enter again and select a shallow state, the situation is the same.
>
> A similar solution with a timer, like this patch uses, was proposed and
> merged some years ago, but there were complaints about a bad performance
> impact, so it was reverted.
>
>> Let's say the next state (stop0) is available, with a residency of
>> 20us. The CPU should then get out in as little as (20 + 2*2) * 8 = 192
>> microseconds [based on the formula (residency + 2 x latency) * history
>> length]. Ideally we would expect 8 iterations; it was observed to get
>> out in 6-7 iterations.
> Can you explain the formula? I don't get the rationale. Why use the
> exit latency, and why multiply it by 2?
>
> Why is the timer not set to the next state's target residency value?
>
The idea behind multiplying by 2 is that entry latency + exit latency =
2 * exit latency, i.e., we assume entry latency = exit latency. So in
effect, we use target residency + 2 * exit latency as the timeout of the
timer. Latency is generally <= 10% of residency. I have tried to be
conservative by including the latency factor in the timeout computation.
Thus, this formula gives a slightly larger value than directly using the
residency of the target state. A minimal sketch of the computation is
below.
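
To make the arithmetic concrete, here is a small, self-contained C
sketch (illustrative only, not the patch code; the stop0 values are just
the example numbers used in this thread):

#include <stdio.h>

/* Illustrative parameters for the next available state (stop0). */
#define STOP0_TARGET_RESIDENCY_US	20	/* target residency, us */
#define STOP0_EXIT_LATENCY_US		2	/* exit latency, us */
#define HISTORY_LENGTH			8	/* iterations considered */

int main(void)
{
	/* Assume entry latency == exit latency, so the latency
	 * overhead accounted for is 2 * exit latency. */
	unsigned int timeout_us = STOP0_TARGET_RESIDENCY_US +
				  2 * STOP0_EXIT_LATENCY_US;

	/* Worst case before auto-promotion out of stop0_lite:
	 * (residency + 2 * latency) * history length. */
	unsigned int worst_case_us = timeout_us * HISTORY_LENGTH;

	printf("timer timeout     = %u us\n", timeout_us);	/* 24 us */
	printf("worst-case window = %u us\n", worst_case_us);	/* 192 us */

	return 0;
}

This matches the 192us upper bound quoted above; the measured 99.5th
percentile of 144us corresponds to getting out in 6 iterations.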

--Abhishek
