Message-ID: <592a0c06-a23c-464a-9f67-2ce8ce91a306@arm.com>
Date: Thu, 19 Sep 2024 10:43:53 +0100
From: Christian Loehle <christian.loehle@....com>
To: Aboorva Devarajan <aboorvad@...ux.ibm.com>, rafael@...nel.org,
 daniel.lezcano@...aro.org, linux-pm@...r.kernel.org,
 linux-kernel@...r.kernel.org
Cc: gautam@...ux.ibm.com
Subject: Re: [PATCH 0/1] cpuidle/menu: Address performance drop from favoring
 physical over polling cpuidle state

On 9/19/24 09:49, Christian Loehle wrote:
> On 9/19/24 06:02, Aboorva Devarajan wrote:
>> On Wed, 2024-08-21 at 11:55 +0100, Christian Loehle wrote:
> 
>>
>> Timer based wakeup:
>>
>> +------------------------+---------------------+---------------------+
>> | Metric                 | Without Patch       | With Patch          |
>> +------------------------+---------------------+---------------------+
>> | Wakee thread - CPU     | 110                 | 110                 |
>> | Waker thread - CPU     | 20                  | 20                  |
>> | Sleep Interval         | 50 us               | 50 us               |
>> | Total Wakeups          | -                   | -                   |
>> | Avg Wakeup Latency     | -                   | -                   |
>> | Actual Sleep time      | 52.639 us           | 52.627 us           |
>> +------------------------+---------------------+---------------------+
>> | Idle State 0 Usage Diff| 94,879              | 94,879              |
>> | Idle State 0 Time Diff | 4,700,323 ns        | 4,697,576 ns        |
>> | Idle State 1 Usage Diff| 0                   | 0                   |
>> | Idle State 1 Time Diff | 0 ns                | 0 ns                |
>> +------------------------+---------------------+---------------------+
>> | Total Above Usage      | 0 (0.00%)           | 0 (0.00%)           |
>> +------------------------+---------------------+---------------------+
>> | Total Below Usage      | 0 (0.00%)           | 0 (0.00%)           |
>> +------------------------+---------------------+---------------------+
>>
>> In timer-based wakeups, the menu governor effectively predicts the idle
>> duration both with and without the patch. This ensures that there are
>> few or no instances of "Above" usage, allowing the CPU to remain in the
>> correct idle state.
>>
>> The condition (s->target_residency_ns <= data->next_timer_ns) in the menu
>> governor ensures that a physical idle state is not prioritized when a
>> timer event is expected before the target residency of the first physical
>> idle state.
>>
>> As a result, the patch has no impact in this case, and performance
>> remains stable with timer based wakeups.
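
A minimal, illustrative user-space sketch of that timer bound (names and
values are made up here to match the test setup, not taken from menu.c;
the real check sits in menu_select()):

/* Illustrative only: with the next timer due before CEDE's target
 * residency, the bound above keeps the polling state selected. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

int main(void)
{
	uint64_t next_timer_ns     = 52 * 1000;  /* timer fires in ~52us */
	uint64_t cede_residency_ns = 120 * 1000; /* CEDE target residency */

	/* A physical state is only favored over polling if the next
	 * timer is at least its target residency away. */
	bool favor_physical = cede_residency_ns <= next_timer_ns;

	printf("favor physical (CEDE): %s\n",
	       favor_physical ? "yes" : "no, stay polling");
	return 0;
}

This prints "no, stay polling", matching the stable timer-based results above.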
>>
>> Pipe based wakeup (non-timer wakeup):
>>
>> +------------------------+---------------------+---------------------+
>> | Metric                 | Without Patch       | With Patch          |
>> +------------------------+---------------------+---------------------+
>> | Wakee thread - CPU     | 110                 | 110                 |
>> | Waker thread - CPU     | 20                  | 20                  |
>> | Sleep Interval         | 50 us               | 50 us               |
>> | Total Wakeups          | 97031               | 96583               |
>> | Avg Wakeup Latency     | 7.070 us            | 4.879 us            |
>> | Actual Sleep time      | 51.366 us           | 51.605 us           |
>> +------------------------+---------------------+---------------------+
>> | Idle State 0 Usage Diff| 1209                | 96,586              |
>> | Idle State 0 Time Diff | 55,579 ns           | 4,510,003 ns        |
>> | Idle State 1 Usage Diff| 95,826              | 5                   |
>> | Idle State 1 Time Diff | 4,522,639 ns        | 198 ns              |
>> +------------------------+---------------------+---------------------+
>> | **Total Above Usage**  | 95,824 (98.75%)     | 5 (0.01%)           |
>> +------------------------+---------------------+---------------------+
>> | Total Below Usage      | 0 (0.00%)           | 0 (0.00%)           |
>> +------------------------+---------------------+---------------------+
>>
>> In the pipe-based wakeup scenario, before the patch was applied, the
>> "Above" metric was notably high, at around 98.75%. This suggests that the
>> menu governor frequently selected a deeper idle state like CEDE, even
>> when the idle period was relatively short.
>>
>> This happened because the menu governor is inclined to prioritize the
>> physical idle state (CEDE) even when its target residency
>> (s->target_residency_ns) is longer than the predicted idle time
>> (predicted_ns); data->next_timer_ns is not relevant here, since these
>> are non-timer wakeups.
>>
>> In this test, despite the actual idle period being around 50 microseconds,
>> the menu governor still chose CEDE state, which has a target residency of
>> 120 microseconds.
> 
> And the idle_miss statistics confirm that these were mostly wrong decisions
> in hindsight.
> I'll go through the menu code again; this indeed shouldn't be happening.
> I'd be very surprised if upstream teo performed as badly (or badly at all)
> on this. Although it doesn't seem to help your actual workload, would you
> mind giving that a try too?
> 
> Regards,
> Christian
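
To make the CEDE mis-selection described in the quoted results concrete,
here is an illustrative-only user-space sketch (values chosen to match the
test, not code from menu.c): with a pipe wakeup the next timer is far away,
so a bound against next_timer_ns alone lets the physical state win even
though predicted_ns is well below its target residency.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

int main(void)
{
	uint64_t predicted_ns      = 50 * 1000;       /* ~50us pipe sleep */
	uint64_t next_timer_ns     = 4 * 1000 * 1000; /* tick far away    */
	uint64_t cede_residency_ns = 120 * 1000;      /* CEDE residency   */

	/* The timer bound is satisfied, so CEDE is favored... */
	bool pick_cede = cede_residency_ns <= next_timer_ns;
	/* ...even though the predicted idle time says it should not be. */
	bool above_miss = pick_cede && predicted_ns < cede_residency_ns;

	printf("pick CEDE: %s%s\n", pick_cede ? "yes" : "no",
	       above_miss ? " (an 'Above' miss)" : "");
	return 0;
}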

So with a workload as static as this, the recent interval detection of menu
should take effect. It records the last 8 idle interval lengths and does some
statistics to see whether they are reasonably stable, and if so uses them
instead of the next timer event.
While I wouldn't vouch for the statistics just yet, the results don't seem
to be completely unreasonable.
Would you mind a) printing some snapshots of these intervals and the
resulting return value of get_typical_interval(), and b) trying out the hack
below? Setting these values to something around 50us returns 50us for me as
expected, and therefore should not lead to an idle state above that being
selected.

-->8--

--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -112,6 +112,8 @@ struct menu_device {
        int             interval_ptr;
 };
 
+static int intervals_dummy[] = {50, 52, 48, 50, 51, 49, 51, 51};
+
 static inline int which_bucket(u64 duration_ns, unsigned int nr_iowaiters)
 {
        int bucket = 0;
@@ -177,7 +179,7 @@ static unsigned int get_typical_interval(struct menu_device *data)
        sum = 0;
        divisor = 0;
        for (i = 0; i < INTERVALS; i++) {
-               unsigned int value = data->intervals[i];
+               unsigned int value = intervals_dummy[i];
                if (value <= thresh) {
                        sum += value;
                        divisor++;
@@ -200,7 +202,7 @@ static unsigned int get_typical_interval(struct menu_device *data)
        /* Then try to determine variance */
        variance = 0;
        for (i = 0; i < INTERVALS; i++) {
-               unsigned int value = data->intervals[i];
+               unsigned int value = intervals_dummy[i];
                if (value <= thresh) {
                        int64_t diff = (int64_t)value - avg;
                        variance += diff * diff;

