[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <46f92f23aaff45a67a95aa8c83e3cf00bb96d3cf.camel@linux.ibm.com>
Date: Mon, 21 Oct 2024 13:10:17 +0530
From: Aboorva Devarajan <aboorvad@...ux.ibm.com>
To: Christian Loehle <christian.loehle@....com>, rafael@...nel.org,
daniel.lezcano@...aro.org, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: gautam@...ux.ibm.com
Subject: Re: [PATCH 0/1] cpuidle/menu: Address performance drop from
favoring physical over polling cpuidle state
On Thu, 2024-09-19 at 10:43 +0100, Christian Loehle wrote:
> On 9/19/24 09:49, Christian Loehle wrote:
> > On 9/19/24 06:02, Aboorva Devarajan wrote:
> > > On Wed, 2024-08-21 at 11:55 +0100, Christian Loehle wrote:
> > > Timer based wakeup:
> > >
> > > +------------------------+---------------------+---------------------+
> > > > Metric | Without Patch | With Patch |
> > > +------------------------+---------------------+---------------------+
> > > > Wakee thread - CPU | 110 | 110 |
> > > > Waker thread - CPU | 20 | 20 |
> > > > Sleep Interval | 50 us | 50 us |
> > > > Total Wakeups | - | - |
> > > > Avg Wakeup Latency | - | - |
> > > > Actual Sleep time | 52.639 us | 52.627 us |
> > > +------------------------+---------------------+---------------------+
> > > > Idle State 0 Usage Diff| 94,879 | 94,879 |
> > > > Idle State 0 Time Diff | 4,700,323 ns | 4,697,576 ns |
> > > > Idle State 1 Usage Diff| 0 | 0 |
> > > > Idle State 1 Time Diff | 0 ns | 0 ns |
> > > +------------------------+---------------------+---------------------+
> > > > Total Above Usage | 0 (0.00%) | (0.00%) |
> > > +------------------------+---------------------+---------------------+
> > > > Total Below Usage | 0 (0.00%) | 0 (0.00%) |
> > > +------------------------+---------------------+---------------------+
> > >
> > > In timer-based wakeups, the menu governor effectively predicts the idle
> > > duration both with and without the patch. This ensures that there are
> > > few or no instances of "Above" usage, allowing the CPU to remain in the
> > > correct idle state.
> > >
> > > The condition (s->target_residency_ns <= data->next_timer_ns) in the menu
> > > governor ensures that a physical idle state is not prioritized when a
> > > timer event is expected before the target residency of the first physical
> > > idle state.
> > >
> > > As a result, the patch has no impact in this case, and performance
> > > remains stable with timer based wakeups.
> > >
> > > Pipe based wakeup (non-timer wakeup):
> > >
> > > +------------------------+---------------------+---------------------+
> > > > Metric | Without Patch | With Patch |
> > > +------------------------+---------------------+---------------------+
> > > > Wakee thread - CPU | 110 | 110 |
> > > > Waker thread - CPU | 20 | 20 |
> > > > Sleep Interval | 50 us | 50 us |
> > > > Total Wakeups | 97031 | 96583 |
> > > > Avg Wakeup Latency | 7.070 us | 4.879 us |
> > > > Actual Sleep time | 51.366 us | 51.605 us |
> > > +------------------------+---------------------+---------------------+
> > > > Idle State 0 Usage Diff| 1209 | 96,586 |
> > > > Idle State 0 Time Diff | 55,579 ns | 4,510,003 ns |
> > > > Idle State 1 Usage Diff| 95,826 | 5 |
> > > > Idle State 1 Time Diff | 4,522,639 ns | 198 ns |
> > > +------------------------+---------------------+---------------------+
> > > +------------------------+---------------------+---------------------+
> > > > **Total Above Usage** | 95,824 (98.75%) | 5 (0.01%) |
> > > +------------------------+---------------------+---------------------+
> > > +------------------------+---------------------+---------------------+
> > > > Total Below Usage | 0 (0.00%) | 0 (0.00%) |
> > > +------------------------+---------------------+---------------------+
> > >
> > > In the pipe-based wakeup scenario, before the patch was applied, the
> > > "Above" metric was notably high around 98.75%. This suggests that the
> > > menu governor frequently selected a deeper idle state like CEDE, even
> > > when the idle period was relatively short.
> > >
> > > This happened because the menu governor is inclined to prioritize the
> > > physical idle state (CEDE) even when the target residency time of the
> > > physical idle state (s->target_residency_ns) was longer than the
> > > predicted idle time (predicted_ns), so data->next_timer_ns won't be
> > > relevant here in non-timer wakeups.
> > >
> > > In this test, despite the actual idle period being around 50 microseconds,
> > > the menu governor still chose CEDE state, which has a target residency of
> > > 120 microseconds.
> >
> > And the idle_miss statistics confirms that this was mostly wrong decisions
> > in hindsight.
> > I'll go through the menu code again, this indeed shouldn't be happening.
> > I'd be very surprised if upstream teo performed as badly (or actually badly
> > at all) on this, although it doesn't seem to help your actual workload,
> > would you mind giving that a try too?
> >
> > Regards,
> > Christian
>
> So with a workload as static as this, the recent interval detection of menu
> should take affect. It records the last 8 idle interval lengths and does some
> statistics do see if they are somewhat stable and uses those instead of the
> next timer event.
> While I wouldn't vouch for the statistics just yet, the results don't seem
> to be completely unreasonable.
> Do you mind trying
> a) printing some snapshots of these intervals and the
> resulting return value of get_typical_interval()?
Christian,
Here are the stats I get when printing the intervals and corresponding predicted_ns
for an idle duration of 50 us set from user space. In this case, a pipe wakeup with
a 50 us sleep time in user space translates to approximately 34 us of CPU idle time
in the kernel, with very little deviation.
-------------------------------------------------------------------------------------
process cpu timestamp trace
<idle>-0 [200] d.... 123.969564: menu_select: CPU[200]: Interval[2] = 34
<idle>-0 [200] d.... 123.969564: menu_select: CPU[200]: Interval[3] = 34
<idle>-0 [200] d.... 123.969564: menu_select: CPU[200]: Interval[4] = 34
<idle>-0 [200] d.... 123.969564: menu_select: CPU[200]: Interval[5] = 34
<idle>-0 [200] d.... 123.969565: menu_select: CPU[200]: Interval[6] = 34
<idle>-0 [200] d.... 123.969565: menu_select: CPU[200]: Interval[7] = 34
<idle>-0 [200] d.... 123.969565: menu_select: CPU[200]: Predicted ns = 34000
<idle>-0 [200] d.... 123.969615: menu_select: CPU[200]: Interval[0] = 34
<idle>-0 [200] d.... 123.969616: menu_select: CPU[200]: Interval[1] = 34
<idle>-0 [200] d.... 123.969616: menu_select: CPU[200]: Interval[2] = 34
<idle>-0 [200] d.... 123.969616: menu_select: CPU[200]: Interval[3] = 34
<idle>-0 [200] d.... 123.969616: menu_select: CPU[200]: Interval[4] = 34
<idle>-0 [200] d.... 123.969616: menu_select: CPU[200]: Interval[5] = 34
<idle>-0 [200] d.... 123.969616: menu_select: CPU[200]: Interval[6] = 34
<idle>-0 [200] d.... 123.969616: menu_select: CPU[200]: Interval[7] = 34
<idle>-0 [200] d.... 123.969617: menu_select: CPU[200]: Predicted ns = 34000
...
...
...
...
<idle>-0 [200] d.... 123.969667: menu_select: CPU[200]: Interval[0] = 34
<idle>-0 [200] d.... 123.969667: menu_select: CPU[200]: Interval[1] = 34
<idle>-0 [200] d.... 123.969667: menu_select: CPU[200]: Interval[2] = 34
<idle>-0 [200] d.... 123.969667: menu_select: CPU[200]: Interval[3] = 34
<idle>-0 [200] d.... 123.969667: menu_select: CPU[200]: Interval[4] = 34
<idle>-0 [200] d.... 123.969668: menu_select: CPU[200]: Interval[5] = 34
<idle>-0 [200] d.... 123.969668: menu_select: CPU[200]: Interval[6] = 34
<idle>-0 [200] d.... 123.969668: menu_select: CPU[200]: Interval[7] = 33
<idle>-0 [200] d.... 123.969668: menu_select: CPU[200]: Predicted ns = 33000
This pattern repeats until I stop the workload.
----------------------------------------------------------------------
---------------
> b) Trying this out? Setting these values to some around 50us, that returns
> 50us for me as expected and therefore should not select an idle state above
> that.
>
> -->8--
>
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -112,6 +112,8 @@ struct menu_device {
> int interval_ptr;
> };
>
> +static int intervals_dummy[] = {50, 52, 48, 50, 51, 49, 51, 51};
> +
> static inline int which_bucket(u64 duration_ns, unsigned int nr_iowaiters)
> {
> int bucket = 0;
> @@ -177,7 +179,7 @@ static unsigned int get_typical_interval(struct menu_device *data)
> sum = 0;
> divisor = 0;
> for (i = 0; i < INTERVALS; i++) {
> - unsigned int value = data->intervals[i];
> + unsigned int value = intervals_dummy[i];
> if (value <= thresh) {
> sum += value;
> divisor++;
> @@ -200,7 +202,7 @@ static unsigned int get_typical_interval(struct menu_device *data)
> /* Then try to determine variance */
> variance = 0;
> for (i = 0; i < INTERVALS; i++) {
> - unsigned int value = data->intervals[i];
> + unsigned int value = intervals_dummy[i];
> if (value <= thresh) {
> int64_t diff = (int64_t)value - avg;
> variance += diff * diff;
>
Yes, I tried the above patch and I still see the issue:
+-----------------------------+------------------+------------------+------------------+
| Metric | ladder | menu | teo |
+=============================+==================+==================+==================+
| Wakeups | 579817 | 579298 | 578359 |
+-----------------------------+------------------+------------------+------------------+
| Average wakeup latency (us) | 7.32 | 7.099 | 4.538 |
+-----------------------------+------------------+------------------+------------------+
| Sleep interval (us) | 51.722 | 51.768 | 51.852 |
+-----------------------------+------------------+------------------+------------------+
| State 0 Usage diff | 0 | 7056 | 578402 |
+-----------------------------+------------------+------------------+------------------+
| State 0 Time diff (ns) | 0 | 350579 | 2.75562e+07 |
+-----------------------------+------------------+------------------+------------------+
| State 0 Above diff | 0 | 0 | 0 |
+-----------------------------+------------------+------------------+------------------+
| State 0 Below diff | 0 | 1 | 0 |
+-----------------------------+------------------+------------------+------------------+
| State 1 Usage diff | 580239 | 572168 | 0 |
+-----------------------------+------------------+------------------+------------------+
| State 1 Time diff (ns) | 2.74763e+07 | 2.73917e+07 | 0 |
+-----------------------------+------------------+------------------+------------------+
| State 1 Above diff | 580140 | 572070 | 0 |
+-----------------------------+------------------+------------------+------------------+
| State 1 Below diff | 0 | 0 | 0 |
+-----------------------------+------------------+------------------+------------------+
| Total Above | 580140 | 572070 | 0 |
+-----------------------------+------------------+------------------+------------------+
| Total Below | 0 | 1 | 0 |
+=============================+==================+==================+==================+
| Total Above Percent | 99.98 | 98.76 | 0 |
+-----------------------------+------------------+------------------+------------------+
| Total Below Percent | 0 | 0 | 0 |
+======================================================================================+
--------------------------------------------------------------------------------------------
Note:
% Total Above Percent (w.r.t. Total Usage) = (Total Above / Total Usage) * 100
% Total Below Percent (w.r.t. Total Usage) = (Total Below / Total Usage) * 100
--------------------------------------------------------------------------------------------
This confirms that the issue is not occuring due to the way in which predicted_ns is
computed in get_typicial_interval, but rather related to the menu governor's CPU idle
state selection logic.
Thanks,
Aboorva
Powered by blists - more mailing lists