Message-ID: <CAJZ5v0h_u5SV-4DSgY+-mnEZS6aYquNJzrKtcrz09h3zE_7NQw@mail.gmail.com>
Date:   Wed, 23 Nov 2022 18:56:41 +0100
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Zhang Rui <rui.zhang@...el.com>
Cc:     rjw@...ysocki.net, daniel.lezcano@...aro.org,
        linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 3/3] cpuidle: ladder: Fix handling for disabled states

On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@...el.com> wrote:
>
> There are two reasons why CPU idle states may be disabled: either
> because the driver has disabled them or because they have been disabled
> by user space via sysfs.
>
> Handling for the driver-disabled state in the ladder governor follows
> three main rules.
>
> Two of them were introduced by commit 62d6ae880e3e ("Honor state disabling
> in the cpuidle ladder governor"). The behavior is described as follows:
>
> "The behavior and the effect of the disable variable depends on the
> implementation of a particular governor. In the ladder governor, for
> example, it is not coherent, i.e. if one is disabling a light state,
> then all deeper states are disabled as well, but the disable variable
> does not reflect it. Likewise, if one enables a deep state but a lighter
> state still is disabled, then this has no effect."
>
> So:
> Rule 1. When promoting, only check the 'disable' flag of the next deeper
>         state. If it is disabled, consider all deeper states as disabled
>         and stick with the current state.
> Rule 2. When demoting, ignore the 'disable' flag of the next shallower
>         state, because when a deeper state is in use, all of its shallower
>         states must be enabled.
>
> The third one is introduced by commit a2bd92023357 ("cpuidle: Do not use
> poll_idle unless user asks for it").
>
> Rule 3. Never demote to POLL unless the latency requirement is 0.
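
A minimal standalone sketch of Rules 1 and 2 (not the kernel code; the
state table below is made-up test data):

#include <stdbool.h>
#include <stdio.h>

#define NSTATES 4

/* hypothetical per-state 'disable' flags: state2 is disabled */
static bool disabled[NSTATES] = { false, false, true, false };

/* Rule 1: promotion only checks the next deeper state's flag */
static int old_promote(int cur)
{
	if (cur < NSTATES - 1 && !disabled[cur + 1])
		return cur + 1;
	return cur;	/* treat all deeper states as disabled */
}

/* Rule 2: demotion ignores the 'disable' flag of the shallower state */
static int old_demote(int cur)
{
	return cur > 0 ? cur - 1 : cur;
}

int main(void)
{
	printf("promote from state1 -> state%d (stuck, state2 disabled)\n",
	       old_promote(1));
	printf("demote from state3 -> state%d ('disable' flag ignored)\n",
	       old_demote(3));
	return 0;
}
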
>
> Handling for the sysfs-disabled state in the ladder governor was
> introduced by commit 66804c13f7b7 ("PM / cpuidle: Make ladder governor
> use the "disabled" state flag"). It copies the same logic as for the
> driver-disabled state, but this can break because the sysfs-disabled
> state has a different definition and can be changed at runtime.
>
> Today, when the ladder governor is used and the sysfs "disable"
> attribute is manipulated, the following behavior is observed.
> 1. After disabling a specific state, if the CPU was previously in a
>    deeper state, it can still request the disabled state.
> 2. After disabling a specific state, if the CPU was previously in a
>    deeper state, it can still request a state deeper than the disabled
>    state.
> This behavior persists until the governor demotes to a state shallower
> than the disabled one, at which point Rule 1 starts to take effect. The
> time for Rule 1 to take effect may be long, depending on the workload.
> For example, on an Intel Kaby Lake platform, disabling C1E (state2) does
> not take effect even after 30 seconds in an idle scenario.
>
> 3. When all non-POLL states are disabled (or just state1 and state2 are
>    disabled), the ladder governor keeps demoting to shallower states and
>    finally gets stuck in state1 (the shallowest non-POLL state), even
>    though that state is disabled.
>
> So the previous logic for the driver-disabled state does not work well
> for the sysfs-disabled state case.
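
A corresponding standalone sketch of observation 3 (again made-up data,
not the kernel code): under the old rules, demotion ignores the
'disable' flag and never goes below state1, so it bottoms out in a
disabled state:

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	/* hypothetical flags: state1 and state2 disabled, state0 is POLL */
	bool disabled[4] = { false, true, true, false };
	int cur = 3;

	/* old demotion: ignore the 'disable' flag, never demote to POLL */
	while (cur > 1)
		cur--;

	printf("stuck in state%d (disabled=%d)\n", cur, disabled[cur]);
	return 0;
}
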
>
> Note that with commit 99e98d3fb100 ("cpuidle: Consolidate disabled state
> checks"), these two cases are combined to share one flag, thus the
> behaviors for handling the driver-disabled state and the sysfs-disabled
> state *HAVE TO* be consistent now.
>
> Now the question is which behavior should be used.
> I'm not sure why the ladder governor handles the driver-disabled state
> differently from the other governors. And IMO, it also conflicts with
> the expectation set by the sysfs 'disable' attribute, as described in
> Documentation/admin-guide/pm/cpuidle.rst:
> "disable        Whether or not this idle state is disabled."
>
> So this patch changes the ladder governor to align with the sysfs
> 'disable' attribute definition.
> This means:
> 1. when promoting, always choose the next enabled deeper state
> 2. when demoting, always choose the next enabled shallower state

I agree with this.  A disabled state should just be omitted and
disabling one state should not affect the other states that have not
been disabled.
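
As a minimal standalone sketch of that behavior (not the patch code; the
helpers and data below are hypothetical), promotion and demotion simply
skip over disabled states in either direction:

#include <stdbool.h>

static int next_enabled_deeper(const bool *disabled, int nstates, int cur)
{
	for (int i = cur + 1; i < nstates; i++)
		if (!disabled[i])
			return i;
	return cur;	/* no enabled deeper state */
}

static int next_enabled_shallower(const bool *disabled, int cur)
{
	for (int i = cur - 1; i >= 0; i--)
		if (!disabled[i])
			return i;
	return cur;	/* no enabled shallower state */
}

int main(void)
{
	bool disabled[4] = { false, false, true, false };	/* state2 disabled */

	/* promotion from state1 skips state2 and lands in state3;
	 * demotion from state3 skips state2 and lands in state1 */
	return next_enabled_deeper(disabled, 4, 1) == 3 &&
	       next_enabled_shallower(disabled, 3) == 1 ? 0 : 1;
}
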

> In addition, Rule 3 is kept and enhanced:
> 3. never choose POLL unless the latency requirement cannot be met
>    otherwise.
>    (Previously, unless the latency requirement was set to 0, the ladder
>    governor would not choose POLL even if the shallowest non-POLL state
>    did not meet the latency requirement.)

I agree with this too.
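
Sketched standalone and in isolation from the ladder promotion/demotion
logic (made-up latency table, not the patch code), the enhanced Rule 3
means POLL is only picked when no non-POLL state can satisfy the latency
requirement:

#include <stdio.h>

/* made-up table, ordered by depth; index 0 is POLL with ~0 exit latency */
static const long long exit_latency_ns[] = { 0, 2000, 10000, 133000 };
#define NSTATES 4

static int pick_for_latency(long long latency_req)
{
	/* prefer the shallowest non-POLL state that fits */
	for (int i = 1; i < NSTATES; i++)
		if (exit_latency_ns[i] <= latency_req)
			return i;
	return 0;	/* only POLL can satisfy the requirement */
}

int main(void)
{
	printf("latency_req=5000ns -> state%d\n", pick_for_latency(5000));
	printf("latency_req=1000ns -> state%d\n", pick_for_latency(1000));
	return 0;
}
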

> Any comments on this would be really appreciated.
>
> Reported-by: Junxiao Chang <junxiao.chang@...el.com>
> Signed-off-by: Zhang Rui <rui.zhang@...el.com>
> ---
>  Documentation/admin-guide/pm/cpuidle.rst |   4 +-
>  drivers/cpuidle/governors/ladder.c       | 149 ++++++++++++++++++-----
>  2 files changed, 117 insertions(+), 36 deletions(-)
>
> diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
> index 19754beb5a4e..be21690211ce 100644
> --- a/Documentation/admin-guide/pm/cpuidle.rst
> +++ b/Documentation/admin-guide/pm/cpuidle.rst
> @@ -470,9 +470,7 @@ governor will never select it for this particular CPU and the ``CPUIdle``
>  driver will never ask the hardware to enter it for that CPU as a result.
>  However, disabling an idle state for one CPU does not prevent it from being
>  asked for by the other CPUs, so it must be disabled for all of them in order to
> -never be asked for by any of them.  [Note that, due to the way the ``ladder``
> -governor is implemented, disabling an idle state prevents that governor from
> -selecting any idle states deeper than the disabled one too.]
> +never be asked for by any of them.
>
>  If the :file:`disable` attribute contains 0, the given idle state is enabled for
>  this particular CPU, but it still may be disabled for some or all of the other
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index 4b47aa0a4da9..be8ddb792ad8 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -57,6 +57,78 @@ static inline void ladder_do_selection(struct cpuidle_device *dev,
>         dev->last_state_idx = new_idx;
>  }
>
> +/*
> + * Choose the first enabled shallower state that meets the latency requirement
> + */
> +static int get_shallower(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> +                        int idx, s64 latency_req, bool ignore_poll)
> +{
> +       int i;
> +
> +       /* Choose the first shallower state that meets the requirement */
> +       for (i = idx; i >= 0; i--) {
> +               if (dev->states_usage[i].disable)
> +                       continue;
> +               if (ignore_poll && drv->states[i].flags & CPUIDLE_FLAG_POLLING)
> +                       continue;
> +               if (drv->states[i].exit_latency_ns <= latency_req)
> +                       return i;
> +       }
> +
> +       /* Choose the first deeper one if no suitable shallower states found */
> +       for (i = idx + 1; i < drv->state_count; i++) {
> +               if (dev->states_usage[i].disable)
> +                       continue;
> +               if (drv->states[i].exit_latency_ns <= latency_req)
> +                       return i;
> +               /* all deeper states cannot meet latency requirement */
> +               break;
> +       }
> +
> +       /*
> +        * When we get here, there are two possibilities:
> +        * 1. no enabled state meets the latency requirement
> +        * 2. only POLL meets the latency requirement, but ignore_poll is set
> +        * In both cases, the first enabled state should be chosen.
> +        */
> +       for (i = 0; i < drv->state_count; i++)
> +               if (!dev->states_usage[i].disable)
> +                       return i;
> +       return 0;
> +}
> +
> +/*
> + * Choose the first enabled deeper state that meets the latency requirement
> + */
> +static int get_deeper(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> +                     int idx, s64 latency_req)
> +{
> +       int i, shallowest = -1;
> +
> +       for (i = idx; i < drv->state_count; i++) {
> +               if (dev->states_usage[i].disable)
> +                       continue;
> +               if (shallowest == -1)
> +                       shallowest = i;
> +               if (drv->states[i].exit_latency_ns <= latency_req)
> +                       return i;
> +               /* No need to search for deeper state because latency_req cannot be met */
> +               break;
> +       }
> +
> +       /* Choose a shallower state if no deeper state found */
> +       for (i = idx - 1; i >= 0; i--) {
> +               if (dev->states_usage[i].disable)
> +                       continue;
> +               shallowest = i;
> +               if (drv->states[i].exit_latency_ns <= latency_req)
> +                       return i;
> +       }
> +
> +       /* Choose the shallowest enabled state if latency_req cannot be met */
> +       return shallowest == -1 ? 0 : shallowest;
> +}
> +
>  /**
>   * ladder_select_state - selects the next state to enter
>   * @drv: cpuidle driver
> @@ -69,59 +141,63 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>         struct ladder_device *ldev = this_cpu_ptr(&ladder_devices);
>         struct ladder_device_state *last_state;
>         int last_idx = dev->last_state_idx;
> -       int first_idx = drv->states[0].flags & CPUIDLE_FLAG_POLLING ? 1 : 0;
> +       int next_idx;
>         s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
>         s64 last_residency;
>
>         /* Special case when user has set very strict latency requirement */
>         if (unlikely(latency_req == 0)) {
> -               ladder_do_selection(dev, ldev, last_idx, 0);
> -               return 0;
> +               /* Choose the shallowest state */
> +               next_idx = get_deeper(drv, dev, 0, 0);
> +               goto end;
>         }
>
>         last_state = &ldev->states[last_idx];
>
>         last_residency = dev->last_residency_ns - drv->states[last_idx].exit_latency_ns;
>
> -       /* consider promotion */
> -       if (last_idx < drv->state_count - 1 &&
> -           !dev->states_usage[last_idx + 1].disable &&
> -           last_residency > last_state->threshold.promotion_time_ns &&
> -           drv->states[last_idx + 1].exit_latency_ns <= latency_req) {
> +       /* meet latency requirement first */
> +       if (drv->states[last_idx].exit_latency_ns > latency_req) {
> +               /* Need to consider POLL because of latency requirement */
> +               next_idx = get_shallower(drv, dev, last_idx - 1, latency_req, 0);
> +               goto end;
> +       }
> +
> +       /* choose a deeper state because of promotion */
> +       if (last_residency > last_state->threshold.promotion_time_ns) {
> +               next_idx = get_deeper(drv, dev, last_idx + 1, latency_req);
> +
> +               /* no usable deeper state, use available deepest one */
> +               if (next_idx <= last_idx)
> +                       goto end;
>                 last_state->stats.promotion_count++;
>                 last_state->stats.demotion_count = 0;
> -               if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
> -                       ladder_do_selection(dev, ldev, last_idx, last_idx + 1);
> -                       return last_idx + 1;
> -               }
> +               if (last_state->stats.promotion_count >= last_state->threshold.promotion_count)
> +                       goto end;
> +               goto remain_cur;
>         }
>
> -       /* consider demotion */
> -       if (last_idx > first_idx &&
> -           (dev->states_usage[last_idx].disable ||
> -           drv->states[last_idx].exit_latency_ns > latency_req)) {
> -               int i;
> -
> -               for (i = last_idx - 1; i > first_idx; i--) {
> -                       if (drv->states[i].exit_latency_ns <= latency_req)
> -                               break;
> -               }
> -               ladder_do_selection(dev, ldev, last_idx, i);
> -               return i;
> -       }
> +       /* choose a shallower state because of demotion */
> +       if (last_residency < last_state->threshold.demotion_time_ns) {
> +               next_idx = get_shallower(drv, dev, last_idx - 1, latency_req, 1);
>
> -       if (last_idx > first_idx &&
> -           last_residency < last_state->threshold.demotion_time_ns) {
> +               /* no usable shallower state, use available shallowest one */
> +               if (next_idx >= last_idx)
> +                       goto end;
>                 last_state->stats.demotion_count++;
>                 last_state->stats.promotion_count = 0;
> -               if (last_state->stats.demotion_count >= last_state->threshold.demotion_count) {
> -                       ladder_do_selection(dev, ldev, last_idx, last_idx - 1);
> -                       return last_idx - 1;
> -               }
> +               if (last_state->stats.demotion_count >= last_state->threshold.demotion_count)
> +                       goto end;
>         }
>
> -       /* otherwise remain at the current state */
> -       return last_idx;
> +remain_cur:
> +       /* remain at the current state, handling the case where it is disabled */
> +       next_idx = get_shallower(drv, dev, last_idx, latency_req, 1);
> +
> +end:
> +       if (next_idx != last_idx)
> +               ladder_do_selection(dev, ldev, last_idx, next_idx);
> +       return next_idx;
>  }
>
>  /**
> @@ -152,8 +228,15 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
>
>                 if (i < drv->state_count - 1)
>                         lstate->threshold.promotion_time_ns = state->exit_latency_ns;
> +               else
> +                       /* Effectively disable promotion from deepest state */
> +                       lstate->threshold.promotion_time_ns = S64_MAX;
> +
>                 if (i > first_idx)
>                         lstate->threshold.demotion_time_ns = state->exit_latency_ns;
> +               else
> +                       /* Effectively disable demotion from shallowest state */
> +                       lstate->threshold.demotion_time_ns = S64_MIN;
>         }
>
>         return 0;
> --
> 2.25.1
>
