lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0jTVZtyV2yeFNpGo4TnZY79CH_fpaSbVq1T9BJ0BohZsg@mail.gmail.com>
Date: Thu, 30 Oct 2025 19:11:03 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Ulf Hansson <ulf.hansson@...aro.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, linux-pm@...r.kernel.org, 
	Vincent Guittot <vincent.guittot@...aro.org>, Peter Zijlstra <peterz@...radead.org>, 
	Kevin Hilman <khilman@...libre.com>, Pavel Machek <pavel@...nel.org>, Len Brown <len.brown@...el.com>, 
	Daniel Lezcano <daniel.lezcano@...aro.org>, Saravana Kannan <saravanak@...gle.com>, 
	Maulik Shah <quic_mkshah@...cinc.com>, Prasad Sodagudi <psodagud@...cinc.com>, 
	Dhruva Gole <d-gole@...com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit
 during s2idle

On Thu, Oct 30, 2025 at 4:07 PM Ulf Hansson <ulf.hansson@...aro.org> wrote:
>
> On Thu, 30 Oct 2025 at 15:02, Rafael J. Wysocki <rafael@...nel.org> wrote:
> >
> > On Thu, Oct 30, 2025 at 1:32 PM Ulf Hansson <ulf.hansson@...aro.org> wrote:
> > >
> > > On Thu, 30 Oct 2025 at 13:23, Rafael J. Wysocki <rafael@...nel.org> wrote:
> > > >
> > > > On Thu, Oct 30, 2025 at 1:00 PM Ulf Hansson <ulf.hansson@...aro.org> wrote:
> > > > >
> > > > > On Thu, 30 Oct 2025 at 11:45, Rafael J. Wysocki <rafael@...nel.org> wrote:
> > > > > >
> > > > > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@...aro.org> wrote:
> > > > > > >
> > > > > > > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > > > > > > avoid breaking this constraint when entering a low-power state during
> > > > > > > s2idle through genpd, let's extend the corresponding genpd governor for
> > > > > > > CPUs. More precisely, during s2idle let the genpd governor select a
> > > > > > > suitable low-power state, by taking into account the QoS limit.
> > > > > > >
> > > > > > > Signed-off-by: Ulf Hansson <ulf.hansson@...aro.org>
> > > > > > > ---
> > > > > > >
> > > > > > > Changes in v2:
> > > > > > >         - Limite the change to the genpd governor for CPUs.
> > > > > > >
> > > > > > > ---
> > > > > > >  drivers/pmdomain/core.c     | 10 ++++++++--
> > > > > > >  drivers/pmdomain/governor.c | 27 +++++++++++++++++++++++++++
> > > > > > >  include/linux/pm_domain.h   |  1 +
> > > > > > >  3 files changed, 36 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > > > > > > index 61c2277c9ce3..4fd546ef0448 100644
> > > > > > > --- a/drivers/pmdomain/core.c
> > > > > > > +++ b/drivers/pmdomain/core.c
> > > > > > > @@ -1425,8 +1425,14 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
> > > > > > >                         return;
> > > > > > >         }
> > > > > > >
> > > > > > > -       /* Choose the deepest state when suspending */
> > > > > > > -       genpd->state_idx = genpd->state_count - 1;
> > > > > > > +       if (genpd->gov && genpd->gov->system_power_down_ok) {
> > > > > > > +               if (!genpd->gov->system_power_down_ok(&genpd->domain))
> > > > > > > +                       return;
> > > > > > > +       } else {
> > > > > > > +               /* Default to the deepest state. */
> > > > > > > +               genpd->state_idx = genpd->state_count - 1;
> > > > > > > +       }
> > > > > > > +
> > > > > > >         if (_genpd_power_off(genpd, false)) {
> > > > > > >                 genpd->states[genpd->state_idx].rejected++;
> > > > > > >                 return;
> > > > > > > diff --git a/drivers/pmdomain/governor.c b/drivers/pmdomain/governor.c
> > > > > > > index 39359811a930..bd1b9d66d4a5 100644
> > > > > > > --- a/drivers/pmdomain/governor.c
> > > > > > > +++ b/drivers/pmdomain/governor.c
> > > > > > > @@ -415,9 +415,36 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> > > > > > >         return false;
> > > > > > >  }
> > > > > > >
> > > > > > > +static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
> > > > > > > +{
> > > > > > > +       s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
> > > > > >
> > > > > > I'm not sure why genpd needs to take cpu_wakeup_latency_qos_limit()
> > > > > > into account directly.
> > > > > >
> > > > > > It should be told by cpuidle which state has been selected on the CPU
> > > > > > side and it should not go any deeper than that anyway.
> > > > >
> > > > > For PSCI OS-initiated mode, cpuidle doesn't know about the states that
> > > > > may be shared among a group of CPUs.
> > > > >
> > > > > Instead, those states are controlled through the PM domain topology by
> > > > > genpd and its governor, hence this is needed too.
> > > >
> > > > All right, but I'd like to understand how all of that works.
> > > >
> > > > So cpuidle selects a state to enter for the given CPU and then genpd
> > > > is invoked.  It has to take the exit latency of that state into
> > > > account, so it doesn't go too deep.  How does it do that?
> > >
> > > Depending on the state selected, in cpuidle-psci.c we may end up
> > > calling __psci_enter_domain_idle_state() (only for the deepest
> > > CPU-state).
> > >
> > > For s2idle this means we call dev_pm_genpd_suspend|resume(), to manage
> > > the reference counting of the PM domains via genpd. This then may lead
> > > to that genpd_sync_power_off() tries to select a state by calling the
> > > new governor function above.
> > >
> > > Did that make sense?
> >
> > So IIUC this will only happen if the deepest idle state is selected in
> > which case the cpu_wakeup_latency_qos_limit() value is greater than
> > the exit latency of that state, but it may still need to be taken into
> > account when selecting the domain state.  However, this means that the
>
> Correct.
>
> > exit latency number for the deepest idle state is too low (it should
> > represent the worst-case exit latency which means the maximum domain
> > exit latency in this particular case).
>
> Yes, from the cpuidle state-selection point of view, but how is that a problem?

It is confusing.  Otherwise, for s2idle, I guess it is not a big deal.

I guess what happens is that genpd has a range of states with
different latency values to choose from and it is not practical to
expose all of them as CPU idle states, so you end up exposing just one
of them with the lowest latency value to allow cpuidle to involve
genpd often enough.

If that's the case, I'd make a note of that somewhere if I were you,
or people will routinely get confused by it.

> If the genpd-governor doesn't find a suitable "domain-idle-state", we
> fallback to using the one cpuidle selected.
>
> >
> > Moreover, it looks like the "runtime" cpuidle has the same problem, doesn't it?
>
> It works in a very similar way, but I fail to understand why you think
> there is a problem.

There is a problem because it may violate a "runtime" latency constraint.

Say you expose 2 CPU idle states, a shallow one and a genpd one.  The
advertised exit latency of the genpd state is X and the current
latency constraint is Y > X.  The genpd state is selected and genpd
doesn't look at the cpuidle_governor_latency_req() return value, so it
chooses a real state with exit latency Z > Y.

To a minimum, genpd should be made aware of
cpuidle_governor_latency_req(), but even then cpuidle governors take
exit latency into consideration in their computations, so things may
get confused somewhat.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ