Message-ID: <20251030164542.atnhs4wgk6ggmmly@lcpd911>
Date: Thu, 30 Oct 2025 22:15:42 +0530
From: Dhruva Gole <d-gole@...com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
CC: Ulf Hansson <ulf.hansson@...aro.org>, <linux-pm@...r.kernel.org>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Kevin Hilman <khilman@...libre.com>, Pavel Machek <pavel@...nel.org>,
	Len Brown <len.brown@...el.com>,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	Saravana Kannan <saravanak@...gle.com>,
	Maulik Shah <quic_mkshah@...cinc.com>,
	Prasad Sodagudi <psodagud@...cinc.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit

On Oct 29, 2025 at 15:28:22 +0100, Rafael J. Wysocki wrote:
> On Wed, Oct 29, 2025 at 9:10 AM Dhruva Gole <d-gole@...com> wrote:
> >
> > Hi Ulf,
> >
> > On Oct 16, 2025 at 17:19:21 +0200, Ulf Hansson wrote:
> > > Some platforms support multiple low-power states for CPUs that can be used
> > > when entering system-wide suspend. Currently we are always selecting the
> > > deepest possible state for the CPUs, which can break the system-wakeup
> > > latency constraint that may be required for some use-cases.
> > >
> > > Let's take the first step towards addressing this problem, by introducing
> > > an interface for user-space, that allows us to specify the CPU
> > > system-wakeup QoS limit. Subsequent changes will start taking into account
> > > the new QoS limit.
> > >
> > > Signed-off-by: Ulf Hansson <ulf.hansson@...aro.org>
> > > ---
> > >
> > > Changes in v2:
> > >       - Renamings to reflect that the QoS is limited to CPUs.
> > >       - Move code inside "CONFIG_CPU_IDLE".
> > >
> > > ---
> > >  include/linux/pm_qos.h |   5 ++
> > >  kernel/power/qos.c     | 102 +++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 107 insertions(+)
> > >
> > > diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> > > index 4a69d4af3ff8..bf7524d38933 100644
> > > --- a/include/linux/pm_qos.h
> > > +++ b/include/linux/pm_qos.h
> > > @@ -149,6 +149,7 @@ bool cpu_latency_qos_request_active(struct pm_qos_request *req);
> > >  void cpu_latency_qos_add_request(struct pm_qos_request *req, s32 value);
> > >  void cpu_latency_qos_update_request(struct pm_qos_request *req, s32 new_value);
> > >  void cpu_latency_qos_remove_request(struct pm_qos_request *req);
> > > +s32 cpu_wakeup_latency_qos_limit(void);
> > >  #else
> > >  static inline s32 cpu_latency_qos_limit(void) { return INT_MAX; }
> > >  static inline bool cpu_latency_qos_request_active(struct pm_qos_request *req)
> > > @@ -160,6 +161,10 @@ static inline void cpu_latency_qos_add_request(struct pm_qos_request *req,
> > >  static inline void cpu_latency_qos_update_request(struct pm_qos_request *req,
> > >                                                 s32 new_value) {}
> > >  static inline void cpu_latency_qos_remove_request(struct pm_qos_request *req) {}
> > > +static inline s32 cpu_wakeup_latency_qos_limit(void)
> > > +{
> > > +     return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> > > +}
> > >  #endif
> > >
> > >  #ifdef CONFIG_PM
> > > diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> > > index 4244b069442e..8c024d7dc43e 100644
> > > --- a/kernel/power/qos.c
> > > +++ b/kernel/power/qos.c
> > > @@ -415,6 +415,103 @@ static struct miscdevice cpu_latency_qos_miscdev = {
> > >       .fops = &cpu_latency_qos_fops,
> > >  };
> > >
> > > +/* The CPU system wakeup latency QoS. */
> > > +static struct pm_qos_constraints cpu_wakeup_latency_constraints = {
> > > +     .list = PLIST_HEAD_INIT(cpu_wakeup_latency_constraints.list),
> > > +     .target_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> > > +     .default_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> > > +     .no_constraint_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> > > +     .type = PM_QOS_MIN,
> > > +};
> > > +
> > > +/**
> > > + * cpu_wakeup_latency_qos_limit - Current CPU system wakeup latency QoS limit.
> > > + *
> > > + * Returns the current CPU system wakeup latency QoS limit that may have been
> > > + * requested by user-space.
> > > + */
> > > +s32 cpu_wakeup_latency_qos_limit(void)
> > > +{
> > > +     return pm_qos_read_value(&cpu_wakeup_latency_constraints);
> > > +}
> > > +
> > > +static int cpu_wakeup_latency_qos_open(struct inode *inode, struct file *filp)
> > > +{
> > > +     struct pm_qos_request *req;
> > > +
> > > +     req = kzalloc(sizeof(*req), GFP_KERNEL);
> > > +     if (!req)
> > > +             return -ENOMEM;
> > > +
> > > +     req->qos = &cpu_wakeup_latency_constraints;
> > > +     pm_qos_update_target(req->qos, &req->node, PM_QOS_ADD_REQ,
> > > +                          PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
> > > +     filp->private_data = req;
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +static int cpu_wakeup_latency_qos_release(struct inode *inode,
> > > +                                       struct file *filp)
> > > +{
> > > +     struct pm_qos_request *req = filp->private_data;
> > > +
> > > +     filp->private_data = NULL;
> > > +     pm_qos_update_target(req->qos, &req->node, PM_QOS_REMOVE_REQ,
> > > +                          PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
> >
> > Please excuse the delay in reviewing these patches,
> > I was wondering why we have decided here in release to reset the
> > constraints set by a user. For eg. even when I was testing the previous
> > revision locally I'd just commented out this release hook, since I
> > wanted to be able to just echo 0xABCD into /dev/cpu_wakeup_latency...
> 
> If you want "fire and forget", that would be a different interface.
> Device special files are not for that.
> 
> Cleaning up after closing a file descriptor is a safety measure and
> CPU wakeup latency constraints are a big deal.  Leaving leftover ones
> behind dead processes is not a good idea.

Hmm okay ..
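
To spell out the usage model you are describing, here is a minimal
sketch of a userspace consumer. This is my own illustration, not from
the series: it assumes the node ends up being /dev/cpu_wakeup_latency
and that writes behave like the existing /dev/cpu_dma_latency
interface (a binary s32, or an ASCII value), with the request living
only as long as the fd is held open:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int32_t limit_us = 2000;	/* CPU wakeup latency limit, in usec */
	int fd = open("/dev/cpu_wakeup_latency", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Binary s32 write, mirroring the /dev/cpu_dma_latency convention. */
	if (write(fd, &limit_us, sizeof(limit_us)) != sizeof(limit_us)) {
		perror("write");
		close(fd);
		return 1;
	}

	/* The request is honoured only while this fd stays open... */
	pause();

	/* ...and closing it (or exiting) removes the request again. */
	close(fd);
	return 0;
}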

> 
> > It seems an overkill to me that a userspace program be required to hold
> > open this file just to make sure the constraints are honoured for the
> > lifetime of the device. We should definitely give the freedom to just be
> > able to echo and also be able to cat and read back from the same place
> > about the latency constraint being set.
> 
> So you'd want a sysfs attribute here, but that has its own issues (the
> last writer "wins", so if there are multiple users of it with
> different needs in user space, things get tricky).

sysfs makes sense to me. Would it then make sense to have something like a
/sys/devices/system/cpu/cpu0/power/cpu_wakeup_latency entry?

IMHO userspace should decide how to manage its users and whom to allow to
set the latency constraint.
We already have a CPU latency QoS entry that lives in sysfs, for example.
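
Purely to illustrate the shape of such an entry, a hypothetical sketch
(not something from the series; cpu_wakeup_latency_qos_update() below is
a made-up helper) -- and a plain attribute like this would indeed have
the "last writer wins" behaviour you point out:

#include <linux/device.h>
#include <linux/kstrtox.h>
#include <linux/pm_qos.h>
#include <linux/sysfs.h>

static ssize_t cpu_wakeup_latency_show(struct device *dev,
				       struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "%d\n", cpu_wakeup_latency_qos_limit());
}

static ssize_t cpu_wakeup_latency_store(struct device *dev,
					struct device_attribute *attr,
					const char *buf, size_t count)
{
	s32 value;

	if (kstrtos32(buf, 0, &value))
		return -EINVAL;

	/* One request owned by the attribute; any writer overrides it. */
	cpu_wakeup_latency_qos_update(value);	/* hypothetical helper */

	return count;
}
static DEVICE_ATTR_RW(cpu_wakeup_latency);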

> 
> > One other thing on my mind is - and probably unrelated to this specific
> > series, but I think we must have some sysfs entry either appear in
> > /sys/.../cpu0/cpuidle or s2idle/ where we can show next feesible s2idle
> > state that the governor has chosen to enter based on the value set in
> > cpu_wakeup_latency.
> 
> Exit latency values for all states are exposed via sysfs.  Since
> s2idle always uses the deepest state it can use, it is quite
> straightforward to figure out which of them will be used going
> forward, given a specific latency constraint.

I disagree regarding the straightforward part. There could be multiple
domain hierarchies in a system, for example, and each of these domains
would have its own set of domain-idle-states, all with their own entry,
exit, and residency latencies. While testing this series I have myself
been thoroughly confused at times about which idle-state the kernel
actually picked, and had to add prints just to figure that out.

When implementing these things for the first time, especially with many
and complex domain idle-states, it would indeed help a lot if the kernel
could just advertise somewhere which state the governor is going to pick
as the next s2idle state.
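
For the plain cpuidle case this can indeed be reconstructed from
userspace -- a rough sketch of my own, assuming the standard
/sys/devices/system/cpu/cpu0/cpuidle/stateN/latency attributes and that
exit latency is the only criterion -- but I don't see how to do the
same for domain-idle-states:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	long limit_us = argc > 1 ? strtol(argv[1], NULL, 0) : LONG_MAX;
	long best_latency = -1;
	char best[64] = "none";
	char path[128];

	for (int i = 0; ; i++) {
		char name[64];
		long latency;
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/latency", i);
		f = fopen(path, "r");
		if (!f)
			break;			/* no more states */
		if (fscanf(f, "%ld", &latency) != 1)
			latency = -1;
		fclose(f);

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/name", i);
		f = fopen(path, "r");
		if (!f || fscanf(f, "%63s", name) != 1)
			snprintf(name, sizeof(name), "state%d", i);
		if (f)
			fclose(f);

		/* States are ordered shallow to deep; keep the deepest that fits. */
		if (latency >= 0 && latency <= limit_us) {
			best_latency = latency;
			snprintf(best, sizeof(best), "%s", name);
		}
	}

	printf("deepest usable state for a %ld us limit: %s (exit latency %ld us)\n",
	       limit_us, best, best_latency);
	return 0;
}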

Also, I am not quite sure if these latencies are exposed in the
domain-idle-states scenario ... 
I tried checking in /sys/kernel/debug/pm_genpd/XXX/ but I only see
these:
active_time  current_state  devices  idle_states  sub_domains  total_idle_time

Maybe an additional s2idle_state entry, or something similar appearing
here, is what I was inclined toward.
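
Something along these lines is what I had in mind -- purely a
hypothetical sketch mirroring the existing genpd debugfs files, where
genpd_s2idle_target_state() is a made-up helper standing in for
whatever the governor ends up computing:

#include <linux/debugfs.h>
#include <linux/pm_domain.h>
#include <linux/seq_file.h>

static int s2idle_state_show(struct seq_file *s, void *data)
{
	struct generic_pm_domain *genpd = s->private;
	/* genpd_s2idle_target_state() is a made-up helper for illustration. */
	int state = genpd_s2idle_target_state(genpd);

	if (state < 0)
		seq_puts(s, "none\n");
	else
		seq_printf(s, "S%d\n", state);	/* state index, as idle_states uses */

	return 0;
}
DEFINE_SHOW_ATTRIBUTE(s2idle_state);

/* ...registered next to the existing entries in genpd_debug_add(): */
debugfs_create_file("s2idle_state", 0444, d, genpd, &s2idle_state_fops);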


-- 
Best regards,
Dhruva Gole
Texas Instruments Incorporated
