Message-ID: <CAJZ5v0hC=sEcC-mU8jArwVN3EA6+U=EmCa2e7TKO0sg6LJiz7g@mail.gmail.com>
Date: Wed, 17 Sep 2025 21:24:45 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Ulf Hansson <ulf.hansson@...aro.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Kevin Hilman <khilman@...libre.com>, linux-pm@...r.kernel.org, 
	Pavel Machek <pavel@...nel.org>, Len Brown <len.brown@...el.com>, 
	Daniel Lezcano <daniel.lezcano@...aro.org>, Saravana Kannan <saravanak@...gle.com>, 
	Maulik Shah <quic_mkshah@...cinc.com>, Prasad Sodagudi <psodagud@...cinc.com>, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC/PATCH 1/3] PM: QoS: Introduce a system-wakeup QoS limit

Hi,

Sorry for the delay.

On Fri, Sep 12, 2025 at 3:58 PM Ulf Hansson <ulf.hansson@...aro.org> wrote:
>
> On Tue, 12 Aug 2025 at 11:26, Ulf Hansson <ulf.hansson@...aro.org> wrote:
> >
> > On Mon, 11 Aug 2025 at 21:16, Rafael J. Wysocki <rafael@...nel.org> wrote:
> > >
> > > On Mon, Aug 11, 2025 at 7:16 PM Kevin Hilman <khilman@...libre.com> wrote:
> > > >
> > > > "Rafael J. Wysocki" <rafael@...nel.org> writes:
> > > >
> > > > > On Wed, Jul 16, 2025 at 2:33 PM Ulf Hansson <ulf.hansson@...aro.org> wrote:
> > > > >>
> > > > >> Some platforms and devices support multiple low-power states that can be
> > > > >> used for system-wide suspend. Today these states are selected on a
> > > > >> per-subsystem basis, and in most cases it's the deepest possible state
> > > > >> that gets selected.
> > > > >>
> > > > >> For some use-cases this is a problem, as entering these deeper states
> > > > >> during system-wide suspend may be unsuitable or even break the
> > > > >> system-wakeup latency constraint.
> > > > >>
> > > > >> Therefore, let's introduce an interface for user-space, allowing it to
> > > > >> specify the system-wakeup QoS limit. Subsequent changes will start
> > > > >> taking the QoS limit into account.
> > > > >
> > > > > Well, this is not really a system-wakeup limit, but a CPU idle state
> > > > > latency limit for states entered in the last step of suspend-to-idle.
> > > > >
> > > > > It looks like the problem is that the existing CPU latency QoS is not
> > > > > taken into account by suspend-to-idle, so instead of adding an
> > > > > entirely new interface to overcome this, would it make sense to add an
> > > > > ioctl() to the existing one that would allow its user to indicate
> > > > > that the given request should also be respected by suspend-to-idle?
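
For context, this is how the existing /dev/cpu_dma_latency interface is
used from user space today: write an s32 latency bound in microseconds
and keep the file descriptor open for as long as the constraint should
stay active.  A minimal sketch follows; the ioctl() suggested above is
hypothetical and therefore not shown.

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
        int32_t latency_us = 100;       /* example bound, in microseconds */
        int fd = open("/dev/cpu_dma_latency", O_WRONLY);

        if (fd < 0)
                return 1;
        /* A write of exactly sizeof(s32) bytes is taken as a binary value. */
        if (write(fd, &latency_us, sizeof(latency_us)) != sizeof(latency_us)) {
                close(fd);
                return 1;
        }
        pause();        /* the request is dropped when the fd is closed */
        close(fd);
        return 0;
}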
> > > > >
> > > > > There are two basic reasons why I think so:
> > > > > (1) The requests that you want to be respected by suspend-to-idle
> > > > > should also be respected by the regular "runtime" idle, or at least I
> > > > > don't see a reason why it wouldn't be the case.
> > > > > (2) The new interface introduced by this patch basically duplicates
> > > > > the existing one.
> > > >
> > > > I also think that just using the existing /dev/cpu_dma_latency is the
> > > > right approach here, and simply teaching s2idle to respect this value.
> > > >
> > > > I'm curious about the need for a new ioctl() though.  Under what
> > > > conditions do you want normal/runtime CPUidle to respect this value and
> > > > s2idle to not respect this value?
> > >
> > > In a typical PC environment, s2idle is a replacement for ACPI S3,
> > > which does not take any QoS constraints into account, so users may
> > > want to set QoS limits for run-time and then suspend with the
> > > expectation that QoS will not affect it.
> >
> > Yes, I agree. To me, these are orthogonal use-cases which could have
> > different wakeup latency constraints.
> >
> > Adding an ioctl for /dev/cpu_dma_latency, as suggested by Rafael would
> > allow this to be managed, I think.
> >
> > Although, I am not fully convinced yet that re-using
> > /dev/cpu_dma_latency is the right path. The main reason is that I
> > don't want us to limit the use-case to CPU latencies, but rather allow
> > the QoS constraint to be system-wide for any type of device. For
> > example, it could be used by storage drivers too (like NVMe, UFS,
> > eMMC), as a way to decide what low-power state to pick for system-wide
> > suspend. If you have a closer look at patch2 [1], I suggest we extend
> > the genpd-governor for *both* CPU-cluster-PM-domains and other
> > PM-domains.
> >
> > Interested to hear your thoughts around this.
>
> Hey, just wanted to see if you have managed to digest this and have
> any further comments?

The reason I thought about reusing /dev/cpu_dma_latency is that I
think the s2idle limit should also apply to cpuidle.  Of course,
cpuidle may be limited further, but IMV it should observe the limit
set for system suspend (it would be kind of inconsistent to allow
cpuidle to use deeper idle states than can be used by s2idle).
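
Schematically (not actual kernel code; the function and parameter
names below are made up for illustration), the relation I have in
mind is:

#include <stdint.h>

/*
 * Illustration only: the effective runtime cpuidle limit should never
 * be looser than the limit set for s2idle, though a runtime request
 * may tighten it further.
 */
static int32_t effective_cpuidle_limit(int32_t runtime_limit,
                                       int32_t s2idle_limit)
{
        return runtime_limit < s2idle_limit ? runtime_limit : s2idle_limit;
}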

I also don't think that having a per-CPU s2idle limit would be
particularly useful (and it might be problematic).

Now, it is not as straightforward as I thought, because someone may
want to set a more restrictive limit on cpuidle, in which case they
would need to open the same special device file twice and so on,
which would be quite cumbersome.

So in the end I think that what you did in the $subject patch is
better, but I still would like it to also affect cpuidle.

And it needs to be made clear that this is a limit on the resume
latency of one device.  In the worst case, the system wakeup latency
may be the sum of those limits, if the devices in question are resumed
sequentially, so in fact this is a limit on the contribution of a
given device to the system wakeup latency.
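
(A quick example: if three such devices each declare a 2 ms limit and
they are resumed one after another, their worst-case contribution adds
up to 6 ms of system wakeup latency, even though each device honors
its own limit.)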
