Message-ID: <CAPDyKFqG=bFSP2rJ3PXt5=6_nLdpJ+ir80krU1DrRCCMhwKQng@mail.gmail.com>
Date: Thu, 18 Sep 2025 17:33:36 +0200
From: Ulf Hansson <ulf.hansson@...aro.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Kevin Hilman <khilman@...libre.com>, linux-pm@...r.kernel.org, 
	Pavel Machek <pavel@...nel.org>, Len Brown <len.brown@...el.com>, 
	Daniel Lezcano <daniel.lezcano@...aro.org>, Saravana Kannan <saravanak@...gle.com>, 
	Maulik Shah <quic_mkshah@...cinc.com>, Prasad Sodagudi <psodagud@...cinc.com>, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC/PATCH 1/3] PM: QoS: Introduce a system-wakeup QoS limit

On Wed, 17 Sept 2025 at 21:24, Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> Hi,
>
> Sorry for the delay.
>
> On Fri, Sep 12, 2025 at 3:58 PM Ulf Hansson <ulf.hansson@...aro.org> wrote:
> >
> > On Tue, 12 Aug 2025 at 11:26, Ulf Hansson <ulf.hansson@...aro.org> wrote:
> > >
> > > On Mon, 11 Aug 2025 at 21:16, Rafael J. Wysocki <rafael@...nel.org> wrote:
> > > >
> > > > On Mon, Aug 11, 2025 at 7:16 PM Kevin Hilman <khilman@...libre.com> wrote:
> > > > >
> > > > > "Rafael J. Wysocki" <rafael@...nel.org> writes:
> > > > >
> > > > > > On Wed, Jul 16, 2025 at 2:33 PM Ulf Hansson <ulf.hansson@...aro.org> wrote:
> > > > > >>
> > > > > >> Some platforms and devices support multiple low-power states that can
> > > > > >> be used for system-wide suspend. Today these states are selected on a
> > > > > >> per-subsystem basis, and in most cases the deepest possible state gets
> > > > > >> selected.
> > > > > >>
> > > > > >> For some use-cases this is a problem, as entering these deeper states
> > > > > >> during system-wide suspend isn't suitable or may even break the
> > > > > >> system-wakeup latency constraint.
> > > > > >>
> > > > > >> Therefore, let's introduce an interface for user-space, allowing it to
> > > > > >> specify a system-wakeup QoS limit. Subsequent changes will start taking
> > > > > >> this QoS limit into account.
> > > > > >
> > > > > > Well, this is not really a system-wakeup limit, but a CPU idle state
> > > > > > latency limit for states entered in the last step of suspend-to-idle.
> > > > > >
> > > > > > It looks like the problem is that the existing CPU latency QoS is not
> > > > > > taken into account by suspend-to-idle, so instead of adding an
> > > > > > entirely new interface to overcome this, would it make sense to add an
> > > > > > ioctl() to the existing one that would allow the user of it to
> > > > > > indicate that the given request should also be respected by
> > > > > > suspend-to-idle?
> > > > > >
> > > > > > There are two basic reasons why I think so:
> > > > > > (1) The requests that you want to be respected by suspend-to-idle
> > > > > > should also be respected by the regular "runtime" idle, or at least I
> > > > > > don't see a reason why it wouldn't be the case.
> > > > > > (2) The new interface introduced by this patch basically duplicates
> > > > > > the existing one.
> > > > >
> > > > > I also think that just using the existing /dev/cpu_dma_latency is the
> > > > > right approach here, and simply teaching s2idle to respect this value.
> > > > >
> > > > > I'm curious about the need for a new ioctl() though.  Under what
> > > > > conditions do you want normal/runtime CPUidle to respect this value and
> > > > > s2idle to not respect this value?
> > > >
> > > > In a typical PC environment s2idle is a replacement for ACPI S3 which
> > > > does not take any QoS constraints into account, so users may want to
> > > > set QoS limits for run-time and then suspend with the expectation that
> > > > QoS will not affect it.
> > >
> > > Yes, I agree. To me, these are orthogonal use-cases which could have
> > > different wakeup latency constraints.
> > >
> > > Adding an ioctl for /dev/cpu_dma_latency, as suggested by Rafael would
> > > allow this to be managed, I think.
> > >
> > > Although, I am not fully convinced yet that re-using
> > > /dev/cpu_dma_latency is the right path. The main reason is that I
> > > don't want us to limit the use-case to CPU latencies, but rather allow
> > > the QoS constraint to be system-wide for any type of device. For
> > > example, it could be used by storage drivers too (like NVMe, UFS,
> > > eMMC), as a way to decide which low-power state to pick during
> > > system-wide suspend. If you have a closer look at patch 2 [1], I
> > > suggest we extend the genpd governor for *both* CPU-cluster PM
> > > domains and other PM domains too.
> > >
> > > Interested to hear your thoughts around this.
> >
> > Hey, just wanted to check whether you have managed to digest this
> > and have any further comments?
>
> The reason why I thought about reusing /dev/cpu_dma_latency is because
> I think that the s2idle limit should also apply to cpuidle.  Of
> course, cpuidle may be limited further, but IMV it should observe the
> limit set on system suspend (it would be kind of inconsistent to allow
> cpuidle to use deeper idle states than can be used by s2idle).

Agreed!

>
> I also don't think that having a per-CPU s2idle limit would be
> particularly useful (and it might be problematic).
>
> Now, it is not as straightforward as I thought because someone may
> want to set a more restrictive limit on cpuidle, in which case they
> would need to open the same special device file twice etc and that
> would be quite cumbersome.
>
> So in the end I think that what you did in the $subject patch is
> better, but I still would like it to also affect cpuidle.

Okay. I will update the patches according to your suggestions!

>
> And it needs to be made clear that this is a limit on the resume
> latency of one device.  Worst case, the system wakeup latency may be a
> sum of those limits if the devices in question are resumed
> sequentially, so in fact this is a limit on the contribution of a
> given device to the system wakeup latency.

Indeed, that's a very good point! I will keep this in mind when
working on adding the documentation part.

Again, thanks a lot for your feedback!

Kind regards
Uffe
