Message-ID: <f1195632-d973-4339-a89d-e1e62b98015d@oss.qualcomm.com>
Date: Fri, 24 Oct 2025 16:25:37 +0800
From: Zhongqiu Han <zhongqiu.han@....qualcomm.com>
To: Lukasz Luba <lukasz.luba@....com>, Zhongqiu Han <quic_zhonhan@...cinc.com>
Cc: linux-pm@...r.kernel.org, lenb@...nel.org, christian.loehle@....com,
amit.kucheria@...aro.org, ulf.hansson@...aro.org, james.morse@....com,
Dave.Martin@....com, reinette.chatre@...el.com, tony.luck@...el.com,
pavel@...nel.org, linux-kernel@...r.kernel.org, rafael@...nel.org,
zhongqiu.han@....qualcomm.com
Subject: Re: [PATCH v2 0/5] PM QoS: Add CPU affinity latency QoS support and
resctrl integration
On 10/23/2025 9:09 PM, Lukasz Luba wrote:
> Hi Zhongqui,
>
> My apologies for being a bit late with my comments...
>
> On 7/21/25 13:40, Zhongqiu Han wrote:
>> Hi all,
>>
>> This patch series introduces support for CPU affinity-based latency
>> constraints in the PM QoS framework. The motivation is to allow
>> finer-grained power management by enabling latency QoS requests to target
>> specific CPUs, rather than applying system-wide constraints.
>>
>> The current PM QoS framework supports global and per-device CPU latency
>> constraints. However, in many real-world scenarios, such as IRQ affinity
>> or CPU-bound kernel threads, only a subset of CPUs are
>> performance-critical. Applying global constraints in such cases
>> unnecessarily prevents other CPUs from entering deeper C-states, leading
>> to increased power consumption.
>>
>> This series addresses that limitation by introducing a new interface that
>> allows latency constraints to be applied to a CPU mask. This is
>> particularly useful on heterogeneous platforms (e.g., big.LITTLE) and
>> embedded systems where power efficiency is critical for example:
>>
>>                         driver A      rt kthread B      module C
>> CPU IDs (mask):           0-3              2-5             6-7
>> target latency(us):        20               30             100
>>                             |                |               |
>>                             v                v               v
>>                    +-----------------------------------------+
>>                    |             PM QoS Framework            |
>>                    +-----------------------------------------+
>>                             |                |               |
>>                             v                v               v
>> CPU IDs (mask):           0-3          2-3,4-5              6-7
>> runtime latency(us):       20           20, 30              100
>>
>> The current implementation includes only cpu_affinity_latency_qos_add()
>> and cpu_affinity_latency_qos_remove() interfaces. An update interface is
>> planned for future submission, along with PM QoS optimizations in the UFS
>> subsystem.
>>
>> Patch1 introduces the core support for CPU affinity latency QoS in the PM
>> QoS framework.
>>
>> Patch2 removes redundant KERN_ERR prefixes in WARN() calls in the global
>> CPU PM QoS interface. This change addresses issues in existing code
>> and is
>> not related to the new interface introduced in this patch series.
>>
>> Patch3 adds documentation for the new interface.
>>
>> Patch4 fixes a minor documentation issue related to the return type of
>> cpu_latency_qos_request_active(). This change addresses issues in
>> existing
>> doc and is not related to the new interface introduced in this patch
>> series.
>>
>> Patch5 updates the resctrl pseudo-locking logic to use the new CPU
>> affinity latency QoS helpers, improving clarity and consistency. The only
>> functional and beneficial change is that the new interface actively wakes
>> up CPUs whose latency QoS values have changed, ensuring the latency limit
>> takes effect immediately.
>
> Could you describe a bit more the big picture of this proposed design,
> please?
>
> Ideally with some graph of connected frameworks & drivers and how they
> are going to work together.
Hi Lukasz,
Thank you very much for your review and discussion~
I can describe the big picture in more detail if needed; please allow me
to illustrate a simple scenario using pseudo code first:
Suppose there is a USB driver. This driver uses the kernel's existing
cpu_latency_qos_* interfaces to boost its IRQ execution efficiency. Its
IRQ affinity is set to CPU0 and CPU1 according to the DTS config, and the
affinity of its threaded IRQ (bottom half) is also set to CPU0 and CPU1.
=================================================================
Using the kernel existing cpu_latency_qos_* interfaces:
=================================================================
static int dwc3_sample_probe(struct platform_device *pdev)
{
        cpu_latency_qos_add_request(&foo->pm_qos_req, DEFAULT_VALUE);
        xxxx;
        ret = devm_request_threaded_irq(xxx, xxx, foo_dwc3_pwr_irq, ....);
        xxxx;
}

static irqreturn_t foo_dwc3_pwr_irq(int irq, void *dev)
{
        xxxx;
        cpu_latency_qos_update_request(&foo->pm_qos_req, 0);
        /* .... process interrupt .... */
        cpu_latency_qos_update_request(&foo->pm_qos_req, DEFAULT_VALUE);
        return IRQ_HANDLED;
}
The number of IRQ executions on each CPU:
==================================================================
IRQ   HWIRQ   affinity      CPU0     CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
320   0xb0    0x3        9782468   415472      0      0      0      0      0      0
==================================================================
==================================================================
Process: irq/320-dwc3, [affinity: 0x3] cpu:1 pid:5250 ppid:2
==================================================================
From the code, we can see that the USB module using the kernel's existing
cpu_latency_qos_* interfaces sets the CPU latency to 0, which prevents
all CPUs from entering idle states, even C1. During operation, the USB
IRQ is triggered 9,782,468 times on CPU0, and each time it runs, all
CPUs are blocked from entering deeper C-states. However, only CPU0 and
CPU1 are actually involved in handling the IRQ and its threaded bottom
half, so the constraint causes unnecessary power consumption on the
other CPUs.
(Please note that, to keep the pseudocode simple, I did not show how the
IRQ bottom-half thread restricts CPU idle states via PM QoS. In practice,
a CPU latency limit can be applied to the bottom-half thread as well; a
rough sketch follows below.)
If we instead use this patch series' cpu_affinity_latency_qos_* API, for
example:
=================================================================
Using current patch series cpu_affinity_latency_qos_* interfaces:
=================================================================
static int dwc3_sample_probe(struct platform_device *pdev)
{
        cpu_affinity_latency_qos_add(&foo->pm_qos_req, DEFAULT_VALUE, mask);
        xxxx;
        ret = devm_request_threaded_irq(xxx, xxx, foo_dwc3_pwr_irq, ....);
        xxxx;
}
With this, we constrain the CPU latency PM QoS on CPU0 and CPU1 only, so
the remaining CPUs stay free to enter deep C-states and save power. A
slightly fuller sketch, including the mask setup and the remove path,
follows below.
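For completeness, here is a rough sketch of how a driver might build the
mask and drop the request again on remove. The struct and variable names
are illustrative only; the exact request type and function signatures are
the ones defined in patch 1:

static int dwc3_sample_probe(struct platform_device *pdev)
{
        cpumask_t mask;

        /* constrain only the CPUs that actually service this IRQ */
        cpumask_clear(&mask);
        cpumask_set_cpu(0, &mask);
        cpumask_set_cpu(1, &mask);

        cpu_affinity_latency_qos_add(&foo->pm_qos_req, DEFAULT_VALUE, &mask);
        xxxx;
        return devm_request_threaded_irq(xxx, xxx, foo_dwc3_pwr_irq, ....);
}

static void dwc3_sample_remove(struct platform_device *pdev)
{
        /* drop the per-CPU constraint when the driver goes away */
        cpu_affinity_latency_qos_remove(&foo->pm_qos_req);
}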
>
> E.g.:
> 1. what are the other components in the kernel which would use this
> feature?
1. Drivers such as audio, USB, and UFS, which currently rely on the
kernel's global CPU latency PM QoS interface but only require latency
constraints on a subset of CPUs, can use this new interface to improve
power efficiency.
2. I'm also considering exposing this feature to userspace. Once
implemented, userspace threads, such as mobile gaming threads that want
to constrain CPU latency and are already bound to the big cores, will be
able to use the API to help save power.
> 2. is there also a user-space interface planned for it so a HAL in
> the middle-ware would configure these "short-wake-up-CPU"?
Yes, I am considering adding userspace support in v3 of the series.
> 3. Is it possible to view/debug from the user-space which component
> requested this setting for some subsets of cpus?
I'm uncertain whether we should provide the ability to inspect which
components are applying constraints on CPU latency. However, what I do
want to ensure is that, similar to the existing /dev/cpu_dma_latency
interface in the current kernel, I can offer per-CPU latency value
setting and querying.
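For reference, the existing global interface works like this: a userspace
process opens /dev/cpu_dma_latency, writes an s32 latency value in
microseconds, and the constraint is held for as long as the file
descriptor stays open. A minimal sketch of that existing behavior (the
per-CPU variant discussed above would only mirror this model and is not
implemented yet):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
        int32_t latency_us = 20;

        /* the request is held while the fd stays open */
        int fd = open("/dev/cpu_dma_latency", O_WRONLY);
        if (fd < 0)
                return 1;

        if (write(fd, &latency_us, sizeof(latency_us)) != sizeof(latency_us)) {
                close(fd);
                return 1;
        }

        /* ... latency-sensitive work runs here ... */

        close(fd);      /* closing the fd removes the constraint */
        return 0;
}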
>
> Regards,
> Lukasz
>
>
--
Thx and BRs,
Zhongqiu Han