linux-kernel - Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq


Message-ID: <0ce0acbc-99da-925e-145d-3c80558be761@hisilicon.com>
Date: Mon, 4 Aug 2025 14:31:16 +0800
From: Jie Zhan <zhanjie9@...ilicon.com>
To: Beata Michalska <beata.michalska@....com>
CC: Bowen Yu <yubowen8@...wei.com>, <rafael@...nel.org>,
	<viresh.kumar@...aro.org>, <linux-pm@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <linuxarm@...wei.com>,
	<jonathan.cameron@...wei.com>, <lihuisong@...wei.com>,
	<zhenglifeng1@...wei.com>
Subject: Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in
 cppc_scale_freq_workfn()



On 31/07/2025 17:42, Beata Michalska wrote:
> On Thu, Jul 31, 2025 at 04:52:05PM +0800, Jie Zhan wrote:
>>
>>
>> On 31/07/2025 16:19, Beata Michalska wrote:
>>> Hi Bowen, Jie
>>> On Wed, Jul 30, 2025 at 11:23:12AM +0800, Bowen Yu wrote:
>>>> From: Jie Zhan <zhanjie9@...ilicon.com>
>>>>
>>>> Perf counters could be 0 if the cpu is in a low-power idle state. Just try
>>>> it again next time and update the frequency scale when the cpu is active
>>>> and perf counters successfully return.
>>>>
>>>> Also, remove the FIE source on an actual failure.
>>>>
>>>> Signed-off-by: Jie Zhan <zhanjie9@...ilicon.com>
>>>> ---
>>>>  drivers/cpufreq/cppc_cpufreq.c | 13 ++++++++++++-
>>>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>>>> index 904006027df2..e95844d3d366 100644
>>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>>> @@ -78,12 +78,23 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
>>>>  	struct cppc_cpudata *cpu_data;
>>>>  	unsigned long local_freq_scale;
>>>>  	u64 perf;
>>>> +	int ret;
>>>>  
>>>>  	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
>>>>  	cpu_data = cppc_fi->cpu_data;
>>>>  
>>>> -	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
>>>> +	ret = cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs);
>>>> +	/*
>>>> +	 * Perf counters could be 0 if the cpu is in a low-power idle state.
>>>> +	 * Just try it again next time.
>>>> +	 */
>>>> +	if (ret == -EFAULT)
>>>> +		return;
>>> Which counters are we actually talking about here ?
>>
>> Delivered performance counter and reference performance counter.
>> They are actually AMU CPU_CYCLES and CNT_CYCLES event counters.
> That does track then.
>>
>>>> +
>>>> +	if (ret) {
>>>>  		pr_warn("%s: failed to read perf counters\n", __func__);
>>>> +		topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC,
>>>> +						 cpu_data->shared_cpu_map);
>>>>  		return;
>>>>  	}
>>> And the real error here would be ... ?
>>> That makes me wonder why this has been registered as the source of the freq
>>> scale in the first place if we are to hit some serious issue. Would you be able
>>> to give an example of any?
>> If it gets here, that would be -ENODEV or -EIO from cppc_get_perf_ctrs(),
>> which could possibly come from data corruption (no CPC descriptor) or a PCC
>> failure.
>>
>> I can't easily fake an error here, but the above -EFAULT path could
>> happen when it luckily passes the FIE init.
>>
> The change seems reasonable. Though I am wondering if some other errors might be
> rather transient as well ? Like -EIO ?
> Note, I'm not an expert here.
The -EIO from PCC covers many more error cases than this.
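
For reference, the three-way handling discussed above (success, -EFAULT as "try again next tick", anything else as a real failure that removes the FIE source) can be modelled in plain userspace C. This is only a sketch under assumptions: enum fie_action, struct fie_source, classify_ctr_read() and handle_ctr_read() are hypothetical names that do not exist in cppc_cpufreq.c, and the struct merely stands in for the registered scale-freq source. Only the errno values and their treatment follow the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the patch's error handling, not kernel code. */
enum fie_action {
	FIE_UPDATE_SCALE,	/* counters read fine: go on and update the scale */
	FIE_RETRY_LATER,	/* counters read as 0 (idle CPU): skip this tick */
	FIE_REMOVE_SOURCE,	/* real failure: stop using CPPC as the FIE source */
};

/* Hypothetical stand-in for the registered scale-freq source. */
struct fie_source {
	bool registered;
	unsigned int skipped_ticks;
};

/* Map a counter-read return code to the action taken in the patch. */
static enum fie_action classify_ctr_read(int ret)
{
	if (ret == 0)
		return FIE_UPDATE_SCALE;
	if (ret == -EFAULT)
		return FIE_RETRY_LATER;
	/* e.g. -ENODEV or -EIO, as mentioned in the thread */
	return FIE_REMOVE_SOURCE;
}

/* Apply the action to the model state and report what was done. */
static enum fie_action handle_ctr_read(struct fie_source *src, int ret)
{
	enum fie_action action;

	assert(src != NULL);
	action = classify_ctr_read(ret);

	switch (action) {
	case FIE_RETRY_LATER:
		src->skipped_ticks++;		/* source stays registered */
		break;
	case FIE_REMOVE_SOURCE:
		src->registered = false;	/* models clearing the source */
		break;
	case FIE_UPDATE_SCALE:
		break;
	}
	return action;
}
```

In this model only -EFAULT keeps the source registered. Treating some -EIO cases as transient as well would mean adding a case to classify_ctr_read(), which is exactly the open question in this thread.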