Date:	Tue, 03 Mar 2015 15:09:48 +0000
From:	Kapileshwar Singh <kapileshwar.singh@....com>
To:	Viresh Kumar <viresh.kumar@...aro.org>
CC:	Javi Merino <Javi.Merino@....com>,
	Eduardo Valentin <edubezval@...il.com>,
	Zhang Rui <rui.zhang@...el.com>,
	Linux PM list <linux-pm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Punit Agrawal <Punit.Agrawal@....com>,
	Lina Iyer <lina.iyer@...aro.org>,
	Mark Brown <broonie@...nel.org>, Jon Medhurst <tixy@...aro.org>
Subject: Re: [PATCH v3 5/5] thermal: cpu_cooling: update the cpu device when
 cpufreq updates the policy cpu

On 03/03/15 13:07, Viresh Kumar wrote:
> On 3 March 2015 at 17:11, Kapileshwar Singh <kapileshwar.singh@....com> wrote:
>> Yes, I did test the case where we cache the device pointer of the CPU for which the OPPs are populated.
>> When this CPU is hotplugged out, it invalidates the device pointer itself. Here are the errors we get in dmesg:
> 
> What do you mean by 'invalidates the device pointer' ? that cpu_dev is NULL ?

The cpu_dev is not NULL, but we get an erroneous OPP back. We found that the problem lies in the way we calculate the frequency for the cluster.

>> <3>[67203.216774] opp_get_voltage: Invalid parameters
>> <3>[67203.326774] opp_get_voltage: Invalid parameters
>> <3>[67203.326774] opp_get_voltage: Invalid parameters
> 
> Have you handwritten them ? Why don't they precede with dev_pm_* ??

I have not handwritten them. They came from a Linaro 3.10-based kernel, where I first noticed this issue, but the same problem exists in mainline.

Apologies, I sent you an older trace that I had saved when I found the bug. Here is the trace I get from mainline:

[ 5680.135339] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.245528] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.355432] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.465521] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.575599] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.685817] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.795556] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.905598] dev_pm_opp_get_voltage: Invalid parameters

> 
>>
>> Which happens because:
>>
>> unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp)
>> {
>> ..
>>         tmp_opp = rcu_dereference(opp);
>>         if (unlikely(IS_ERR_OR_NULL(tmp_opp)) || !tmp_opp->available)
>>                 pr_err("%s: Invalid parameters\n", __func__);
> 
> This %s should print routine name ..
> 
>>         else
>> ..
>>
>> Which happens when
>>
>>         opp = dev_pm_opp_find_freq_exact(cpufreq_device->cpu_dev, freq_hz,
>>                                          true);
>>
>> returns an erroneous or NULL OPP, or the OPP is unavailable (in the above condition)
> 

Update: This returns an erroneous OPP.

> Please goto the depth of this thing, as I don't think it should happen.
> 
> Over that I was asking you if you have tested the solution Javi gave,
> because OPPs
> wouldn't have been initialized for other CPUs once policy->cpu goes down.
I did test this, but we were working on the assumption that OPPs should be populated for all the CPUs, and also that OPPs are lost for a hotplugged CPU, which I now see is not the case.

We have looked at this more closely and found that the problem lies in:

	freq = cpufreq_quick_get(cpumask_any(&cpufreq_device->allowed_cpus));

which returns a frequency of 0 when the CPU picked by cpumask_any() is offline, as we are not checking for online CPUs here. We shall come up with a fix for this. Many thanks for helping us with the investigation.
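
To illustrate the failure mode, here is a rough userspace sketch, not the actual fix. It models the CPU masks as plain bitmasks and shows why picking any CPU from allowed_cpus can hit an offline CPU, and how restricting the choice to online CPUs avoids it. All names below (cpumask_model_t, first_cpu_in_both, quick_get_model, cluster_freq) are hypothetical stand-ins and not kernel APIs; in the kernel, the equivalent selection would presumably use something like cpumask_any_and() against cpu_online_mask.

```c
#include <assert.h>

#define NR_CPUS 8
#define NO_CPU  NR_CPUS   /* models "no CPU matched" */

/* Hypothetical userspace stand-in for a cpumask: one bit per CPU. */
typedef unsigned int cpumask_model_t;

/* Return the first CPU set in both masks, or NO_CPU if there is none. */
static unsigned int first_cpu_in_both(cpumask_model_t a, cpumask_model_t b)
{
	cpumask_model_t both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (both & (1u << cpu))
			return cpu;
	return NO_CPU;
}

/* Models a per-CPU frequency lookup that reports 0 for an offline CPU. */
static unsigned int quick_get_model(unsigned int cpu, cpumask_model_t online,
				    unsigned int cluster_khz)
{
	if (cpu >= NR_CPUS || !(online & (1u << cpu)))
		return 0;
	return cluster_khz;
}

/* Cluster frequency, looked up through an online CPU only. */
static unsigned int cluster_freq(cpumask_model_t allowed,
				 cpumask_model_t online,
				 unsigned int cluster_khz)
{
	unsigned int cpu = first_cpu_in_both(allowed, online);

	if (cpu == NO_CPU)
		return 0;	/* every CPU of the cluster is offline */

	assert(cpu < NR_CPUS);
	return quick_get_model(cpu, online, cluster_khz);
}
```

With allowed = 0x0F (CPUs 0-3) and CPU0 hotplugged out (online = 0xFE), a lookup through CPU0 yields 0 in this model, while cluster_freq() goes through CPU1 and returns the cluster frequency.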

Regards, 
KP

