linux-kernel - Re: cpufreq_stats NULL deref on second system suspend

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <523076A0.3020308@linux.vnet.ibm.com>
Date:	Wed, 11 Sep 2013 19:26:48 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	Viresh Kumar <viresh.kumar@...aro.org>
CC:	"Rafael J. Wysocki" <rjw@...k.pl>,
	Stephen Warren <swarren@...dotorg.org>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	cpufreq <cpufreq@...r.kernel.org>
Subject: Re: cpufreq_stats NULL deref on second system suspend

On 09/11/2013 05:29 PM, Viresh Kumar wrote:
> On 11 September 2013 16:44, Srivatsa S. Bhat
> <srivatsa.bhat@...ux.vnet.ibm.com> wrote:
>> Hmm? The problem is not about merely updating the policy->cpu field; the
>> main issue is that the existing code was not letting the cpufreq-stats
>> code know that we updated the policy->cpu under the hood. It is important
>> for cpufreq-stats to know this because it maintains the reference to its
>> stats structure by associating it with the policy->cpu. So if policy->cpu
>> changes under the hood, it loses track of its reference. So we need to
>> keep that code informed about changes to policy->cpu. Thus, we need to call
>> update_policy_cpu() in the CPU online path (during resume). I don't see
>> how we can skip that.
> 
> Okay.. There are two different ways in which cpufreq_add_dev() work
> currently..
> 
> Boot cluster (i.e. policy with boot CPU)
> ---------------
> 
> Here cpufreq_remove_dev() is never called for boot cpu but all others.
> And similarly cpufreq_add_dev() is never called for boot cpu but all others.
> 
> Now policy->cpu contains meaningful cpu at beginning of resume and
> we don't need to modify that at all.. For all the remaining CPUs we
> better call cpufreq_add_policy_cpu() rather..
>

Yes, and the existing code already does that. And if you observe the placement
of the call to update_policy_cpu() in my patch, you'll find that it won't
be called in cases where we call cpufreq_add_policy_cpu(). Which is exactly
what you want, IIUC.

> Non-boot Cluster
> ---------------------
> 
> All CPUs here are removed and at the end policy->cpu contains the last
> cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3..
> 
> Not at resume we will add cpu2 first and so need to update policy->cpu
> to 2.. But for all other CPUs in this cluster we return early from
> cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus
> was fixed by call to ->init() for the first cpu of this cluster..
> 
> And so we never reach the line: policy->cpu = cpu;
> 
> For the first cpu of non-boot cluster we need to call update_policy_cpu()
> and not for others..
> 

Yes, and that's precisely why I added the call to update_policy_cpu() only
near the statement 'policy->cpu = cpu'. All other cases don't need to call
that function, and in my patch, they indeed don't call that function.

A simple way to look at this is:
If policy->cpu is updated anywhere in the resume (cpu online) path, we need
to call update_policy_cpu(). Otherwise, we don't need to call that function.
This will cover both the scenarios you described above.

> 
> But for the boot cluster if we can call ->init() somehow at resume time,
> then things would be fairly similar in both cases..
> 
> I am running of time now, as need to leave office now...
> I hope I made the problem more clear or probably the way I see it :)
> 

Well, your explanation matches my understanding of the scenarios, and
I had written the patch keeping those scenarios in mind; so I believe the
patch is correct as it is. So what I didn't get is this: are you suggesting
that something is wrong in my patch, or are you just reiterating the different
scenarios involved so that I can recheck whether my patch is right?

If it is the former, please point me to the exact problem you found in
my patch. If it is the latter, I'm pretty sure my patch is right, I've
already checked it :-)

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/