[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5230B991.3040702@linux.vnet.ibm.com>
Date: Thu, 12 Sep 2013 00:12:25 +0530
From: "Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To: Stephen Warren <swarren@...dotorg.org>
CC: "Rafael J. Wysocki" <rjw@...k.pl>,
Viresh Kumar <viresh.kumar@...aro.org>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
cpufreq <cpufreq@...r.kernel.org>
Subject: Re: cpufreq_stats NULL deref on second system suspend
On 09/11/2013 11:33 PM, Srivatsa S. Bhat wrote:
> On 09/11/2013 09:35 PM, Stephen Warren wrote:
>> On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote:
>>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
>>>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
>>>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
>>>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>>>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>>>>>>>>> Viresh,
>>>>>>>>>>>
>>>>>>>>>>> I'm seeing the crash below when suspending my system for the second time.
>> ...
>>> Stephen, I went through the code and I think I found out what is going wrong.
>>> Can you please try the following patch?
>>
>> Unfortunately, I still see the exact same failure/backtrace with this
>> patch applied.
>>
>
> Oh, is it? Can you please give me the map of the related cpus on your
> system? (ie., cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus for
> each CPU.)
>
> I must be missing something...
>
OK, I took a second look at the code, and I suspect that applying the
second patch might help. So can you try by applying both the patches
please[1][2]?
Basically here is my hunch: say CPUs 2 and 3 are part of a policy and
3 is the policy->cpu. During suspend, CPU 2 will be taken offline first,
and we hit this code:
1199 if (cpu != policy->cpu && !frozen) {
1200 sysfs_remove_link(&dev->kobj, "cpufreq");
1201 } else if (cpus > 1) {
1202
1203 new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen);
1204 if (new_cpu >= 0) {
1205 WARN_ON(lock_policy_rwsem_write(cpu));
1206 update_policy_cpu(policy, new_cpu);
1207 unlock_policy_rwsem_write(cpu);
1208
1209 if (!frozen) {
1210 pr_debug("%s: policy Kobject moved to cpu: %d "
1211 "from: %d\n",__func__, new_cpu, cpu);
1212 }
1213 }
1214 }
At this point, the first 'if' condition fails because frozen == true.
So it enters the else part. But, policy->cpu is actually 3, not 2,
and hence we invoke nominate_...() unnecessarily. That function returns
3 since that's the only CPU remaining in the mask, and so we call
update_policy_cpu() with new_cpu = 3, and old_cpu was also 3! And that
is the perfect recipe for disaster, with the current implementation of
update_policy_cpu().
And my second patch [2] tried to fix this exact problem, although I
didn't realize we actually had a case where we hit this in the current
code itself.
So please try by applying both the patches and let me know how it goes.
Thanks a lot for your testing efforts!
[1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
[2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2
Regards,
Srivatsa S. Bhat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists