Message-ID: <1373999590.6458.34.camel@gandalf.local.home>
Date: Tue, 16 Jul 2013 14:33:10 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Zhang Rui <rui.zhang@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH] Thermal: Fix lockup of cpu_down()
On Tue, 2013-07-16 at 11:19 -0700, Srinivas Pandruvada wrote:
> Thanks. How did you trigger this error condition? Is it a code review or
> you have some way to reproduce?
No, my tests run a CPU hotplug stress test, and the system would hang. I had
to bisect to find the bug, and it came down to this code. What was weird was
that the module didn't appear to be loaded. Then I ran the ftrace function
tracer, started from the kernel command line with the following:
ftrace=function ftrace_filter=get_online_cpus,put_online_cpus
and after I booted up, I ran:
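cat /debug/tracing/trace | perl -e '
my @stack;
while (<>) {
    if (/get_online/) {
        # remember every get_online_cpus() call
        push @stack, $_;
    } elsif (/put_online/) {
        # a put_online_cpus() balances the most recent get
        pop @stack;
    }
}
# anything left over never got a matching put
foreach my $line (@stack) {
    print $line;
}'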
And it showed that get_online_cpus() was called twice without a matching
put_online_cpus(). The strange thing was that the calls had no parent
function. That is when I realized that the module had been loaded, failed
to initialize, and was then unloaded, which explains why it didn't show up
in lsmod.
Then it was just a matter of looking at all the calls to
get_online_cpus() in the commit, and it was rather obvious what the
bug was.
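To make the failure mode concrete, here is a minimal user-space sketch. It is not the actual thermal driver code. It assumes, as in kernels of this era, that the hotplug writer side reached from cpu_down() waits until the reader count taken by get_online_cpus() drops back to zero. The helpers sim_get_online_cpus(), sim_put_online_cpus(), sim_setup_cpu(), buggy_init() and fixed_init() are hypothetical stand-ins, and the counter models only the reference count, not the real locking. Unloading a module that failed to initialize does not drop a reference it leaked, so a single unbalanced get is enough to make every later cpu_down() wait forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the hotplug reader count taken by
 * get_online_cpus(); this models only the count, not the real locking. */
static int sim_hotplug_refcount;

static void sim_get_online_cpus(void)
{
    sim_hotplug_refcount++;
}

static void sim_put_online_cpus(void)
{
    assert(sim_hotplug_refcount > 0); /* catch an unbalanced put */
    sim_hotplug_refcount--;
}

/* Assumption: cpu_down() may start only once no reader holds a reference.
 * A leaked reference keeps this false forever, i.e. the lockup. */
static bool sim_cpu_down_can_proceed(void)
{
    return sim_hotplug_refcount == 0;
}

/* Hypothetical per-CPU setup step that fails on CPU 2. */
static int sim_setup_cpu(int cpu)
{
    return cpu == 2 ? -ENODEV : 0;
}

/* Buggy pattern: the error path returns while still holding the reference. */
static int buggy_init(int ncpus)
{
    int cpu, err;

    sim_get_online_cpus();
    for (cpu = 0; cpu < ncpus; cpu++) {
        err = sim_setup_cpu(cpu);
        if (err)
            return err; /* leaks the reference taken above */
    }
    sim_put_online_cpus();
    return 0;
}

/* Fixed pattern: every exit path goes through the single put at "out". */
static int fixed_init(int ncpus)
{
    int cpu, err = 0;

    sim_get_online_cpus();
    for (cpu = 0; cpu < ncpus; cpu++) {
        err = sim_setup_cpu(cpu);
        if (err)
            goto out;
    }
out:
    sim_put_online_cpus();
    return err;
}
```

With these definitions, buggy_init(4) returns -ENODEV and leaves sim_hotplug_refcount at 1, so sim_cpu_down_can_proceed() stays false. fixed_init(4) returns the same error but leaves the count at 0. Routing every exit through one label is the usual kernel idiom for keeping get/put pairs like this balanced.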
With the patch applied, the lockup went away.
-- Steve