linux-kernel - Re: [PATCH] Thermal: Fix lockup of cpu_down()

lists.openwall.net: Open Source and information security mailing list archives


Message-ID: <1373999590.6458.34.camel@gandalf.local.home>
Date:	Tue, 16 Jul 2013 14:33:10 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Zhang Rui <rui.zhang@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH] Thermal: Fix lockup of cpu_down()

On Tue, 2013-07-16 at 11:19 -0700, Srinivas Pandruvada wrote:
> Thanks. How did you trigger this error condition? Is it from code review, or do
> you have some way to reproduce it?

No, my tests run a CPU hotplug stress test, and the system would hang. I
bisected the hang, and the bisection landed on this code. What was weird
was that the module didn't appear to be loaded. Then I started the ftrace
function tracer from the kernel command line with the following:

 ftrace=function ftrace_filter=get_online_cpus,put_online_cpus

and after I booted up, I ran:

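# Assumes debugfs is mounted at /debug (commonly /sys/kernel/debug).
cat /debug/tracing/trace | perl -e '
# Pair each get_online_cpus() hit with a later put_online_cpus() hit.
my @stack;
while (<>) {
	if (/get_online/) {
		push @stack, $_;	# remember the unmatched get
	} elsif (/put_online/) {
		pop @stack;		# a put cancels the most recent get
	}
}
# Anything left over is a get with no matching put.
foreach my $line (@stack) {
	print $line;
}'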
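For readers who would rather do this in Python, the same stack-matching pass can be sketched as below. Like the Perl one-liner, it assumes gets and puts pair up in LIFO order across the whole trace and ignores which task or CPU made each call. The function name `unmatched_gets` is hypothetical, chosen for this illustration.

```python
from typing import Iterable, List


def unmatched_gets(lines: Iterable[str]) -> List[str]:
    """Return the trace lines for get_online_cpus() hits that no
    put_online_cpus() hit cancelled (same logic as the Perl one-liner)."""
    stack: List[str] = []
    for line in lines:
        if "get_online" in line:
            stack.append(line)   # remember the unmatched get
        elif "put_online" in line and stack:
            stack.pop()          # a put cancels the most recent get
    return stack
```

Usage would be something like `for l in unmatched_gets(open("/debug/tracing/trace")): print(l, end="")`, again assuming debugfs is mounted at `/debug`. The `and stack` guard mirrors Perl, where `pop` on an empty array is a harmless no-op; a Python `list.pop()` on an empty list would raise instead.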

And it showed that get_online_cpus() was called twice without a matching
put_online_cpus(). The strange thing was that the calls had no parent
function. That is when I realized the module had been loaded, had failed
to initialize, and had been unloaded, so the callers' symbols were gone by
the time I read the trace. That also explains why it didn't show up in my
lsmod.

Then it was just a matter of looking at all the calls to
get_online_cpus() in the commit, and the bug was rather obvious.
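Why does even one leaked get hang hotplug? Roughly speaking, in kernels of that era get_online_cpus() bumped a reader count on the CPU hotplug lock, and cpu_down() waited for that count to drop to zero before proceeding. The single-threaded Python model below illustrates that bookkeeping under stated assumptions: `HotplugLockModel`, `module_init_buggy`, `module_init_fixed`, and the `fail` flag are all hypothetical and are not the thermal driver's actual code. Instead of sleeping, `try_cpu_down()` just reports whether the writer could proceed.

```python
class HotplugLockModel:
    """Toy model: readers bump a count; a hotplug writer needs it at zero."""

    def __init__(self) -> None:
        self.readers = 0

    def get_online_cpus(self) -> None:
        self.readers += 1

    def put_online_cpus(self) -> None:
        assert self.readers > 0, "put without matching get"
        self.readers -= 1

    def try_cpu_down(self) -> bool:
        # The real writer would sleep until readers == 0; here we only report it.
        return self.readers == 0


def module_init_buggy(lock: HotplugLockModel, fail: bool) -> int:
    lock.get_online_cpus()
    if fail:
        return -1                  # BUG: error path skips put_online_cpus()
    lock.put_online_cpus()
    return 0


def module_init_fixed(lock: HotplugLockModel, fail: bool) -> int:
    lock.get_online_cpus()
    try:
        if fail:
            return -1              # error path
        return 0
    finally:
        lock.put_online_cpus()     # every exit path drops the reference
```

After `module_init_buggy(lock, fail=True)` the reader count stays at 1, so `try_cpu_down()` returns `False`; in the real kernel that writer would wait forever. Kernel C would usually get the same balance from a single goto-based exit label; `try`/`finally` plays that role in the sketch.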

With the patch applied, the lockup went away.

-- Steve
