Message-ID: <alpine.DEB.2.11.1410151418570.3151@vincent-weaver-1.umelst.maine.edu>
Date:	Wed, 15 Oct 2014 14:34:10 -0400 (EDT)
From:	Vince Weaver <vincent.weaver@...ne.edu>
To:	Vince Weaver <vincent.weaver@...ne.edu>
cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Paul Mackerras <paulus@...ba.org>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: Re: perf: 3.17 another perf_fuzzer lockup

OK, so it turns out that the oops I saw with memory corruption wasn't the 
bug I was tracking, but something that comes up sometimes when trying to 
run ftrace at the same time as fuzzing.  So we'll leave that for another 
day.

The 3.17+ lockup I am tracking still reproduces as of git from yesterday 
(even after the 3.18-rc perf_event merges).

I can use sysrq to get a stack trace; one CPU is stuck in a call
to find_get_context().

An example backtrace:

[88200.300003]  <EOI>
[88200.300003]  [<ffffffff81114869>] ? ____cache_alloc+0x130/0x25b
[88200.300003]  [<ffffffff8107fb05>] ? __call_rcu.constprop.63+0x1bf/0x1cb
[88200.300003]  [<ffffffff8107fb2b>] kfree_call_rcu+0x1a/0x1c
[88200.300003]  [<ffffffff810cf84f>] put_ctx+0x51/0x55
[88200.300003]  [<ffffffff810d1840>] find_get_context+0x166/0x195
[88200.300003]  [<ffffffff810d5856>] SYSC_perf_event_open+0x47b/0x7f5
[88200.300003]  [<ffffffff810d5f55>] SyS_perf_event_open+0xe/0x10
[88200.300003]  [<ffffffff815362d6>] system_call_fastpath+0x16/0x1b

It looks like the

	else if (task->perf_event_ctxp[ctxn])
		err = -EAGAIN;

case is triggering non-stop in the retry path of find_get_context(),
so the kernel gets stuck retrying forever.
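
To make the suspected failure mode concrete, here is a rough userspace 
sketch of that retry shape.  It is not the kernel code: struct ctx, 
struct task, lock_task_ctx(), find_get_ctx_sketch() and the "lockable" 
flag are all hypothetical stand-ins, and I don't yet know why the lookup 
keeps failing in the real kernel, so the flag simply forces that 
condition.  The max_tries bound is also an addition so the demonstration 
terminates; the behaviour reported above corresponds to having no bound.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct ctx {
	int refcount;
};

struct task {
	struct ctx *ctxp;	/* models task->perf_event_ctxp[ctxn] */
	int lockable;		/* 0 = the lookup step keeps failing (assumed cause) */
};

/* Models "find and lock the existing context"; can fail even if ctxp is set. */
static struct ctx *lock_task_ctx(struct task *t)
{
	return (t->ctxp && t->lockable) ? t->ctxp : NULL;
}

/*
 * Models the retry shape: look up an existing context, otherwise try to
 * install a fresh one, and go around again if the slot is already taken.
 * Returns 0 on success, or -EAGAIN after max_tries fruitless passes.
 * *tries receives the number of passes made.
 */
static int find_get_ctx_sketch(struct task *t, struct ctx *fresh,
			       int max_tries, int *tries)
{
	for (*tries = 1; *tries <= max_tries; ++*tries) {
		struct ctx *c = lock_task_ctx(t);

		if (c) {
			c->refcount++;	/* existing context found and pinned */
			return 0;
		}
		if (t->ctxp)
			continue;	/* the -EAGAIN case: slot taken, retry */

		fresh->refcount++;	/* slot empty: install the new context */
		t->ctxp = fresh;
		t->lockable = 1;
		return 0;
	}
	*tries = max_tries;
	return -EAGAIN;		/* every pass hit the "slot taken" branch */
}
```

Calling find_get_ctx_sketch() on a task whose ctxp is set but whose 
lockable flag is 0 burns through every pass without making progress, 
which is the pattern the backtrace suggests.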

I can drop some printks in if it will help debug.  I've tried running 
ftrace, but for whatever reason if I enable ftrace the bug won't trigger.

Vince
