linux-kernel - Re: [PATCH][GIT PULL] tracing: Fix compile issue for trace_sched

Message-ID: <20101021140031.GC2920@redhat.com>
Date:	Thu, 21 Oct 2010 10:00:31 -0400
From:	Jason Baron <jbaron@...hat.com>
To:	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Steven Rostedt <rostedt@...dmis.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	2nddept-manager@....hitachi.co.jp
Subject: Re: [PATCH][GIT PULL] tracing: Fix compile issue for
	trace_sched_wakeup.c

On Thu, Oct 21, 2010 at 11:58:56AM +0900, Masami Hiramatsu wrote:
> (2010/10/21 1:43), Jason Baron wrote:
> > On Wed, Oct 20, 2010 at 05:40:45PM +0200, Ingo Molnar wrote:
> >> FYI, there's a new mystery hang (sometimes crash) that triggers in -tip - and which 
> >> seems to be tracing related. See the crashlog below - config attached.
> >>
> >> It's not bisectable - small changes in the kernel make the bug come/go. (might be a 
> >> race of some sort)
> >>
> >> Thanks,
> >>
> >> 	Ingo
> >>
> > 
> > Strange, because it looks like we get through enabling/disabling the
> > tracepoints individually, but then when we go to enable all the
> > tracepoints we hit this hang, perhaps suggesting a race. Do we always
> > fail after "Testing all events:" is printed? Does the crash have any
> > more clues? I will try to reproduce this.
> > 
> > Also, I noticed some recent changes to text_poke_smp() usage of
> > stop_machine() on Oct. 14th. That's related to the area where this appears
> > to hang, so if things were working with this .config before then, that
> > might be a place to look. Adding Masami to the Cc list.
> 
> Recent changes to text_poke_smp() just removed an unnecessary
> get/put_online_cpu(), so I think they are not related to this bug.
> 
> It seems there may be a bug in the stop_machine() routine under
> heavy use. Usually it is called just once at a time, but jump
> label and optprobe might call it heavily (thousands of times?),
> so a race condition could easily be triggered.
> 

For most tracepoints there is one text location that needs to be
updated; however, I know that for kmalloc you can end up with
hundreds or even thousands. So yes, we can end up calling
stop_machine() thousands of times.

There is a patch to reduce kmalloc tracepoint text locations by moving
them out of line: http://lkml.org/lkml/2010/10/13/208

Also, text_poke_smp_batch() would allow us to update all these text
locations at once.
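To make the cost argument concrete, here is a small userspace model. It is a sketch under stated assumptions, not kernel code: StopMachineModel, patch_per_site() and patch_batched() are hypothetical names, and the model assumes, as this thread implies, that patching each text location through text_poke_smp() costs one stop_machine() rendezvous, while text_poke_smp_batch() patches all locations inside a single one.

```python
# Userspace model only (not kernel code). All names are hypothetical.

class StopMachineModel:
    """Stand-in for stop_machine(): each run() models one global
    rendezvous in which all CPUs are halted while fn executes."""

    def __init__(self):
        self.calls = 0

    def run(self, fn):
        self.calls += 1
        fn()


def patch_per_site(sites, sm):
    """Patch each text location in its own rendezvous, modelling one
    text_poke_smp() call (and so one stop_machine()) per site."""
    for i in range(len(sites)):
        def poke(i=i):
            sites[i] = "jmp"  # flip this site from nop to jump
        sm.run(poke)


def patch_batched(sites, sm):
    """Patch every text location inside a single rendezvous, modelling
    the idea behind text_poke_smp_batch()."""
    def poke_all():
        for i in range(len(sites)):
            sites[i] = "jmp"
    sm.run(poke_all)


if __name__ == "__main__":
    for n in (1, 1000):  # one site vs. a kmalloc-like site count
        per_site, batched = StopMachineModel(), StopMachineModel()
        patch_per_site(["nop"] * n, per_site)
        patch_batched(["nop"] * n, batched)
        print(n, per_site.calls, batched.calls)
    # prints "1 1 1" then "1000 1000 1"
```

With 1000 sites (roughly the kmalloc case above) the per-site path performs 1000 rendezvous versus one for the batched path. The model says nothing about the race itself; it only shows how much more often the stop_machine() path gets exercised without batching.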

Nonetheless, there appears to be an underlying race condition...

thanks,

-Jason