Message-ID: <20101021140031.GC2920@redhat.com>
Date: Thu, 21 Oct 2010 10:00:31 -0400
From: Jason Baron <jbaron@...hat.com>
To: Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>
Cc: Ingo Molnar <mingo@...e.hu>, Steven Rostedt <rostedt@...dmis.org>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
2nddept-manager@....hitachi.co.jp
Subject: Re: [PATCH][GIT PULL] tracing: Fix compile issue for
trace_sched_wakeup.c
On Thu, Oct 21, 2010 at 11:58:56AM +0900, Masami Hiramatsu wrote:
> (2010/10/21 1:43), Jason Baron wrote:
> > On Wed, Oct 20, 2010 at 05:40:45PM +0200, Ingo Molnar wrote:
> >> FYI, there's a new mystery hang (sometimes crash) that triggers in -tip - and which
> >> seems to be tracing related. See the crashlog below - config attached.
> >>
> >> It's not bisectable - small changes in the kernel make the bug come/go. (might be a
> >> race of some sorts)
> >>
> >> Thanks,
> >>
> >> Ingo
> >>
> >
> > Strange, b/c it looks like we get through enabling/disabling the
> > tracepoints individually, but then when we go to enable all the
> > tracepoints we hit this hang, perhaps suggesting a race. Do we always
> > fail after "Testing all events:" is printed? Does the crash have any
> > more clues? I will try to reproduce this.
> >
> > Also, I noticed some recent changes to text_poke_smp() usage of
> > stop_machine() on Oct. 14th. That's related to the area where this appears
> > to hang, so if things were working with this .config before then, that
> > might be a place to look. Adding Masami to the 'cc list.
>
> Recent changes to text_poke_smp() just removed unnecessary
> get/put_online_cpu(), so I think it's not related to this bug.
>
> It seems there can be a bug in the stop_machine() routine under
> heavy use. Usually it is called just once at a time, but jump
> label and optprobe might call it heavily (thousands of times?).
> So some racy situation can happen easily.
>
For most tracepoints there is 1 text location that needs to be
updated. However, I know that for kmalloc you can end up with
hundreds or even thousands. So yes, we can end up calling
stop_machine() thousands of times.
There is a patch to reduce kmalloc tracepoint text locations by moving
them out of line: http://lkml.org/lkml/2010/10/13/208
Also, text_poke_smp_batch() would allow us to update all these text
locations at once.
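
To make the difference in call counts concrete, here is a rough user-space
sketch (not kernel code). Everything in it is made up for illustration:
fake_stop_machine() only counts invocations and runs the callback, and
struct patch_site, patch_each() and patch_batched() are hypothetical names
that do not reflect the real text_poke_smp() or text_poke_smp_batch()
signatures. It only shows that N sites cost N stops one at a time, but a
single stop when batched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one text location to rewrite. */
struct patch_site {
	unsigned char *addr;          /* where to write */
	const unsigned char *opcode;  /* bytes to write */
	size_t len;                   /* number of bytes */
};

/* Hypothetical batch wrapper handed to the callback. */
struct patch_batch {
	struct patch_site *sites;
	size_t n;
};

/* Number of simulated stop-the-world rounds performed so far. */
static unsigned long stop_calls;

/* Stand-in for stop_machine(): counts the call, then runs fn(data). */
static void fake_stop_machine(void (*fn)(void *), void *data)
{
	stop_calls++;
	fn(data);
}

/* Callback: rewrite a single site. */
static void apply_one(void *data)
{
	struct patch_site *s = data;

	memcpy(s->addr, s->opcode, s->len);
}

/* Callback: rewrite every site in the batch within one stop. */
static void apply_batch(void *data)
{
	struct patch_batch *b = data;

	for (size_t i = 0; i < b->n; i++)
		apply_one(&b->sites[i]);
}

/* One stop per site: n sites cost n stops. */
static void patch_each(struct patch_site *sites, size_t n)
{
	for (size_t i = 0; i < n; i++)
		fake_stop_machine(apply_one, &sites[i]);
}

/* All sites under a single stop: n sites cost 1 stop. */
static void patch_batched(struct patch_site *sites, size_t n)
{
	struct patch_batch b = { sites, n };

	assert(sites != NULL || n == 0);
	fake_stop_machine(apply_batch, &b);
}
```

With, say, 1000 sites for one tracepoint, patch_each() bumps stop_calls by
1000 while patch_batched() bumps it by 1. Fewer stops would only narrow the
window, of course; it would not fix whatever is actually racing.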
Nonetheless, there appears to be an underlying race condition...
thanks,
-Jason