[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130903201831.2ae872e4@gandalf.local.home>
Date: Tue, 3 Sep 2013 20:18:31 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: paulmck@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Jiri Olsa <jolsa@...hat.com>
Subject: Re: [RFC][PATCH 01/18 v2] ftrace: Add hash list to save RCU unsafe
functions
On Tue, 3 Sep 2013 19:57:05 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:
> I was also chasing down a nasty bug that looked to be a pointer
> corruption somewhere. Still never found exactly where it happened, but
> it always happened with the following conditions:
>
> synchronize_sched() was in progress
>
> The ftrace_unsafe_callback() was preempted by an interrupt
>
> lockdep was processing a lock
>
>
> A crash would happen which had memory corruption involved. But the
> above seemed always to be in play.
>
Now putting back the context level recursion checks, and also adding a
check for "oops_in_progress" to have the unsafe callback not call
rcu_read_lock() on oops, I can get decent backtraces.
This seems to be what I get consistently now:
[ 28.583983] Testing kprobe tracing: [ 28.609714] ------------[ cut here ]------------
[ 28.610050] WARNING: CPU: 2 PID: 0 at /home/rostedt/work/git/linux-trace.git/kernel/lockdep.c:960 __bfs+0xe8/0x1d5()
[ 28.610050] Modules linked in:
[ 28.610050] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.11.0-rc3-test+ #220
[ 28.610050] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[ 28.610050] 00000000000003c0 ffff88007d503a18 ffffffff81548e29 0000000000000000
[ 28.610050] 0000000000000000 ffff88007d503a58 ffffffff810427b1 ffffffff817fc1ba
[ 28.610050] ffffffff810944f6 0000000000000000 0000000000000002 ffffffff81093d06
[ 28.610050] Call Trace:
[ 28.610050] <IRQ> [<ffffffff81548e29>] dump_stack+0x52/0x89
[ 28.610050] [<ffffffff810427b1>] warn_slowpath_common+0x81/0x9b
[ 28.610050] [<ffffffff810944f6>] ? __bfs+0xe8/0x1d5
[ 28.610050] [<ffffffff81093d06>] ? noop_count+0xb/0xb
[ 28.610050] [<ffffffff810427e5>] warn_slowpath_null+0x1a/0x1c
[ 28.610050] [<ffffffff810944f6>] __bfs+0xe8/0x1d5
[ 28.610050] [<ffffffff81550c77>] ? sub_preempt_count+0xe/0xe0
[ 28.610050] [<ffffffff81095fa2>] ? print_irq_inversion_bug+0x1d2/0x1d2
[ 28.610050] [<ffffffff81095ff0>] check_usage_forwards+0x4e/0x87
[ 28.610050] [<ffffffff810965d0>] ? valid_state+0x2b/0x235
[ 28.610050] [<ffffffff810968d9>] mark_lock+0xff/0x1d8
[ 28.610050] [<ffffffff81096caa>] __lock_acquire+0x2f8/0xf57
[ 28.610050] [<ffffffff810df575>] ? ftrace_unsafe_callback+0x1f1/0x203
[ 28.610050] [<ffffffff810df465>] ? ftrace_unsafe_callback+0xe1/0x203
[ 28.610050] [<ffffffff81553dfc>] ? ftrace_call+0x5/0x2f
[ 28.610050] [<ffffffff810554dc>] ? __lock_task_sighand+0x9f/0xe9
[ 28.610050] [<ffffffff81097e70>] lock_acquire+0xf2/0x138
[ 28.610050] [<ffffffff810554dc>] ? __lock_task_sighand+0x9f/0xe9
[ 28.610050] [<ffffffff81053824>] ? task_pid_vnr+0xf/0xf
[ 28.610050] [<ffffffff8154cd40>] _raw_spin_lock+0x3b/0x4a
[ 28.610050] [<ffffffff810554dc>] ? __lock_task_sighand+0x9f/0xe9
[ 28.610050] [<ffffffff8105376a>] ? rcu_read_lock_held+0x37/0x3f
[ 28.610050] [<ffffffff81053824>] ? task_pid_vnr+0xf/0xf
[ 28.610050] [<ffffffff810554dc>] __lock_task_sighand+0x9f/0xe9
[ 28.610050] [<ffffffff81062e24>] ? __rcu_read_lock+0x4/0x1a
[ 28.610050] [<ffffffff81067f79>] run_posix_cpu_timers+0x141/0x4a8
[ 28.610050] [<ffffffff81062e24>] ? __rcu_read_lock+0x4/0x1a
[ 28.610050] [<ffffffff8107b97a>] ? trigger_load_balance+0x10/0x264
[ 28.610050] [<ffffffff81052aad>] update_process_times+0x6a/0x72
[ 28.610050] [<ffffffff810925b2>] ? tick_sched_handle.clone.9+0xc/0x55
[ 28.610050] [<ffffffff810925ec>] tick_sched_handle.clone.9+0x46/0x55
[ 28.610050] [<ffffffff810926d8>] tick_sched_timer+0x42/0x62
[ 28.610050] [<ffffffff81068b38>] __run_hrtimer+0xeb/0x1dc
[ 28.610050] [<ffffffff81092696>] ? tick_nohz_handler+0x9b/0x9b
[ 28.610050] [<ffffffff810693e4>] hrtimer_interrupt+0xed/0x1f3
[ 28.610050] [<ffffffff8102a160>] local_apic_timer_interrupt+0x57/0x5b
[ 28.610050] [<ffffffff8155629a>] smp_apic_timer_interrupt+0x2f/0x41
[ 28.610050] [<ffffffff81554e2f>] apic_timer_interrupt+0x6f/0x80
[ 28.610050] <EOI> [<ffffffff810df575>] ? ftrace_unsafe_callback+0x1f1/0x203
[ 28.610050] [<ffffffff810df465>] ? ftrace_unsafe_callback+0xe1/0x203
[ 28.610050] [<ffffffff8100b2df>] ? default_idle+0x21/0x32
[ 28.610050] [<ffffffff8100b2dd>] ? default_idle+0x1f/0x32
[ 28.610050] [<ffffffff8100ba5c>] arch_cpu_idle+0x18/0x22
[ 28.610050] [<ffffffff81089b59>] cpu_startup_entry+0x109/0x16c
[ 28.610050] [<ffffffff81028ba0>] start_secondary+0x251/0x258
[ 28.610050] ---[ end trace 19765ef14295ecea ]---
Oh yes, and it always happens on testing kprobes. But it happens when
that test does a synchronize_sched().
-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists