linux-kernel - Re: [RFC][PATCH 01/18 v2] ftrace: Add hash list to save RCU unsafe functions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130903201831.2ae872e4@gandalf.local.home>
Date:	Tue, 3 Sep 2013 20:18:31 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	paulmck@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Jiri Olsa <jolsa@...hat.com>
Subject: Re: [RFC][PATCH 01/18 v2] ftrace: Add hash list to save RCU unsafe
 functions

On Tue, 3 Sep 2013 19:57:05 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:
 
> I was also chasing down a nasty bug that looked to be a pointer
> corruption somewhere. Still never found exactly where it happened, but
> it always happened with the following conditions:
> 
> synchronize_sched() was in progress
> 
> The ftrace_unsafe_callback() was preempted by an interrupt
> 
> lockdep was processing a lock
> 
> 
> A crash would happen which had memory corruption involved. But the
> above seemed always to be in play.
> 

Now putting back the context level recursion checks, and also adding a
check for "oops_in_progress" to have the unsafe callback not call
rcu_read_lock() on oops, I can get decent backtraces.

This seems to be what I get consistently now:

[   28.583983] Testing kprobe tracing: [   28.609714] ------------[ cut here ]------------
[   28.610050] WARNING: CPU: 2 PID: 0 at /home/rostedt/work/git/linux-trace.git/kernel/lockdep.c:960 __bfs+0xe8/0x1d5()

[   28.610050] Modules linked in:
[   28.610050] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.11.0-rc3-test+ #220
[   28.610050] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[   28.610050]  00000000000003c0 ffff88007d503a18 ffffffff81548e29 0000000000000000
[   28.610050]  0000000000000000 ffff88007d503a58 ffffffff810427b1 ffffffff817fc1ba
[   28.610050]  ffffffff810944f6 0000000000000000 0000000000000002 ffffffff81093d06
[   28.610050] Call Trace:
[   28.610050]  <IRQ>  [<ffffffff81548e29>] dump_stack+0x52/0x89
[   28.610050]  [<ffffffff810427b1>] warn_slowpath_common+0x81/0x9b
[   28.610050]  [<ffffffff810944f6>] ? __bfs+0xe8/0x1d5
[   28.610050]  [<ffffffff81093d06>] ? noop_count+0xb/0xb
[   28.610050]  [<ffffffff810427e5>] warn_slowpath_null+0x1a/0x1c
[   28.610050]  [<ffffffff810944f6>] __bfs+0xe8/0x1d5
[   28.610050]  [<ffffffff81550c77>] ? sub_preempt_count+0xe/0xe0
[   28.610050]  [<ffffffff81095fa2>] ? print_irq_inversion_bug+0x1d2/0x1d2
[   28.610050]  [<ffffffff81095ff0>] check_usage_forwards+0x4e/0x87
[   28.610050]  [<ffffffff810965d0>] ? valid_state+0x2b/0x235
[   28.610050]  [<ffffffff810968d9>] mark_lock+0xff/0x1d8
[   28.610050]  [<ffffffff81096caa>] __lock_acquire+0x2f8/0xf57
[   28.610050]  [<ffffffff810df575>] ? ftrace_unsafe_callback+0x1f1/0x203
[   28.610050]  [<ffffffff810df465>] ? ftrace_unsafe_callback+0xe1/0x203
[   28.610050]  [<ffffffff81553dfc>] ? ftrace_call+0x5/0x2f
[   28.610050]  [<ffffffff810554dc>] ? __lock_task_sighand+0x9f/0xe9
[   28.610050]  [<ffffffff81097e70>] lock_acquire+0xf2/0x138
[   28.610050]  [<ffffffff810554dc>] ? __lock_task_sighand+0x9f/0xe9
[   28.610050]  [<ffffffff81053824>] ? task_pid_vnr+0xf/0xf
[   28.610050]  [<ffffffff8154cd40>] _raw_spin_lock+0x3b/0x4a
[   28.610050]  [<ffffffff810554dc>] ? __lock_task_sighand+0x9f/0xe9
[   28.610050]  [<ffffffff8105376a>] ? rcu_read_lock_held+0x37/0x3f
[   28.610050]  [<ffffffff81053824>] ? task_pid_vnr+0xf/0xf
[   28.610050]  [<ffffffff810554dc>] __lock_task_sighand+0x9f/0xe9
[   28.610050]  [<ffffffff81062e24>] ? __rcu_read_lock+0x4/0x1a
[   28.610050]  [<ffffffff81067f79>] run_posix_cpu_timers+0x141/0x4a8
[   28.610050]  [<ffffffff81062e24>] ? __rcu_read_lock+0x4/0x1a
[   28.610050]  [<ffffffff8107b97a>] ? trigger_load_balance+0x10/0x264
[   28.610050]  [<ffffffff81052aad>] update_process_times+0x6a/0x72
[   28.610050]  [<ffffffff810925b2>] ? tick_sched_handle.clone.9+0xc/0x55
[   28.610050]  [<ffffffff810925ec>] tick_sched_handle.clone.9+0x46/0x55
[   28.610050]  [<ffffffff810926d8>] tick_sched_timer+0x42/0x62
[   28.610050]  [<ffffffff81068b38>] __run_hrtimer+0xeb/0x1dc
[   28.610050]  [<ffffffff81092696>] ? tick_nohz_handler+0x9b/0x9b
[   28.610050]  [<ffffffff810693e4>] hrtimer_interrupt+0xed/0x1f3
[   28.610050]  [<ffffffff8102a160>] local_apic_timer_interrupt+0x57/0x5b
[   28.610050]  [<ffffffff8155629a>] smp_apic_timer_interrupt+0x2f/0x41
[   28.610050]  [<ffffffff81554e2f>] apic_timer_interrupt+0x6f/0x80
[   28.610050]  <EOI>  [<ffffffff810df575>] ? ftrace_unsafe_callback+0x1f1/0x203
[   28.610050]  [<ffffffff810df465>] ? ftrace_unsafe_callback+0xe1/0x203
[   28.610050]  [<ffffffff8100b2df>] ? default_idle+0x21/0x32
[   28.610050]  [<ffffffff8100b2dd>] ? default_idle+0x1f/0x32
[   28.610050]  [<ffffffff8100ba5c>] arch_cpu_idle+0x18/0x22
[   28.610050]  [<ffffffff81089b59>] cpu_startup_entry+0x109/0x16c
[   28.610050]  [<ffffffff81028ba0>] start_secondary+0x251/0x258
[   28.610050] ---[ end trace 19765ef14295ecea ]---

Oh yes, and it always happens on testing kprobes. But it happens when
that test does a synchronize_sched().

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/