linux-kernel - Re: [tracing] WARNING: at kernel/lockdep.c:3545 check

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1351003871.8467.18.camel@gandalf.local.home>
Date:	Tue, 23 Oct 2012 10:51:11 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Fengguang Wu <fengguang.wu@...el.com>
Cc:	Steven Rostedt <srostedt@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [tracing] WARNING: at kernel/lockdep.c:3545 check_flags()

On Thu, 2012-10-18 at 19:41 +0800, Fengguang Wu wrote:
> Hi Steven,

Hi Wu,

I was out of town and I don't read my redhat email address while
traveling.

> 
> This warning occur from time to time in my tests:
> 
> [  593.080133] Testing tracer function: 
> [  593.149895] ------------[ cut here ]------------
> [  593.149895] WARNING: at /c/kernel-tests/src/stable/kernel/lockdep.c:3545 check_flags+0x1c8/0x1d0()

This is:

        if (!hardirq_count()) {
                if (softirq_count()) {
                        /* like the above, but with softirqs */
                        DEBUG_LOCKS_WARN_ON(current->softirqs_enabled);
                } else {

Which means that some how softirqs are disabled but lockdep thinks they
are enabled.

> [  593.149895] Modules linked in:
> [  593.149895] Pid: 1, comm: swapper/0 Not tainted 3.7.0-rc1-12129-gee2dd3c #3
> [  593.149895] Call Trace:
> [  593.149895]  <IRQ>  [<ffffffff810734cf>] warn_slowpath_common+0x8f/0xc0
> [  593.149895]  [<ffffffff8107351a>] warn_slowpath_null+0x1a/0x20
> [  593.149895]  [<ffffffff810fac08>] check_flags+0x1c8/0x1d0
> [  593.149895]  [<ffffffff810fbfed>] lock_is_held+0x4d/0xa0
> [  593.149895]  [<ffffffff8156d570>] ? notify_die+0x30/0x30
> [  593.149895]  [<ffffffff81081b13>] ? __local_bh_enable+0x1d3/0x230
> [  593.149895]  [<ffffffff81133cdf>] rcu_read_lock_held+0x7f/0x90
> [  593.149895]  [<ffffffff81134208>] ftrace_ops_no_ops+0x188/0x190
> [  593.149895]  [<ffffffff81573872>] trace+0x55/0x93
> [  593.149895]  [<ffffffff81573872>] ? trace+0x55/0x93
> [  593.149895]  [<ffffffff81566fc9>] ? _raw_spin_unlock_irq+0x99/0xc0
> [  593.149895]  [<ffffffff8156d575>] ? sub_preempt_count+0x5/0x210
> [  593.149895]  [<ffffffff8156d575>] ? sub_preempt_count+0x5/0x210
> [  593.149895]  [<ffffffff81081b13>] ? __local_bh_enable+0x1d3/0x230

This makes sense, as function tracing of softirqs sometimes causes this
bug. It's a race where a function is traced inside enabling softirqs and
the tracer's lock causes a check of softirqs after lockdep disabled it
but before it was actually disabled. I'll try to figure out how this
happens again. It has been fixed in the past before.


> [  593.149895]  [<ffffffff810827ea>] __do_softirq+0x42a/0x710
> [  593.149895]  [<ffffffff81574c7c>] call_softirq+0x1c/0x30
> [  593.149895]  [<ffffffff81028a05>] do_softirq+0x1d5/0x2a0
> [  593.149895]  [<ffffffff81082d3d>] irq_exit+0xad/0xc0
> [  593.149895]  [<ffffffff81028035>] do_IRQ+0x75/0x100
> [  593.149895]  [<ffffffff81567c72>] common_interrupt+0x72/0x72
> [  593.149895]  <EOI>  [<ffffffff810fafbb>] ? mark_held_locks+0x9b/0x100
> [  593.149895]  [<ffffffff815675f3>] ? _raw_spin_unlock_irqrestore+0xd3/0x130
> [  593.149895]  [<ffffffff81307487>] __debug_object_init+0x1b7/0x580
> [  593.149895]  [<ffffffff813078c6>] debug_object_init_on_stack+0x36/0x40
> [  593.149895]  [<ffffffff8108a565>] init_timer_on_stack_key+0x35/0x80
> [  593.149895]  [<ffffffff815616fb>] schedule_timeout+0x8b/0x470
> [  593.149895]  [<ffffffff81573872>] ? trace+0x55/0x93
> [  593.149895]  [<ffffffff81562fb7>] ? __mutex_unlock_slowpath+0x247/0x380
> [  593.149895]  [<ffffffff81133487>] ? update_ftrace_function+0x17/0x150
> [  593.149895]  [<ffffffff81561afe>] schedule_timeout_uninterruptible+0x1e/0x20
> [  593.149895]  [<ffffffff8108ea60>] msleep+0x20/0x30
> [  593.149895]  [<ffffffff81145a6a>] trace_selftest_startup_function+0xda/0x2b0
> [  593.149895]  [<ffffffff81146c4c>] register_tracer+0x32c/0x430
> [  593.149895]  [<ffffffff81c3aee4>] ? init_trace_printk_function_export+0x48/0x48
> [  593.149895]  [<ffffffff81c3aef4>] init_function_trace+0x10/0x12
> [  593.149895]  [<ffffffff8100223a>] do_one_initcall+0x1aa/0x270
> [  593.149895]  [<ffffffff815313bc>] kernel_init+0x11c/0x370
> [  593.149895]  [<ffffffff81c17878>] ? do_early_param+0xc6/0xc6
> [  593.149895]  [<ffffffff810ca474>] ? schedule_tail+0x34/0x110
> [  593.149895]  [<ffffffff815312a0>] ? rest_init+0x170/0x170
> [  593.149895]  [<ffffffff815739ac>] ret_from_fork+0x7c/0xb0
> [  593.149895]  [<ffffffff815312a0>] ? rest_init+0x170/0x170
> 
> And it's bisected down to an unbelievable simple commit..
> 
> commit 17bb615ad4f8d2d2c0f02794d27d7f83e0009ef4
> Author: Steven Rostedt <srostedt@...hat.com>
> Date:   Mon May 23 15:27:46 2011 -0400
> 
>     tracing: Have event with function tracer check error return

This change is simply:

--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1657,7 +1657,12 @@ static struct ftrace_ops trace_ops __initdata  =
 
 static __init void event_trace_self_test_with_function(void)
 {
-       register_ftrace_function(&trace_ops);
+       int ret;
+       ret = register_ftrace_function(&trace_ops);
+       if (WARN_ON(ret < 0)) {
+               pr_info("Failed to enable function tracer for event tests\n")
+               return;
+       }


I highly doubt that this has anything to do with the bug. The bug is a
race, and that means you can't bisect it with single runs to determine
good and bad. It can only detect bad, but just because it passes,
doesn't actually mean its good.

My search is to see what is being traced again when it shouldn't be.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/