linux-kernel - Re: timer: lockup in run_timer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20130710095210.GD17211@twins.programming.kicks-ass.net>
Date:	Wed, 10 Jul 2013 11:52:10 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Sasha Levin <sasha.levin@...cle.com>
Cc:	Dave Jones <davej@...hat.com>, Tejun Heo <tj@...nel.org>,
	tglx@...utronix.de, LKML <linux-kernel@...r.kernel.org>,
	trinity@...r.kernel.org, rostedt@...dmis.org
Subject: Re: timer: lockup in run_timer_softirq()

On Tue, Jul 09, 2013 at 07:09:51PM -0400, Sasha Levin wrote:
> perf huh? I also have this spew I'm currently working on, it seems related:
> 
> [ 1443.380407] ------------[ cut here ]------------
> [ 1443.381713] WARNING: CPU: 20 PID: 49263 at kernel/lockdep.c:3552 check_flags+0x16b/0x220()
> [ 1443.383988] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
> [ 1443.385452] Modules linked in:
> [ 1443.386459] CPU: 20 PID: 49263 Comm: trinity-child50 Tainted: G        W
> 3.10.0-next-20130709-sasha #3953
> [ 1443.388735]  0000000000000de0 ffff880805e03ab8 ffffffff84191329 ffffffff8519f386
> [ 1443.390082]  ffff880805e03b08 ffff880805e03af8 ffffffff811279cc ffffffff8519f386
> [ 1443.390082]  0000000000000000 ffff8807cf898000 ffffffff85a66940 0000000000000082
> [ 1443.390082] Call Trace:
> [ 1443.390082]  <IRQ>  [<ffffffff84191329>] dump_stack+0x52/0x87
> [ 1443.390082]  [<ffffffff811279cc>] warn_slowpath_common+0x8c/0xc0
> [ 1443.390082]  [<ffffffff81127ab6>] warn_slowpath_fmt+0x46/0x50
> [ 1443.390082]  [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
> [ 1443.390082]  [<ffffffff811a192b>] check_flags+0x16b/0x220
> [ 1443.390082]  [<ffffffff811a2ee7>] lock_is_held+0x77/0x110
> [ 1443.390082]  [<ffffffff8419d644>] ? sub_preempt_count+0x4/0xf0
> [ 1443.390082]  [<ffffffff8122dafe>] perf_tp_event+0xbe/0x450
> [ 1443.390082]  [<ffffffff8122de62>] ? perf_tp_event+0x422/0x450
> [ 1443.390082]  [<ffffffff8419d644>] ? sub_preempt_count+0x4/0xf0
> [ 1443.390082]  [<ffffffff812185f2>] perf_ftrace_function_call+0xc2/0xe0
> [ 1443.390082]  [<ffffffff811f8c78>] ? ftrace_ops_control_func+0xc8/0x140
> [ 1443.427495]  [<ffffffff8419d644>] ? sub_preempt_count+0x4/0xf0
> [ 1443.427495]  [<ffffffff81131917>] ? __local_bh_enable+0xc7/0xd0
> [ 1443.427495]  [<ffffffff811f8c78>] ftrace_ops_control_func+0xc8/0x140
> [ 1443.427495]  [<ffffffff841a1c3c>] ftrace_call+0x5/0x2f
> [ 1443.427495]  [<ffffffff841a1c3c>] ? ftrace_call+0x5/0x2f
> [ 1443.427495]  [<ffffffff81175b87>] ? vtime_account_irq_exit+0x67/0x80
> [ 1443.427495]  [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
> [ 1443.427495]  [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
> [ 1443.427495]  [<ffffffff8113185e>] ? __local_bh_enable+0xe/0xd0
> [ 1443.427495]  [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
> [ 1443.427495]  [<ffffffff81131917>] __local_bh_enable+0xc7/0xd0
> [ 1443.427495]  [<ffffffff81132d07>] __do_softirq+0x447/0x4d0
> [ 1443.427495]  [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
> [ 1443.427495]  [<ffffffff81132ed6>] irq_exit+0x86/0x120
> [ 1443.427495]  [<ffffffff841a43ea>] smp_apic_timer_interrupt+0x4a/0x60
> [ 1443.427495]  [<ffffffff841a2d32>] apic_timer_interrupt+0x72/0x80
> [ 1443.427495]  <EOI>  [<ffffffff81130542>] ? do_setitimer+0x242/0x2a0
> [ 1443.427495]  [<ffffffff8419871c>] ? _raw_spin_unlock_irq+0x4c/0x80
> [ 1443.427495]  [<ffffffff84198700>] ? _raw_spin_unlock_irq+0x30/0x80
> [ 1443.427495]  [<ffffffff81130542>] do_setitimer+0x242/0x2a0
> [ 1443.427495]  [<ffffffff811306fa>] alarm_setitimer+0x3a/0x70
> [ 1443.427495]  [<ffffffff8113b41e>] SyS_alarm+0xe/0x20
> [ 1443.427495]  [<ffffffff841a2240>] tracesys+0xdd/0xe2
> [ 1443.427495] ---[ end trace e3b9a6b9c7462a51 ]---

Fun.. :-) we trace __local_bh_enable() and hit a ftrace callback between
telling lockdep we enabled softirqs and actually doing so.

I'm just a tad confused by the trace; it says we go:
  lock_is_held()
    check_flags()

Looking at perf_tp_event() this would most likely be from:

  ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);

Where the lock_is_held() would be from rcu_dereference_check()'s
rcu_read_lock_sched_held(). However, by there we've already passed
rcu_read_lock() which includes rcu_lock_acquire() -> lock_acquire() ->
check_flags(). So it should've triggered there.

Ideally we'd not trace __local_bh_enable() at all, seeing as how any RCU usage
in there would be bound to trigger this.

Steven?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/