linux-kernel - irq_work: WARNING: at kernel/irq_work.c:98 irq_work_needs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <51397B96.7030008@oracle.com>
Date:	Fri, 08 Mar 2013 00:48:06 -0500
From:	Sasha Levin <sasha.levin@...cle.com>
To:	Steven Rostedt <rostedt@...dmis.org>,
	Frederic Weisbecker <fweisbec@...il.com>
CC:	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...e.hu>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	paul.gortmaker@...driver.com, Dave Jones <davej@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: irq_work: WARNING: at kernel/irq_work.c:98 irq_work_needs_cpu+0x8a/0xb0()

Hi guys,

While fuzzing with trinity inside a KVM tools guest it seemed to have hit the
new warning in kernel/irq_work.c:

[  486.527075] ------------[ cut here ]------------
[  486.527788] WARNING: at kernel/irq_work.c:98 irq_work_needs_cpu+0x8a/0xb0()
[  486.528870] Modules linked in:
[  486.529377] Pid: 0, comm: swapper/2 Tainted: G        W    3.9.0-rc1-next-20130307-sasha-00047-g0a7d304-dirty #1037
[  486.530165] Call Trace:
[  486.530165]  [<ffffffff8110774c>] warn_slowpath_common+0x8c/0xc0
[  486.530165]  [<ffffffff81107795>] warn_slowpath_null+0x15/0x20
[  486.530165]  [<ffffffff811eadda>] irq_work_needs_cpu+0x8a/0xb0
[  486.530165]  [<ffffffff81174585>] tick_nohz_stop_sched_tick+0x95/0x2a0
[  486.530165]  [<ffffffff811749d9>] __tick_nohz_idle_enter+0x189/0x1b0
[  486.530165]  [<ffffffff81174ae1>] tick_nohz_idle_enter+0xa1/0xd0
[  486.530165]  [<ffffffff8106e4f7>] cpu_idle+0x77/0x180
[  486.530165]  [<ffffffff83d2e8ae>] ? setup_APIC_timer+0xc9/0xce
[  486.530165]  [<ffffffff83d2dea9>] start_secondary+0xe1/0xe8
[  486.530165] ---[ end trace dd075f5cfc2c4f26 ]---

Obviously this was happening when trinity tried to exercise the shutdown syscall.

It was followed by RCU choking and causing a bunch of locked tasks, preventing
shutdown. I guess it's the result of whatever caused this warning above to
happen, but in-case it isn't, the relevant parts of the RCU hang are:

[  607.040283] INFO: task init:1 blocked for more than 120 seconds.
[  607.042932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  607.047046] init            D ffff8800ba308000  2736     1      0 0x00000000
[  607.050498]  ffff8800ba311b18 0000000000000002 0000000000000003 ffff8800ba308000
[  607.055110]  ffff8800ba310000 ffff8800ba310010 ffff8800ba311fd8 ffff8800ba310000
[  607.058208]  ffff8800ba310010 ffff8800ba311fd8 ffffffff8542c420 ffff8800ba308000
[  607.060611] Call Trace:
[  607.060847]  [<ffffffff83d38c85>] ? __mutex_lock_common+0x365/0x5d0
[  607.061462]  [<ffffffff83d3ad85>] schedule+0x55/0x60
[  607.061948]  [<ffffffff83d3b143>] schedule_preempt_disabled+0x13/0x20
[  607.062590]  [<ffffffff83d38cc5>] __mutex_lock_common+0x3a5/0x5d0
[  607.063209]  [<ffffffff811c2262>] ? rcu_cleanup_dead_cpu+0x52/0x250
[  607.063840]  [<ffffffff819e73b9>] ? free_cpumask_var+0x9/0x10
[  607.064453]  [<ffffffff811c2262>] ? rcu_cleanup_dead_cpu+0x52/0x250
[  607.065035]  [<ffffffff83d39020>] mutex_lock_nested+0x40/0x50
[  607.065606]  [<ffffffff811c2262>] rcu_cleanup_dead_cpu+0x52/0x250
[  607.066230]  [<ffffffff8117c2dd>] ? trace_hardirqs_on+0xd/0x10
[  607.066810]  [<ffffffff83d30f5e>] rcu_cpu_notify+0x1b4/0x1ef
[  607.067375]  [<ffffffff83d415ae>] notifier_call_chain+0xee/0x130
[  607.067975]  [<ffffffff8113cec9>] __raw_notifier_call_chain+0x9/0x10
[  607.068631]  [<ffffffff8110b5ab>] __cpu_notify+0x1b/0x30
[  607.069165]  [<ffffffff8110b5d0>] cpu_notify_nofail+0x10/0x30
[  607.069749]  [<ffffffff83d22605>] _cpu_down+0x185/0x2e0
[  607.070319]  [<ffffffff8110bd38>] disable_nonboot_cpus+0x88/0x1b0
[  607.070937]  [<ffffffff81125766>] kernel_restart+0x16/0x60
[  607.071487]  [<ffffffff8112594c>] SYSC_reboot+0x18c/0x2a0
[  607.072020]  [<ffffffff811c17e3>] ? rcu_cleanup_after_idle+0x23/0xf0
[  607.072635]  [<ffffffff811c1914>] ? rcu_eqs_exit_common+0x64/0x280
[  607.073251]  [<ffffffff811fd815>] ? user_exit+0xc5/0x100
[  607.073772]  [<ffffffff8117c2dd>] ? trace_hardirqs_on+0xd/0x10
[  607.074352]  [<ffffffff810725f3>] ? syscall_trace_enter+0x23/0x290
[  607.075054]  [<ffffffff81125a69>] SyS_reboot+0x9/0x10
[  607.075495]  [<ffffffff83d45c10>] tracesys+0xdd/0xe2
[  607.075967] 4 locks held by init/1:
[  607.076439]  #0:  (reboot_mutex){+.+.+.}, at: [<ffffffff811258a6>] SYSC_reboot+0xe6/0x2a0
[  607.077276]  #1:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8110b602>] cpu_maps_update_begin+0x12/0x20
[  607.078288]  #2:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8110b6f7>] cpu_hotplug_begin+0x27/0x60
[  607.079260]  #3:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff811c2262>] rcu_cleanup_dead_cpu+0x52/0x250

[  607.187177] rcu_preempt     D ffff8800aa8884a8  5136    11      2 0x00000000
[  607.187890]  ffff8800ba391c08 0000000000000002 ffff8800ba391bb8 000000078117e00a
[  607.188674]  ffff8800ba390000 ffff8800ba390010 ffff8800ba391fd8 ffff8800ba390000
[  607.189472]  ffff8800ba390010 ffff8800ba391fd8 ffff8800ba308000 ffff8800ba388000
[  607.190581] Call Trace:
[  607.190849]  [<ffffffff83d3ad85>] schedule+0x55/0x60
[  607.191336]  [<ffffffff83d381c6>] schedule_timeout+0x276/0x2c0
[  607.191904]  [<ffffffff81119e40>] ? lock_timer_base+0x70/0x70
[  607.192460]  [<ffffffff83d38229>] schedule_timeout_uninterruptible+0x19/0x20
[  607.193132]  [<ffffffff811c61d8>] rcu_gp_init+0x438/0x490
[  607.193646]  [<ffffffff8117c2dd>] ? trace_hardirqs_on+0xd/0x10
[  607.194216]  [<ffffffff811c62ec>] rcu_gp_kthread+0xbc/0x2d0
[  607.194760]  [<ffffffff811c6230>] ? rcu_gp_init+0x490/0x490
[  607.195298]  [<ffffffff81136570>] ? wake_up_bit+0x40/0x40
[  607.195823]  [<ffffffff811c6230>] ? rcu_gp_init+0x490/0x490
[  607.196364]  [<ffffffff81135b72>] kthread+0xe2/0xf0
[  607.196842]  [<ffffffff8117e00a>] ? __lock_release+0x1da/0x1f0
[  607.197405]  [<ffffffff81135a90>] ? __init_kthread_worker+0x70/0x70
[  607.198022]  [<ffffffff83d4593c>] ret_from_fork+0x7c/0xb0
[  607.198559]  [<ffffffff81135a90>] ? __init_kthread_worker+0x70/0x70

[  609.414891] Showing all locks held in the system:
[  609.415490] 4 locks held by init/1:
[  609.415836]  #0:  (reboot_mutex){+.+.+.}, at: [<ffffffff811258a6>] SYSC_reboot+0xe6/0x2a0
[  609.416708]  #1:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8110b602>] cpu_maps_update_begin+0x12/0x20
[  609.417712]  #2:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8110b6f7>] cpu_hotplug_begin+0x27/0x60
[  609.418668]  #3:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff811c2262>] rcu_cleanup_dead_cpu+0x52/0x250
[  609.419819] 1 lock held by rcu_preempt/11:
[  609.420277]  #0:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff811c5f09>] rcu_gp_init+0x169/0x490
[  609.421391] 2 locks held by khungtaskd/3087:
[  609.421811]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff811b0cec>] check_hung_uninterruptible_tasks+0x3c/0x390
[  609.422867]  #1:  (tasklist_lock){.+.+..}, at: [<ffffffff811791ac>] debug_show_all_locks+0x5c/0x270
[  609.423841] 2 locks held by sh/7108:
[  609.424199]  #0:  (&tty->ldisc_sem){.+.+.+}, at: [<ffffffff81c1b03f>] tty_ldisc_ref_wait+0x1f/0x50
[  609.425134]  #1:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff81c1772e>] n_tty_read+0x31e/0x980

It looks like rcu_gp_init() went sleeping with the onoff_mutex held and
never got it's wakeup, while the rcu_cleanup_dead_cpu() code is waiting
to grab on that mutex.


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/