linux-kernel - Re: 2.6.29-rc1 does not boot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090112185954.GB15494@elte.hu>
Date:	Mon, 12 Jan 2009 19:59:54 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Dieter Ries <clip2@....de>
Cc:	Maciej Rutecki <maciej.rutecki@...il.com>, travis@....com,
	rusty@...tcorp.com.au, linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: 2.6.29-rc1 does not boot


* Dieter Ries <clip2@....de> wrote:

> Hi,
> 
> Ingo Molnar schrieb:
> > 
> > So you did get stacktraces? You might want to boot with 
> > CONFIG_PROVE_LOCKING=y, that could also catch and report the lockup 
> > scenario.
> 
> This is what I have got:
> 
> [   12.340122] =============================================
> [   12.341044] [ INFO: possible recursive locking detected ]
> [   12.341044] 2.6.29-rc1-00041-gacd1e11 #143
> [   12.341044] ---------------------------------------------
> [   12.341044] events/0/9 is trying to acquire lock:
> [   12.341044]  (events){--..}, at: [<ffffffff80254783>]
> flush_work+0x33/0x100
> [   12.341044]
> 
> [   12.341044] but task is already holding lock:
> 
> [   12.341044]  (events){--..}, at: [<ffffffff80253ef7>]
> run_workqueue+0x107/0x230
> 
> [   12.341044]
> 
> [   12.341044] other info that might help us debug this:
> 
> [   12.341044] 3 locks held by events/0/9:
> 
> [   12.341044]  #0:  (events){--..}, at: [<ffffffff80253ef7>]
> run_workqueue+0x107/0x230
> 
> [   12.341044]  #1:  ((dbs_work).work){--..}, at: [<ffffffff80253ef7>]
> run_workqueue+0x107/0x230
> 
> [   12.341044]  #2:  (dbs_mutex){--..}, at: [<ffffffff805692f5>]
> do_dbs_timer+0x25/0x250
> 
> [   12.341044]
> 
> [   12.341044] stack backtrace:
> 
> [   12.341044] Pid: 9, comm: events/0 Not tainted
> 2.6.29-rc1-00041-gacd1e11 #143
> [   12.341044] Call Trace:
> 
> [   12.341044]  [<ffffffff80269e69>] validate_chain+0xb69/0x1200
> 
> [   12.341044]  [<ffffffff8026a93e>] __lock_acquire+0x43e/0xa50
> 
> [   12.341044]  [<ffffffff8026afa8>] lock_acquire+0x58/0x80
> 
> [   12.341044]  [<ffffffff80254783>] ? flush_work+0x33/0x100
> 
> [   12.341044]  [<ffffffff802547a8>] flush_work+0x58/0x100
> 
> [   12.341044]  [<ffffffff80254783>] ? flush_work+0x33/0x100
> 
> [   12.341044]  [<ffffffff80268b7d>] ? trace_hardirqs_on+0xd/0x10
> 
> [   12.341044]  [<ffffffff80254a3c>] ? __queue_work+0x3c/0x50
> 
> [   12.341044]  [<ffffffff80254ad4>] ? queue_work_on+0x44/0x60
> 
> [   12.341044]  [<ffffffff8021bd20>] ? do_drv_write+0x0/0x60
> 
> [   12.341044]  [<ffffffff8021bd20>] ? do_drv_write+0x0/0x60
> 
> [   12.341044]  [<ffffffff80254ba3>] work_on_cpu+0x93/0xc0
> 
> [   12.341044]  [<ffffffff80253cc0>] ? do_work_for_cpu+0x0/0x20
> 
> [   12.341044]  [<ffffffff8021bd20>] ? do_drv_write+0x0/0x60
> 
> [   12.341044]  [<ffffffff8025cba1>] ?
> srcu_notifier_call_chain+0x11/0x20
> [   12.341044]  [<ffffffff8021c5c9>] acpi_cpufreq_target+0x239/0x350
> 
> [   12.341044]  [<ffffffff80268af2>] ?
> trace_hardirqs_on_caller+0x112/0x190
> [   12.341044]  [<ffffffff80699ac8>] ? mutex_lock_nested+0x228/0x2f0
> 
> [   12.341044]  [<ffffffff80565731>] __cpufreq_driver_target+0x81/0x90
> 
> [   12.341044]  [<ffffffff8056940e>] do_dbs_timer+0x13e/0x250
> 
> [   12.341044]  [<ffffffff805692d0>] ? do_dbs_timer+0x0/0x250
> 
> [   12.341044]  [<ffffffff80253f49>] run_workqueue+0x159/0x230
> 
> [   12.341044]  [<ffffffff80253ef7>] ? run_workqueue+0x107/0x230
> 
> [   12.341044]  [<ffffffff80254d1f>] worker_thread+0xbf/0x120
> 
> [   12.341044]  [<ffffffff80258550>] ? autoremove_wake_function+0x0/0x40
> 
> [   12.341044]  [<ffffffff80254c60>] ? worker_thread+0x0/0x120
> 
> [   12.341044]  [<ffffffff8025811d>] kthread+0x4d/0x80
> 
> [   12.341044]  [<ffffffff8020c87a>] child_rip+0xa/0x20
> 
> [   12.341044]  [<ffffffff8020c27c>] ? restore_args+0x0/0x30
> [   12.341044]  [<ffffffff802580d0>] ? kthread+0x0/0x80
> [   12.341044]  [<ffffffff8020c870>] ? child_rip+0x0/0x20
> 
> 
> complete log and config are attached.
> 
> The different git hash is because I reverted that revert patch. It is 
> exactly the same like the kernel on which I first found the problem, 
> only with some debugging enabled now. Maybe I should make more use of 
> gits capabilities...
> 
> Hope that helps,

thanks, it helps!

the problem isnt even the hotplug lock, but that work_on_cpu() uses the 
normal generic kevent workqueue - which workqueue can already contain 
items related to the cpufreq code hence it's not safe to call it with any 
cpufreq mutex held.

so Mike, i think we could solve this by work_on_cpu() getting its own 
workqueue?

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/