[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090112185954.GB15494@elte.hu>
Date: Mon, 12 Jan 2009 19:59:54 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Dieter Ries <clip2@....de>
Cc: Maciej Rutecki <maciej.rutecki@...il.com>, travis@....com,
rusty@...tcorp.com.au, linux-kernel@...r.kernel.org,
Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: 2.6.29-rc1 does not boot
* Dieter Ries <clip2@....de> wrote:
> Hi,
>
> Ingo Molnar schrieb:
> >
> > So you did get stacktraces? You might want to boot with
> > CONFIG_PROVE_LOCKING=y, that could also catch and report the lockup
> > scenario.
>
> This is what I have got:
>
> [ 12.340122] =============================================
> [ 12.341044] [ INFO: possible recursive locking detected ]
> [ 12.341044] 2.6.29-rc1-00041-gacd1e11 #143
> [ 12.341044] ---------------------------------------------
> [ 12.341044] events/0/9 is trying to acquire lock:
> [ 12.341044] (events){--..}, at: [<ffffffff80254783>]
> flush_work+0x33/0x100
> [ 12.341044]
>
> [ 12.341044] but task is already holding lock:
>
> [ 12.341044] (events){--..}, at: [<ffffffff80253ef7>]
> run_workqueue+0x107/0x230
>
> [ 12.341044]
>
> [ 12.341044] other info that might help us debug this:
>
> [ 12.341044] 3 locks held by events/0/9:
>
> [ 12.341044] #0: (events){--..}, at: [<ffffffff80253ef7>]
> run_workqueue+0x107/0x230
>
> [ 12.341044] #1: ((dbs_work).work){--..}, at: [<ffffffff80253ef7>]
> run_workqueue+0x107/0x230
>
> [ 12.341044] #2: (dbs_mutex){--..}, at: [<ffffffff805692f5>]
> do_dbs_timer+0x25/0x250
>
> [ 12.341044]
>
> [ 12.341044] stack backtrace:
>
> [ 12.341044] Pid: 9, comm: events/0 Not tainted
> 2.6.29-rc1-00041-gacd1e11 #143
> [ 12.341044] Call Trace:
>
> [ 12.341044] [<ffffffff80269e69>] validate_chain+0xb69/0x1200
>
> [ 12.341044] [<ffffffff8026a93e>] __lock_acquire+0x43e/0xa50
>
> [ 12.341044] [<ffffffff8026afa8>] lock_acquire+0x58/0x80
>
> [ 12.341044] [<ffffffff80254783>] ? flush_work+0x33/0x100
>
> [ 12.341044] [<ffffffff802547a8>] flush_work+0x58/0x100
>
> [ 12.341044] [<ffffffff80254783>] ? flush_work+0x33/0x100
>
> [ 12.341044] [<ffffffff80268b7d>] ? trace_hardirqs_on+0xd/0x10
>
> [ 12.341044] [<ffffffff80254a3c>] ? __queue_work+0x3c/0x50
>
> [ 12.341044] [<ffffffff80254ad4>] ? queue_work_on+0x44/0x60
>
> [ 12.341044] [<ffffffff8021bd20>] ? do_drv_write+0x0/0x60
>
> [ 12.341044] [<ffffffff8021bd20>] ? do_drv_write+0x0/0x60
>
> [ 12.341044] [<ffffffff80254ba3>] work_on_cpu+0x93/0xc0
>
> [ 12.341044] [<ffffffff80253cc0>] ? do_work_for_cpu+0x0/0x20
>
> [ 12.341044] [<ffffffff8021bd20>] ? do_drv_write+0x0/0x60
>
> [ 12.341044] [<ffffffff8025cba1>] ?
> srcu_notifier_call_chain+0x11/0x20
> [ 12.341044] [<ffffffff8021c5c9>] acpi_cpufreq_target+0x239/0x350
>
> [ 12.341044] [<ffffffff80268af2>] ?
> trace_hardirqs_on_caller+0x112/0x190
> [ 12.341044] [<ffffffff80699ac8>] ? mutex_lock_nested+0x228/0x2f0
>
> [ 12.341044] [<ffffffff80565731>] __cpufreq_driver_target+0x81/0x90
>
> [ 12.341044] [<ffffffff8056940e>] do_dbs_timer+0x13e/0x250
>
> [ 12.341044] [<ffffffff805692d0>] ? do_dbs_timer+0x0/0x250
>
> [ 12.341044] [<ffffffff80253f49>] run_workqueue+0x159/0x230
>
> [ 12.341044] [<ffffffff80253ef7>] ? run_workqueue+0x107/0x230
>
> [ 12.341044] [<ffffffff80254d1f>] worker_thread+0xbf/0x120
>
> [ 12.341044] [<ffffffff80258550>] ? autoremove_wake_function+0x0/0x40
>
> [ 12.341044] [<ffffffff80254c60>] ? worker_thread+0x0/0x120
>
> [ 12.341044] [<ffffffff8025811d>] kthread+0x4d/0x80
>
> [ 12.341044] [<ffffffff8020c87a>] child_rip+0xa/0x20
>
> [ 12.341044] [<ffffffff8020c27c>] ? restore_args+0x0/0x30
> [ 12.341044] [<ffffffff802580d0>] ? kthread+0x0/0x80
> [ 12.341044] [<ffffffff8020c870>] ? child_rip+0x0/0x20
>
>
> complete log and config are attached.
>
> The different git hash is because I reverted that revert patch. It is
> exactly the same like the kernel on which I first found the problem,
> only with some debugging enabled now. Maybe I should make more use of
> gits capabilities...
>
> Hope that helps,
thanks, it helps!
the problem isnt even the hotplug lock, but that work_on_cpu() uses the
normal generic kevent workqueue - which workqueue can already contain
items related to the cpufreq code hence it's not safe to call it with any
cpufreq mutex held.
so Mike, i think we could solve this by work_on_cpu() getting its own
workqueue?
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists