Message-ID: <20061016224330.GB3746@localhost.localdomain>
Date: Mon, 16 Oct 2006 15:43:30 -0700
From: Ravikiran G Thirumalai <kiran@...lex86.org>
To: Andrew Morton <akpm@...l.org>
Cc: linux-kernel@...r.kernel.org,
Christoph Lameter <clameter@...r.sgi.com>,
Alok Kataria <alok.kataria@...softinc.com>,
"Shai Fultheim (Shai@...lex86.org)" <shai@...lex86.org>,
"Benzi Galili (Benzi@...leMP.com)" <benzi@...lemp.com>
Subject: 2.6.19-rc2 cpu hotplug lockdep warning: possible circular locking dependency
(Was: [patch] slab: Fix a cpu hotplug race condition while tuning slab cpu
caches)
On Mon, Oct 16, 2006 at 12:48:08PM -0700, Andrew Morton wrote:
> >
> > Not when I tested it. I just retested with lockdep on and things seemed
> > fine on a SMP.
>
> Great, thanks. Please ensure that lockdep is used when testing the kernel.
> Also preempt, DEBUG_SLAB, DEBUG_SPINLOCK_SLEEP and various other things.
> I guess the list in Documentation/SubmitChecklist is the definitive one.
Seems like CPU offline spits out lockdep warnings when actually offlining a
CPU with CONFIG_HOTPLUG_CPU + CONFIG_PREEMPT + CONFIG_DEBUG_SLAB etc. Here is
the trace I get. Note that this is plain 2.6.19-rc2 (_without_ the slab cpu
hotplug fix).
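FWIW, my reading of the cycle, as a simplified sketch of the two paths --
reconstructed from the trace below, not lifted from the 2.6.19-rc2 sources:

	/*
	 * Dependency #1: CPU_DOWN_PREPARE.  blocking_notifier_call_chain()
	 * holds cpu_chain's rwsem across the callbacks, and the workqueue
	 * callback takes workqueue_mutex under it -- and keeps it held
	 * until CPU_DEAD/CPU_DOWN_FAILED.
	 */
	down_read(&cpu_chain.rwsem);	/* blocking_notifier_call_chain() */
	mutex_lock(&workqueue_mutex);	/* workqueue_cpu_callback()       */
	up_read(&cpu_chain.rwsem);	/* rwsem dropped, mutex kept held */

	/*
	 * Dependency #0: CPU_DEAD.  The same notifier chain is entered
	 * again while workqueue_mutex is still held, so the rwsem is now
	 * taken in the reverse order -- the circular dependency lockdep
	 * reports.
	 */
	down_read(&cpu_chain.rwsem);	/* blocking_notifier_call_chain() */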
This is a dual-CPU, 4-thread Xeon SMP box with 8G of main memory. I am also
attaching the .config I used, and the full dmesg.
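For reference, the offline itself goes through the usual sysfs knob (the
store_online() path in the trace), i.e. something like:

	# echo 0 > /sys/devices/system/cpu/cpu3/online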
Thanks,
Kiran
[ 235.667515] CPU 3 is now offline
[ 235.670860]
[ 235.670861] =======================================================
[ 235.678694] [ INFO: possible circular locking dependency detected ]
[ 235.684990] 2.6.19-rc2 #1
[ 235.687649] -------------------------------------------------------
[ 235.693945] bash/3223 is trying to acquire lock:
[ 235.698596] ((cpu_chain).rwsem){----}, at: [<ffffffff8023ea62>] blocking_notifier_call_chain+0x13/0x36
[ 235.708226]
[ 235.708226] but task is already holding lock:
[ 235.714149] (workqueue_mutex){--..}, at: [<ffffffff802422a1>] workqueue_cpu_callback+0x14e/0x2a4
[ 235.723260]
[ 235.723261] which lock already depends on the new lock.
[ 235.723262]
[ 235.731575]
[ 235.731575] the existing dependency chain (in reverse order) is:
[ 235.739144]
[ 235.739144] -> #1 (workqueue_mutex){--..}:
[ 235.744954] [<ffffffff80249cc9>] add_lock_to_list+0x78/0x9d
[ 235.751588] [<ffffffff8024bcd8>] __lock_acquire+0xaca/0xc37
[ 235.758222] [<ffffffff8024c0ff>] lock_acquire+0x5c/0x77
[ 235.764501] [<ffffffff802422a1>] workqueue_cpu_callback+0x14e/0x2a4
[ 235.771836] [<ffffffff8048d2fa>] __mutex_lock_slowpath+0xea/0x272
[ 235.778990] [<ffffffff802422a1>] workqueue_cpu_callback+0x14e/0x2a4
[ 235.786317] [<ffffffff8023e932>] notifier_call_chain+0x23/0x32
[ 235.793211] [<ffffffff8023ea71>] blocking_notifier_call_chain+0x22/0x36
[ 235.800901] [<ffffffff8024ff00>] _cpu_down+0x53/0x218
[ 235.807014] [<ffffffff802500f0>] cpu_down+0x2b/0x42
[ 235.812955] [<ffffffff803b09c8>] store_online+0x27/0x71
[ 235.819234] [<ffffffff802b67b6>] sysfs_write_file+0xcc/0xfb
[ 235.825878] [<ffffffff8027bfb5>] vfs_write+0xb2/0x155
[ 235.831983] [<ffffffff8027c10d>] sys_write+0x45/0x70
[ 235.838002] [<ffffffff802099be>] system_call+0x7e/0x83
[ 235.844194] [<ffffffffffffffff>] 0xffffffffffffffff
[ 235.850148]
[ 235.850148] -> #0 ((cpu_chain).rwsem){----}:
[ 235.856145] [<ffffffff8024977b>] save_trace+0x48/0xdf
[ 235.862250] [<ffffffff80249e67>] print_circular_bug_tail+0x34/0x70
[ 235.869491] [<ffffffff8024bbc5>] __lock_acquire+0x9b7/0xc37
[ 235.876125] [<ffffffff8048ef0e>] _spin_unlock_irqrestore+0x49/0x52
[ 235.883374] [<ffffffff8024c0ff>] lock_acquire+0x5c/0x77
[ 235.889653] [<ffffffff8023ea62>] blocking_notifier_call_chain+0x13/0x36
[ 235.897334] [<ffffffff8048c89f>] cond_resched+0x2f/0x3a
[ 235.903613] [<ffffffff80247fbf>] down_read+0x37/0x40
[ 235.909642] [<ffffffff8023ea62>] blocking_notifier_call_chain+0x13/0x36
[ 235.917322] [<ffffffff80250004>] _cpu_down+0x157/0x218
[ 235.923532] [<ffffffff802500f0>] cpu_down+0x2b/0x42
[ 235.929464] [<ffffffff803b09c8>] store_online+0x27/0x71
[ 235.935752] [<ffffffff802b67b6>] sysfs_write_file+0xcc/0xfb
[ 235.942386] [<ffffffff8027bfb5>] vfs_write+0xb2/0x155
[ 235.948491] [<ffffffff8027c10d>] sys_write+0x45/0x70
[ 235.954520] [<ffffffff802099be>] system_call+0x7e/0x83
[ 235.960712] [<ffffffffffffffff>] 0xffffffffffffffff
[ 235.966645]
[ 235.966645] other info that might help us debug this:
[ 235.966646]
[ 235.974812] 2 locks held by bash/3223:
[ 235.978604] #0: (cpu_add_remove_lock){--..}, at: [<ffffffff802500de>] cpu_down+0x19/0x42
[ 235.987160] #1: (workqueue_mutex){--..}, at: [<ffffffff802422a1>] workqueue_cpu_callback+0x14e/0x2a4
[ 235.996765]
[ 235.996766] stack backtrace:
[ 236.001217]
[ 236.001218] Call Trace:
[ 236.005238] [<ffffffff80249e9a>] print_circular_bug_tail+0x67/0x70
[ 236.011543] [<ffffffff8024bbc5>] __lock_acquire+0x9b7/0xc37
[ 236.017241] [<ffffffff8048ef0e>] _spin_unlock_irqrestore+0x49/0x52
[ 236.023546] [<ffffffff8024c0ff>] lock_acquire+0x5c/0x77
[ 236.028898] [<ffffffff8023ea62>] blocking_notifier_call_chain+0x13/0x36
[ 236.035634] [<ffffffff8048c89f>] cond_resched+0x2f/0x3a
[ 236.040987] [<ffffffff80247fbf>] down_read+0x37/0x40
[ 236.046080] [<ffffffff8023ea62>] blocking_notifier_call_chain+0x13/0x36
[ 236.052818] [<ffffffff80250004>] _cpu_down+0x157/0x218
[ 236.058084] [<ffffffff802500f0>] cpu_down+0x2b/0x42
[ 236.063089] [<ffffffff803b09c8>] store_online+0x27/0x71
[ 236.068441] [<ffffffff802b67b6>] sysfs_write_file+0xcc/0xfb
[ 236.074140] [<ffffffff8027bfb5>] vfs_write+0xb2/0x155
[ 236.079319] [<ffffffff8027c10d>] sys_write+0x45/0x70
[ 236.084411] [<ffffffff802099be>] system_call+0x7e/0x83
[ 236.089677]