Date: Fri, 19 Jul 2013 09:47:39 +0800
From: Lai Jiangshan <laijs@...fujitsu.com>
To: "Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
    Tejun Heo <tj@...nel.org>, "Rafael J. Wysocki" <rjw@...k.pl>,
    bhelgaas@...gle.com
Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>
> On 07/17/2013 03:37 PM, Lai Jiangshan wrote:
>> On 07/16/2013 10:41 PM, Srivatsa S. Bhat wrote:
>>> Hi,
>>>
>>> I have been seeing this warning every time during boot. I haven't
>>> spent time digging through it though... Please let me know if
>>> any machine-specific info is needed.
>>>
>>> Regards,
>>> Srivatsa S. Bhat
>>>
>>>
>>> ----------------------------------------------------
>>>
>>> =============================================
>>> [ INFO: possible recursive locking detected ]
>>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>>> ---------------------------------------------
>>> kworker/0:1/142 is trying to acquire lock:
>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>>
>>> but task is already holding lock:
>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>>
>>> other info that might help us debug this:
>>>  Possible unsafe locking scenario:
>>>
>>>        CPU0
>>>        ----
>>>   lock((&wfc.work));
>>>   lock((&wfc.work));
>>
>>
>
>
> Hi Lai,
>
> Thanks for taking a look into this!
>
>>
>> This is false negative,
>
> I believe you meant false-positive...
>
>> the two "wfc"s are different, they are
>> both on stack. flush_work() can't be deadlock in such case:
>>
>> void foo(void *)
>> {
>> 	...
>> 	if (xxx)
>> 		work_on_cpu(..., foo, ...);
>> 	...
>> }
>>
>> bar()
>> {
>> 	work_on_cpu(..., foo, ...);
>> }
>>
>> The complaint is caused by "work_on_cpu() uses a static lock_class_key".
>> we should fix work_on_cpu().
>> (but the caller should also be careful, the foo()/local_pci_probe() is re-entering)
>>
>> But I can't find an elegant fix.
>>
>> long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>> {
>> 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>
>> +#ifdef CONFIG_LOCKDEP
>> +	static struct lock_class_key __key;
>> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +	lockdep_init_map(&wfc.work.lockdep_map, &wfc.work, &__key, 0);
>> +#else
>> 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +#endif
>> 	schedule_work_on(cpu, &wfc.work);
>> 	flush_work(&wfc.work);
>> 	return wfc.ret;
>> }
>>
>
> Unfortunately that didn't seem to fix it.. I applied the patch
> shown below, and I got the same old warning.
>
> ---
>
>  kernel/workqueue.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..07d9a67 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>  {
>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>
> +#ifdef CONFIG_LOCKDEP
> +	static struct lock_class_key __key;

Sorry, this "static" should be removed.
Thanks,
Lai

> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
> +#else
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +#endif
>  	schedule_work_on(cpu, &wfc.work);
>  	flush_work(&wfc.work);
>  	return wfc.ret;
>
>
>
> Warning:
> --------
>
> wmi: Mapper loaded
> be2net 0000:11:00.0: irq 102 for MSI/MSI-X
> be2net 0000:11:00.0: enabled 1 MSI-x vector(s)
> be2net 0000:11:00.0: created 0 RSS queue(s) and 1 default RX queue
> be2net 0000:11:00.0: created 1 TX queue(s)
> pci 0000:11:04.0: [19a2:0710] type 00 class 0x020000
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-wq-fix #10 Not tainted
> ---------------------------------------------
> kworker/0:1/126 is trying to acquire lock:
>  (&wfc.work){+.+.+.}, at: [<ffffffff810770f0>] flush_work+0x0/0xb0
>
> but task is already holding lock:
>  (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&wfc.work);
>   lock(&wfc.work);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 3 locks held by kworker/0:1/126:
>  #0: (events){.+.+.+}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
>  #1: (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
>  #2: (&__lockdep_no_validate__){......}, at: [<ffffffff81398ada>] device_attach+0x2a/0xc0
>
> stack backtrace:
> CPU: 0 PID: 126 Comm: kworker/0:1 Not tainted 3.11.0-rc1-wq-fix #10
> Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
> Workqueue: events work_for_cpu_fn
>  ffff881036887408 ffff881036889668 ffffffff81619059 0000000000000003
>  ffff881036886a80 ffff881036889698 ffffffff810c1624 ffff881036886a80
>  ffff881036887408 ffff881036886a80 0000000000000000 ffff8810368896f8
> Call Trace:
>  [<ffffffff81619059>] dump_stack+0x59/0x80
>  [<ffffffff810c1624>] print_deadlock_bug+0xf4/0x100
>  [<ffffffff810c3104>] validate_chain+0x504/0x750
>  [<ffffffff810c365d>] __lock_acquire+0x30d/0x580
>  [<ffffffff810c3967>] lock_acquire+0x97/0x170
>  [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
>  [<ffffffff81077138>] flush_work+0x48/0xb0
>  [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
>  [<ffffffff810c2000>] ? mark_held_locks+0x80/0x130
>  [<ffffffff81074ceb>] ? queue_work_on+0x4b/0xa0
>  [<ffffffff810c2375>] ? trace_hardirqs_on_caller+0x105/0x1d0
>  [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff81077334>] work_on_cpu+0xa4/0xc0
>  [<ffffffff8106f940>] ? wqattrs_hash+0x190/0x190
>  [<ffffffff812d1ed0>] ? pci_pm_prepare+0x60/0x60
>  [<ffffffff812d1ffa>] __pci_device_probe+0x9a/0xe0
>  [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff812d3302>] ? pci_dev_get+0x22/0x30
>  [<ffffffff812d334a>] pci_device_probe+0x3a/0x60
>  [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff81398bdc>] really_probe+0x6c/0x320
>  [<ffffffff81398ed7>] driver_probe_device+0x47/0xa0
>  [<ffffffff81398fe0>] ? __driver_attach+0xb0/0xb0
>  [<ffffffff81399033>] __device_attach+0x53/0x60
>  [<ffffffff81396b24>] bus_for_each_drv+0x74/0xa0
>  [<ffffffff81398b50>] device_attach+0xa0/0xc0
>  [<ffffffff812c99f9>] pci_bus_add_device+0x39/0x60
>  [<ffffffff812ed341>] virtfn_add+0x251/0x3e0
>  [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff812ed9bf>] sriov_enable+0x22f/0x3d0
>  [<ffffffff812edbad>] pci_enable_sriov+0x4d/0x60
>  [<ffffffffa0127045>] be_vf_setup+0x175/0x410 [be2net]
>  [<ffffffffa012d3ca>] be_setup+0x37a/0x4b0 [be2net]
>  [<ffffffffa012dac0>] be_probe+0x5c0/0x820 [be2net]
>  [<ffffffff812d1f1e>] local_pci_probe+0x4e/0x90
>  [<ffffffff8106f958>] work_for_cpu_fn+0x18/0x30
>  [<ffffffff81075e3a>] process_one_work+0x1da/0x610
>  [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
>  [<ffffffff810764fc>] worker_thread+0x28c/0x3a0
>  [<ffffffff81076270>] ? process_one_work+0x610/0x610
>  [<ffffffff8107da5e>] kthread+0xee/0x100
>  [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
>  [<ffffffff81628e5c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
> be2net 0000:11:04.0: enabling device (0040 -> 0042)
> be2net 0000:11:04.0: Could not use PCIe error reporting
> be2net 0000:11:04.0: VF is not privileged to issue opcode 89-1
> be2net 0000:11:04.0: VF is not privileged to issue opcode 125-1
>
>
> Regards,
> Srivatsa S. Bhat
>
>