Date:	Wed, 6 Mar 2013 10:43:36 +0800
From:	Ming Lei <ming.lei@...onical.com>
To:	Or Gerlitz <ogerlitz@...lanox.com>
Cc:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	David Miller <davem@...emloft.net>,
	Roland Dreier <roland@...nel.org>,
	netdev <netdev@...r.kernel.org>, Yan Burman <yanb@...lanox.com>,
	Jack Morgenstein <jackm@....mellanox.co.il>
Subject: Re: hitting lockdep warning as of too early VF probe with 3.9-rc1

On Tue, Mar 5, 2013 at 11:21 PM, Or Gerlitz <ogerlitz@...lanox.com> wrote:
> Hi Ming, Greg, Roland, Dave, all..
>
> With 3.9-rc1, we are hitting the lockdep warning below when probing virtual
> functions over the mlx4 driver, where it seems that probing of the VF
> starts before the PF initialization is done.
>
> Yan Burman from our team bisected that to be introduced by commit
> 190888ac01d059e38ffe77a2291d44cafa9016fb
> "driver core: fix possible missing of device probe".

I suspect the lockdep warning can still be triggered even without that commit.

> Basically what happens is that the VF probe fails, and once the PF
> probing/initialization is done, the VF
> is probed again and this time it succeeds.
>
> Does anyone see anything here that is possibly wrong with the mlx4_core
> (drivers/net/ethernet/mellanox/mlx4) driver's interaction with the PCI
> subsystem?
>
> Or.
>
>
> mlx4_core: Initializing 0000:04:00.0
> mlx4_core 0000:04:00.0: Enabling SR-IOV with 1 VFs
> pci 0000:04:00.1: [15b3:1004] type 00 class 0x028000
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.9.0-rc1 #96 Not tainted
> ---------------------------------------------
> kworker/0:1/734 is trying to acquire lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81066cb0>] flush_work+0x0/0x250
>
> but task is already holding lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81064352>]
> process_one_work+0x162/0x4c0
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock((&wfc.work));
>   lock((&wfc.work));
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 3 locks held by kworker/0:1/734:
>  #0:  (events){.+.+.+}, at: [<ffffffff81064352>]
> process_one_work+0x162/0x4c0
>  #1:  ((&wfc.work)){+.+.+.}, at: [<ffffffff81064352>]
> process_one_work+0x162/0x4c0
>  #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff812db225>]
> device_attach+0x25/0xb0
>
> stack backtrace:
> Pid: 734, comm: kworker/0:1 Not tainted 3.9.0-rc1 #96
> Call Trace:
>  [<ffffffff810948ec>] validate_chain+0xdcc/0x11f0
>  [<ffffffff81095150>] __lock_acquire+0x440/0xc70
>  [<ffffffff81095150>] ? __lock_acquire+0x440/0xc70
>  [<ffffffff810959da>] lock_acquire+0x5a/0x70
>  [<ffffffff81066cb0>] ? wq_worker_waking_up+0x60/0x60
>  [<ffffffff81066cf5>] flush_work+0x45/0x250
>  [<ffffffff81066cb0>] ? wq_worker_waking_up+0x60/0x60
>  [<ffffffff810922be>] ? mark_held_locks+0x9e/0x130
>  [<ffffffff81066a96>] ? queue_work_on+0x46/0x90
>  [<ffffffff810925dd>] ? trace_hardirqs_on_caller+0xfd/0x190
>  [<ffffffff8109267d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff81066f74>] work_on_cpu+0x74/0x90
>  [<ffffffff81063820>] ? keventd_up+0x20/0x20
>  [<ffffffff8121fd30>] ? pci_pm_prepare+0x60/0x60
>  [<ffffffff811f9293>] ? cpumask_next_and+0x23/0x40
>  [<ffffffff81220a1a>] pci_device_probe+0xba/0x110
>  [<ffffffff812dadca>] ? driver_sysfs_add+0x7a/0xb0
>  [<ffffffff812daf1f>] driver_probe_device+0x8f/0x230
>  [<ffffffff812db170>] ? __driver_attach+0xb0/0xb0
>  [<ffffffff812db1bb>] __device_attach+0x4b/0x60
>  [<ffffffff812d9314>] bus_for_each_drv+0x64/0x90
>  [<ffffffff812db298>] device_attach+0x98/0xb0
>  [<ffffffff81218474>] pci_bus_add_device+0x24/0x50
>  [<ffffffff81232e80>] virtfn_add+0x240/0x3e0

You are adding a new PCI device inside another PCI device's probe(), so
the new device gets probed right away. Since PCI probe() is scheduled via
work_on_cpu(), this ends up calling flush_work() from inside a worker
function, which might be a real deadlock.
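The nesting can be read directly off the backtrace. Below is a pseudocode sketch of the call chain (function names taken from the 3.9-rc1 trace above, simplified for illustration):

```
/* PF probe is already running inside a kworker: */
work_on_cpu()                       /* queued by pci_device_probe(PF) */
  local_pci_probe()
    mlx4_init_one()                 /* mlx4_core PF probe */
      pci_enable_sriov()
        virtfn_add()
          pci_bus_add_device()      /* VF appears on the bus */
            device_attach()
              pci_device_probe(VF)
                work_on_cpu()       /* queues the VF probe work... */
                  flush_work()      /* ...and waits for it while still inside
                                       process_one_work() for the PF's work.
                                       Both works use the same (&wfc.work)
                                       lock class, hence lockdep reports a
                                       possible recursive lock */
```

Whether this can actually deadlock depends on whether both work items can end up serialized on the same worker; lockdep flags it because it cannot distinguish the two wfc.work instances without nesting annotations.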

I am wondering why this commit can cause the problem: a PCI device is
probed with its driver whenever a matching driver exists, and there is no
constraint on when the driver has to be loaded, either before the device
is added or after.

From the driver core's point of view, nothing looks wrong.

>  [<ffffffff8146ce3d>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
>  [<ffffffff812333be>] pci_enable_sriov+0x23e/0x500
>  [<ffffffffa011fa1a>] __mlx4_init_one+0x5da/0xce0 [mlx4_core]
>  [<ffffffffa012016d>] mlx4_init_one+0x2d/0x60 [mlx4_core]
>  [<ffffffff8121fd79>] local_pci_probe+0x49/0x80
>  [<ffffffff81063833>] work_for_cpu_fn+0x13/0x20
>  [<ffffffff810643b8>] process_one_work+0x1c8/0x4c0
>  [<ffffffff81064352>] ? process_one_work+0x162/0x4c0
>  [<ffffffff81064cfb>] worker_thread+0x30b/0x430
>  [<ffffffff810649f0>] ? manage_workers+0x340/0x340
>  [<ffffffff8106cea6>] kthread+0xd6/0xe0
>  [<ffffffff8106cdd0>] ? __init_kthread_worker+0x70/0x70
>  [<ffffffff8146daac>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8106cdd0>] ? __init_kthread_worker+0x70/0x70
> mlx4_core: Initializing 0000:04:00.1
> mlx4_core 0000:04:00.1: enabling device (0000 -> 0002)
> mlx4_core 0000:04:00.1: Detected virtual function - running in slave mode
> mlx4_core 0000:04:00.1: Sending reset
> mlx4_core 0000:04:00.1: Got slave FLRed from Communication channel (ret:0x1)
> mlx4_core 0000:04:00.1: slave is currently in themiddle of FLR.
> retrying...(try num:1)
> mlx4_core 0000:04:00.1: Communication channel is not idle.my toggle is 1
> (cmd:0x0)
> mlx4_core 0000:04:00.1: slave is currently in themiddle of FLR.
> retrying...(try num:2)
> [... repeated the same ...]
> mlx4_core 0000:04:00.1: slave is currently in themiddle of FLR.
> retrying...(try num:10)
> mlx4_core 0000:04:00.1: Communication channel is not idle.my toggle is 1
> (cmd:0x0)
> mlx4_core 0000:04:00.1: slave driver version is not supported by the master
> mlx4_core 0000:04:00.1: Communication channel is not idle.my toggle is 1
> (cmd:0x0)
> mlx4_core 0000:04:00.1: Failed to initialize slave
> mlx4_core: probe of 0000:04:00.1 failed with error -5
> mlx4_core 0000:04:00.0: Running in master mode
> mlx4_core 0000:04:00.0: FW version 2.11.500 (cmd intf rev 3), max commands 16
> mlx4_core 0000:04:00.0: Catastrophic error buffer at 0x1f020, size 0x10, BAR 0
> mlx4_core 0000:04:00.0: Communication vector bar:2 offset:0x800
> [... probing of PF continues ...]
> mlx4_core 0000:04:00.0: Started init_resource_tracker: 80 slaves
> mlx4_core 0000:04:00.0: irq 83 for MSI/MSI-X
> mlx4_core 0000:04:00.0: irq 84 for MSI/MSI-X
> mlx4_core 0000:04:00.0: irq 85 for MSI/MSI-X
> mlx4_core 0000:04:00.0: irq 86 for MSI/MSI-X
> mlx4_core 0000:04:00.0: NOP command IRQ test passed
> [... probing of PF ends ...]
> [... probing of VF done again ...]
> mlx4_core: Initializing 0000:04:00.1
> mlx4_core 0000:04:00.1: enabling device (0000 -> 0002)
> mlx4_core 0000:04:00.1: Detected virtual function - running in slave mode
> mlx4_core 0000:04:00.1: Sending reset
> mlx4_core 0000:04:00.0: Received reset from slave:1
> mlx4_core 0000:04:00.1: Sending vhcr0
> [... probing of VF succeeds ...]
> mlx4_core 0000:04:00.1: NOP command IRQ test passed
>
>
>
>

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
