Message-ID: <516AA80F.7040505@mellanox.com>
Date: Sun, 14 Apr 2013 15:58:55 +0300
From: Or Gerlitz <ogerlitz@...lanox.com>
To: Tejun Heo <tj@...nel.org>
CC: "Michael S. Tsirkin" <mst@...hat.com>,
Ming Lei <ming.lei@...onical.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
David Miller <davem@...emloft.net>,
Roland Dreier <roland@...nel.org>,
netdev <netdev@...r.kernel.org>, Yan Burman <yanb@...lanox.com>,
Jack Morgenstein <jackm@....mellanox.co.il>,
Bjorn Helgaas <bhelgaas@...gle.com>,
<linux-pci@...r.kernel.org>
Subject: Re: [PATCH repost for-3.9] pci: avoid work_on_cpu for nested SRIOV
probes
On 11/04/2013 23:41, Tejun Heo wrote:
> Hello,
>
> On Thu, Apr 11, 2013 at 11:30:53PM +0300, Michael S. Tsirkin wrote:
>> Okay, so you are saying it's a false-positive?
> Yeah, I think so. It didn't actually lock up, right? If it did,
> our analysis up to this point is likely to be completely wrong.
>
>> Want to send a patch so Or can try it out?
> Hmmm... something like the following on the workqueue side (completely untested).
>
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 8afab27..899d470 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -466,14 +466,21 @@ static inline bool __deprecated flush_delayed_work_sync(struct delayed_work *dwo
> }
>
> #ifndef CONFIG_SMP
> -static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> +static inline long work_on_cpu_nested(unsigned int cpu, long (*fn)(void *),
> + void *arg, int subclass)
> {
> return fn(arg);
> }
> #else
> -long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
> +long work_on_cpu_nested(unsigned int cpu, long (*fn)(void *), void *arg,
> + int subclass);
> #endif /* CONFIG_SMP */
>
> +static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> +{
> + return work_on_cpu_nested(cpu, fn, arg, 0);
> +}
> +
> #ifdef CONFIG_FREEZER
> extern void freeze_workqueues_begin(void);
> extern bool freeze_workqueues_busy(void);
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 81f2457..c2be670 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3555,25 +3555,30 @@ static void work_for_cpu_fn(struct work_struct *work)
> }
>
> /**
> - * work_on_cpu - run a function in user context on a particular cpu
> + * work_on_cpu_nested - run a function in user context on a particular cpu
> * @cpu: the cpu to run on
> * @fn: the function to run
> * @arg: the function arg
> + * @subclass: lockdep subclass
> *
> * This will return the value @fn returns.
> * It is up to the caller to ensure that the cpu doesn't go offline.
> * The caller must not hold any locks which would prevent @fn from completing.
> + *
> + * XXX: explain @subclass
> */
> -long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> +long work_on_cpu_nested(unsigned int cpu, long (*fn)(void *), void *arg,
> + int subclass)
> {
> struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>
> INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> + lock_set_subclass(&wfc.work.lockdep_map, subclass, _RET_IP_);
> schedule_work_on(cpu, &wfc.work);
> flush_work(&wfc.work);
> return wfc.ret;
> }
> -EXPORT_SYMBOL_GPL(work_on_cpu);
> +EXPORT_SYMBOL_GPL(work_on_cpu_nested);
> #endif /* CONFIG_SMP */
>
> #ifdef CONFIG_FREEZER
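The @subclass argument that the patch leaves as "XXX: explain" can be illustrated with a small userspace model. This is only a sketch and not kernel code: it assumes lockdep tells held locks apart by a (class, subclass) pair and complains when a task tries to take a pair it already holds, which is a simplified reading of the nested-probe false positive discussed in this thread. The names task_model and model_acquire are made up for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: real lockdep tracks far more state than this. */
#define MAX_HELD 8

struct held_lock {
	const void *class_key;	/* stands in for the lock class key */
	int subclass;		/* lockdep subclass, 0 by default */
};

struct task_model {
	struct held_lock held[MAX_HELD];
	size_t depth;
};

/*
 * Returns true if the acquisition is accepted. Returns false if the same
 * (class, subclass) pair is already held, which is where this model would
 * report "possible recursive locking", or if the toy stack is full.
 */
static bool model_acquire(struct task_model *t, const void *class_key,
			  int subclass)
{
	for (size_t i = 0; i < t->depth; i++) {
		if (t->held[i].class_key == class_key &&
		    t->held[i].subclass == subclass)
			return false;
	}
	if (t->depth == MAX_HELD)
		return false;

	t->held[t->depth].class_key = class_key;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return true;
}
```

In this model, an outer work_on_cpu() would correspond to acquiring the wfc.work class with subclass 0. A nested call that reuses subclass 0 is rejected, while a nested call that passes subclass 1 is accepted.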
Hi,
The patch eliminated the lockdep warning for the mlx4 nested probing
sequence, but introduced a new lockdep warning for
00:13.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub I/OxAPIC
Interrupt Controller (rev 22)
See the lockdep output and the lspci listing below; the full boot
sequence dmesg and my .config are attached. I was running the net git
tree as of commit 2e0cbf2cc2c9371f0aa198857d799175ffe231a6
"net: mvmdio: add select PHYLIB".
From quick tests, the system is as operative with the patch as it was
without it, e.g. the mlx4 VFs probed on the host work OK, and so does
the 1G Intel NIC.
We have a holiday here Mon/Tue, so I will be able to test further
patches on Wednesday.
Or.
=====================================
[ BUG: bad unlock balance detected! ]
3.9.0-rc6+ #53 Not tainted
-------------------------------------
swapper/0/1 is trying to release lock ((&wfc.work)) at:
[<ffffffff8122014c>] pci_device_probe+0xfc/0x120
but there are no more locks to release!
other info that might help us debug this:
2 locks held by swapper/0/1:
#0: (&__lockdep_no_validate__){......}, at: [<ffffffff812da443>] __driver_attach+0x53/0xb0
#1: (&__lockdep_no_validate__){......}, at: [<ffffffff812da451>] __driver_attach+0x61/0xb0
stack backtrace:
Pid: 1, comm: swapper/0 Not tainted 3.9.0-rc6+ #53
Call Trace:
[<ffffffff8122014c>] ? pci_device_probe+0xfc/0x120
[<ffffffff81093529>] print_unlock_imbalance_bug+0xf9/0x100
[<ffffffff8109616f>] lock_set_class+0x27f/0x7c0
[<ffffffff81091d9e>] ? mark_held_locks+0x9e/0x130
[<ffffffff8122014c>] ? pci_device_probe+0xfc/0x120
[<ffffffff81066aeb>] work_on_cpu_nested+0x8b/0xc0
[<ffffffff810633c0>] ? keventd_up+0x20/0x20
[<ffffffff8121f420>] ? pci_pm_prepare+0x60/0x60
[<ffffffff8122014c>] pci_device_probe+0xfc/0x120
[<ffffffff812da0fa>] ? driver_sysfs_add+0x7a/0xb0
[<ffffffff812da24f>] driver_probe_device+0x8f/0x230
[<ffffffff812da493>] __driver_attach+0xa3/0xb0
[<ffffffff812da3f0>] ? driver_probe_device+0x230/0x230
[<ffffffff812da3f0>] ? driver_probe_device+0x230/0x230
[<ffffffff812d86fc>] bus_for_each_dev+0x8c/0xb0
[<ffffffff812da079>] driver_attach+0x19/0x20
[<ffffffff812d91a0>] bus_add_driver+0x1f0/0x250
[<ffffffff818bd596>] ? dmi_pcie_pme_disable_msi+0x21/0x21
[<ffffffff812daadf>] driver_register+0x6f/0x150
[<ffffffff818bd596>] ? dmi_pcie_pme_disable_msi+0x21/0x21
[<ffffffff8122026f>] __pci_register_driver+0x5f/0x70
[<ffffffff818bd5ff>] pcie_portdrv_init+0x69/0x7a
[<ffffffff810001fd>] do_one_initcall+0x3d/0x170
[<ffffffff81895943>] kernel_init_freeable+0x10d/0x19c
[<ffffffff818959d2>] ? kernel_init_freeable+0x19c/0x19c
[<ffffffff8145a040>] ? rest_init+0x160/0x160
[<ffffffff8145a049>] kernel_init+0x9/0xf0
[<ffffffff8146ca6c>] ret_from_fork+0x7c/0xb0
[<ffffffff8145a040>] ? rest_init+0x160/0x160
ioapic: probe of 0000:00:13.0 failed with error -22
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
intel_idle: MWAIT substates: 0x1120
intel_idle: v0.4 model 0x2C
intel_idle: lapic_timer_reliable_states 0xffffffff
ACPI: Requesting acpi_cpufreq
ERST: Failed to get Error Log Address Range.
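One possible reading of the splat above: the trace shows print_unlock_imbalance_bug() being reached from lock_set_class(), which is what lock_set_subclass() expands to. lock_set_class() looks for the given lockdep map among the locks the current task already holds, and the lockdep map of wfc.work does not appear to be held yet when work_on_cpu_nested() calls it right after INIT_WORK_ONSTACK(). The userspace sketch below models only that lookup. The names held_stack, stack_push and stack_set_subclass are made up for the illustration and are not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a task's held-lock stack; not real lockdep. */
#define MAX_HELD 8

struct held_stack {
	const void *map[MAX_HELD];	/* lockdep maps currently held */
	int subclass[MAX_HELD];
	size_t depth;
};

/* Record that the task now holds @map with subclass 0. */
static bool stack_push(struct held_stack *s, const void *map)
{
	if (s->depth == MAX_HELD)
		return false;
	s->map[s->depth] = map;
	s->subclass[s->depth] = 0;
	s->depth++;
	return true;
}

/*
 * Change the subclass of a lock that is already held. Returns false when
 * @map is not on the stack; in this model that is the case that would be
 * reported as "bad unlock balance detected".
 */
static bool stack_set_subclass(struct held_stack *s, const void *map,
			       int subclass)
{
	for (size_t i = s->depth; i-- > 0; ) {
		if (s->map[i] == map) {
			s->subclass[i] = subclass;
			return true;
		}
	}
	return false;
}
```

In this model, calling stack_set_subclass() on a map that was never pushed fails, matching "there are no more locks to release" in the report. The same call succeeds once the map has been pushed.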
# lspci
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express
Root Port 1 (rev 22)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express
Root Port 3 (rev 22)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root
Port 5 (rev 22)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express
Root Port 7 (rev 22)
00:09.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI
Express Root Port 9 (rev 22)
00:13.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub I/OxAPIC
Interrupt Controller (rev 22)
00:14.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub System
Management Registers (rev 22)
00:14.1 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and
Scratch Pad Registers (rev 22)
00:14.2 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status
and RAS Registers (rev 22)
00:14.3 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Throttle
Registers (rev 22)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:1a.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #4
00:1a.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #5
00:1a.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #6
00:1a.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2
EHCI Controller #2
00:1d.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #1
00:1d.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #2
00:1d.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #3
00:1d.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2
EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface
Controller
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port
SATA IDE Controller #1
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port
SATA IDE Controller #2
01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
04:00.0 Network controller: Mellanox Technologies MT27500 Family
[ConnectX-3]
04:00.1 Network controller: Mellanox Technologies MT27500 Family
[ConnectX-3 Virtual Function]
04:00.2 Network controller: Mellanox Technologies MT27500 Family
[ConnectX-3 Virtual Function]
04:00.3 Network controller: Mellanox Technologies MT27500 Family
[ConnectX-3 Virtual Function]
05:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit
SFI/SFP+ Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit
SFI/SFP+ Network Connection (rev 01)
05:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller
Virtual Function (rev 01)
05:10.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller
Virtual Function (rev 01)
07:01.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA
G200eW WPCM450 (rev 0a)
fe:00.0 Host bridge: Intel Corporation Xeon 5600 Series QuickPath
Architecture Generic Non-core Registers (rev 02)
fe:00.1 Host bridge: Intel Corporation Xeon 5600 Series QuickPath
Architecture System Address Decoder (rev 02)
fe:02.0 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 0 (rev 02)
fe:02.1 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 0
(rev 02)
fe:02.2 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link
0 (rev 02)
fe:02.3 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link
1 (rev 02)
fe:02.4 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 1 (rev 02)
fe:02.5 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 1
(rev 02)
fe:03.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Registers (rev 02)
fe:03.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Target Address Decoder (rev 02)
fe:03.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller RAS Registers (rev 02)
fe:03.4 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Test Registers (rev 02)
fe:04.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Control (rev 02)
fe:04.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Address (rev 02)
fe:04.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Rank (rev 02)
fe:04.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Thermal Control (rev 02)
fe:05.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Control (rev 02)
fe:05.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Address (rev 02)
fe:05.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Rank (rev 02)
fe:05.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Thermal Control (rev 02)
fe:06.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Control (rev 02)
fe:06.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Address (rev 02)
fe:06.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Rank (rev 02)
fe:06.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Thermal Control (rev 02)
ff:00.0 Host bridge: Intel Corporation Xeon 5600 Series QuickPath
Architecture Generic Non-core Registers (rev 02)
ff:00.1 Host bridge: Intel Corporation Xeon 5600 Series QuickPath
Architecture System Address Decoder (rev 02)
ff:02.0 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 0 (rev 02)
ff:02.1 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 0
(rev 02)
ff:02.2 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link
0 (rev 02)
ff:02.3 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link
1 (rev 02)
ff:02.4 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 1 (rev 02)
ff:02.5 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 1
(rev 02)
ff:03.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Registers (rev 02)
ff:03.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Target Address Decoder (rev 02)
ff:03.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller RAS Registers (rev 02)
ff:03.4 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Test Registers (rev 02)
ff:04.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Control (rev 02)
ff:04.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Address (rev 02)
ff:04.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Rank (rev 02)
ff:04.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Thermal Control (rev 02)
ff:05.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Control (rev 02)
ff:05.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Address (rev 02)
ff:05.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Rank (rev 02)
ff:05.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Thermal Control (rev 02)
ff:06.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Control (rev 02)
ff:06.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Address (rev 02)
ff:06.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Rank (rev 02)
ff:06.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Thermal Control (rev 02)
View attachment "dmesg-net-2e0cbf2-tejun-patched" of type "text/plain" (71083 bytes)
View attachment "config-net-2e0cbf2" of type "text/plain" (77585 bytes)