Message-ID: <20150624161524.GO3644@twins.programming.kicks-ass.net>
Date:	Wed, 24 Jun 2015 18:15:25 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	paulmck@...ux.vnet.ibm.com, tj@...nel.org, mingo@...hat.com,
	linux-kernel@...r.kernel.org, der.herr@...r.at, dave@...olabs.net,
	riel@...hat.com, viro@...IV.linux.org.uk,
	torvalds@...ux-foundation.org
Subject: Re: [RFC][PATCH 09/13] hotplug: Replace hotplug lock with
 percpu-rwsem

On Wed, Jun 24, 2015 at 05:12:12PM +0200, Oleg Nesterov wrote:
> On 06/24, Peter Zijlstra wrote:

> > I'm confused.. why isn't the read-in-read recursion good enough?
> 
> Because the code above can actually deadlock if 2 CPUs do this at
> the same time?

Hmm yes.. this makes the hotplug locking worse than I feared it was, but
alas.
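
For anyone skimming the thread, the hazard with read-in-read recursion
is the usual one for writer-preferring read/write locks: once a CPU
holds the lock for read and a writer queues up, the second (nested)
read acquisition blocks behind that writer, which is itself waiting for
the first read to be dropped, so neither side can make progress. Below
is a minimal userspace sketch of that shape, using a glibc pthread
rwlock with writer preference; it is purely illustrative and is not the
hotplug or percpu-rwsem code under discussion.

/*
 * Purely illustrative userspace analogue, not the kernel code in
 * question.  With a writer-preferring rwlock, a recursive read
 * acquisition deadlocks as soon as a writer queues up in between.
 * Build with: gcc -pthread rdlock-recursion.c   (it hangs; that is
 * the point.)
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t lock;

static void *writer(void *arg)
{
        pthread_rwlock_wrlock(&lock);   /* queues behind the first reader */
        pthread_rwlock_unlock(&lock);
        return arg;
}

int main(void)
{
        pthread_rwlockattr_t attr;
        pthread_t w;

        pthread_rwlockattr_init(&attr);
        /* writer preference: a pending writer blocks new readers */
        pthread_rwlockattr_setkind_np(&attr,
                        PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
        pthread_rwlock_init(&lock, &attr);

        pthread_rwlock_rdlock(&lock);   /* outer read section */
        pthread_create(&w, NULL, writer, NULL);
        sleep(1);                       /* let the writer queue up */

        printf("recursive rdlock...\n");
        pthread_rwlock_rdlock(&lock);   /* blocks behind the pending
                                         * writer, which is waiting for
                                         * us: deadlock */
        printf("never reached\n");
        return 0;
}

With two CPUs each doing that pattern concurrently, a writer slotting
in between either pair of read acquisitions is enough to wedge things.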

FYI, the actual splat.

---

[    7.399737] ======================================================
[    7.406640] [ INFO: possible circular locking dependency detected ]
[    7.413643] 4.1.0-02756-ge3d06bd-dirty #185 Not tainted
[    7.419481] -------------------------------------------------------
[    7.426483] kworker/0:1/215 is trying to acquire lock:
[    7.432221]  (&cpu_hotplug.rwsem){++++++}, at: [<ffffffff810ebd63>] apply_workqueue_attrs+0x183/0x4b0
[    7.442564] 
[    7.442564] but task is already holding lock:
[    7.449079]  (&item->mutex){+.+.+.}, at: [<ffffffff815c4dc3>] drm_global_item_ref+0x33/0xe0
[    7.458455] 
[    7.458455] which lock already depends on the new lock.
[    7.458455] 
[    7.467591] 
[    7.467591] the existing dependency chain (in reverse order) is:
[    7.475949] 
-> #3 (&item->mutex){+.+.+.}:
[    7.480662]        [<ffffffff811232b1>] lock_acquire+0xd1/0x290
[    7.487280]        [<ffffffff818ea777>] mutex_lock_nested+0x47/0x3c0
[    7.494390]        [<ffffffff815c4dc3>] drm_global_item_ref+0x33/0xe0
[    7.501596]        [<ffffffff815dcd90>] mgag200_mm_init+0x50/0x1c0
[    7.508514]        [<ffffffff815d757f>] mgag200_driver_load+0x30f/0x500
[    7.515916]        [<ffffffff815b1491>] drm_dev_register+0xb1/0x100
[    7.522922]        [<ffffffff815b428d>] drm_get_pci_dev+0x8d/0x1e0
[    7.529840]        [<ffffffff815dbd3f>] mga_pci_probe+0x9f/0xc0
[    7.536463]        [<ffffffff814bde92>] local_pci_probe+0x42/0xa0
[    7.543283]        [<ffffffff810e54e8>] work_for_cpu_fn+0x18/0x30
[    7.550106]        [<ffffffff810e9d57>] process_one_work+0x1e7/0x7e0
[    7.557214]        [<ffffffff810ea518>] worker_thread+0x1c8/0x460
[    7.564029]        [<ffffffff810f05b6>] kthread+0xf6/0x110
[    7.570166]        [<ffffffff818eefdf>] ret_from_fork+0x3f/0x70
[    7.576792] 
-> #2 (drm_global_mutex){+.+.+.}:
[    7.581891]        [<ffffffff811232b1>] lock_acquire+0xd1/0x290
[    7.588514]        [<ffffffff818ea777>] mutex_lock_nested+0x47/0x3c0
[    7.595622]        [<ffffffff815b1406>] drm_dev_register+0x26/0x100
[    7.602632]        [<ffffffff815b428d>] drm_get_pci_dev+0x8d/0x1e0
[    7.609547]        [<ffffffff815dbd3f>] mga_pci_probe+0x9f/0xc0
[    7.616170]        [<ffffffff814bde92>] local_pci_probe+0x42/0xa0
[    7.622987]        [<ffffffff810e54e8>] work_for_cpu_fn+0x18/0x30
[    7.629806]        [<ffffffff810e9d57>] process_one_work+0x1e7/0x7e0
[    7.636913]        [<ffffffff810ea518>] worker_thread+0x1c8/0x460
[    7.643727]        [<ffffffff810f05b6>] kthread+0xf6/0x110
[    7.649866]        [<ffffffff818eefdf>] ret_from_fork+0x3f/0x70
[    7.656490] 
-> #1 ((&wfc.work)){+.+.+.}:
[    7.661104]        [<ffffffff811232b1>] lock_acquire+0xd1/0x290
[    7.667727]        [<ffffffff810e737d>] flush_work+0x3d/0x260
[    7.674155]        [<ffffffff810e9822>] work_on_cpu+0x82/0x90
[    7.680584]        [<ffffffff814bf2a2>] pci_device_probe+0x112/0x120
[    7.687692]        [<ffffffff815e685f>] driver_probe_device+0x17f/0x2e0
[    7.695094]        [<ffffffff815e6a94>] __driver_attach+0x94/0xa0
[    7.701910]        [<ffffffff815e4786>] bus_for_each_dev+0x66/0xa0
[    7.708824]        [<ffffffff815e626e>] driver_attach+0x1e/0x20
[    7.715447]        [<ffffffff815e5ed8>] bus_add_driver+0x168/0x210
[    7.722361]        [<ffffffff815e7880>] driver_register+0x60/0xe0
[    7.729180]        [<ffffffff814bd754>] __pci_register_driver+0x64/0x70
[    7.736580]        [<ffffffff81f9a10d>] pcie_portdrv_init+0x66/0x79
[    7.743593]        [<ffffffff810002c8>] do_one_initcall+0x88/0x1c0
[    7.750508]        [<ffffffff81f5f169>] kernel_init_freeable+0x1f5/0x282
[    7.758005]        [<ffffffff818da36e>] kernel_init+0xe/0xe0
[    7.764338]        [<ffffffff818eefdf>] ret_from_fork+0x3f/0x70
[    7.770961] 
-> #0 (&cpu_hotplug.rwsem){++++++}:
[    7.776255]        [<ffffffff81122817>] __lock_acquire+0x2207/0x2240
[    7.783363]        [<ffffffff811232b1>] lock_acquire+0xd1/0x290
[    7.789986]        [<ffffffff810cb6e2>] get_online_cpus+0x62/0xb0
[    7.796805]        [<ffffffff810ebd63>] apply_workqueue_attrs+0x183/0x4b0
[    7.804398]        [<ffffffff810ed7bc>] __alloc_workqueue_key+0x2ec/0x560
[    7.811992]        [<ffffffff815cbefa>] ttm_mem_global_init+0x5a/0x310
[    7.819295]        [<ffffffff815dcbb2>] mgag200_ttm_mem_global_init+0x12/0x20
[    7.827277]        [<ffffffff815c4df5>] drm_global_item_ref+0x65/0xe0
[    7.834481]        [<ffffffff815dcd90>] mgag200_mm_init+0x50/0x1c0
[    7.841395]        [<ffffffff815d757f>] mgag200_driver_load+0x30f/0x500
[    7.848793]        [<ffffffff815b1491>] drm_dev_register+0xb1/0x100
[    7.855804]        [<ffffffff815b428d>] drm_get_pci_dev+0x8d/0x1e0
[    7.862715]        [<ffffffff815dbd3f>] mga_pci_probe+0x9f/0xc0
[    7.869338]        [<ffffffff814bde92>] local_pci_probe+0x42/0xa0
[    7.876159]        [<ffffffff810e54e8>] work_for_cpu_fn+0x18/0x30
[    7.882979]        [<ffffffff810e9d57>] process_one_work+0x1e7/0x7e0
[    7.890087]        [<ffffffff810ea518>] worker_thread+0x1c8/0x460
[    7.896907]        [<ffffffff810f05b6>] kthread+0xf6/0x110
[    7.903043]        [<ffffffff818eefdf>] ret_from_fork+0x3f/0x70
[    7.909673] 
[    7.909673] other info that might help us debug this:
[    7.909673] 
[    7.918616] Chain exists of:
  &cpu_hotplug.rwsem --> drm_global_mutex --> &item->mutex

[    7.927907]  Possible unsafe locking scenario:
[    7.927907] 
[    7.934521]        CPU0                    CPU1
[    7.939580]        ----                    ----
[    7.944639]   lock(&item->mutex);
[    7.948359]                                lock(drm_global_mutex);
[    7.955292]                                lock(&item->mutex);
[    7.961855]   lock(&cpu_hotplug.rwsem);
[    7.966158] 
[    7.966158]  *** DEADLOCK ***
[    7.966158] 
[    7.972771] 4 locks held by kworker/0:1/215:
[    7.977539]  #0:  ("events"){.+.+.+}, at: [<ffffffff810e9cc6>] process_one_work+0x156/0x7e0
[    7.986929]  #1:  ((&wfc.work)){+.+.+.}, at: [<ffffffff810e9cc6>] process_one_work+0x156/0x7e0
[    7.996600]  #2:  (drm_global_mutex){+.+.+.}, at: [<ffffffff815b1406>] drm_dev_register+0x26/0x100
[    8.006690]  #3:  (&item->mutex){+.+.+.}, at: [<ffffffff815c4dc3>] drm_global_item_ref+0x33/0xe0
[    8.016559] 
[    8.016559] stack backtrace:
[    8.021427] CPU: 0 PID: 215 Comm: kworker/0:1 Not tainted 4.1.0-02756-ge3d06bd-dirty #185
[    8.030565] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[    8.042034] Workqueue: events work_for_cpu_fn
[    8.046909]  ffffffff82857e30 ffff88042b3437c8 ffffffff818e5189 0000000000000011
[    8.055216]  ffffffff8282aa40 ffff88042b343818 ffffffff8111ee76 0000000000000004
[    8.063522]  ffff88042b343888 ffff88042b33f040 0000000000000004 ffff88042b33f040
[    8.071827] Call Trace:
[    8.074559]  [<ffffffff818e5189>] dump_stack+0x4c/0x6e
[    8.080300]  [<ffffffff8111ee76>] print_circular_bug+0x1c6/0x220
[    8.087011]  [<ffffffff81122817>] __lock_acquire+0x2207/0x2240
[    8.093528]  [<ffffffff811232b1>] lock_acquire+0xd1/0x290
[    8.099559]  [<ffffffff810ebd63>] ? apply_workqueue_attrs+0x183/0x4b0
[    8.106755]  [<ffffffff810cb6e2>] get_online_cpus+0x62/0xb0
[    8.112981]  [<ffffffff810ebd63>] ? apply_workqueue_attrs+0x183/0x4b0
[    8.120176]  [<ffffffff810ead27>] ? alloc_workqueue_attrs+0x27/0x80
[    8.127178]  [<ffffffff810ebd63>] apply_workqueue_attrs+0x183/0x4b0
[    8.134182]  [<ffffffff8111cc21>] ? debug_mutex_init+0x31/0x40
[    8.140690]  [<ffffffff810ed7bc>] __alloc_workqueue_key+0x2ec/0x560
[    8.147691]  [<ffffffff815cbefa>] ttm_mem_global_init+0x5a/0x310
[    8.154405]  [<ffffffff8122b050>] ? __kmalloc+0x5e0/0x630
[    8.160435]  [<ffffffff815c4de2>] ? drm_global_item_ref+0x52/0xe0
[    8.167243]  [<ffffffff815dcbb2>] mgag200_ttm_mem_global_init+0x12/0x20
[    8.174631]  [<ffffffff815c4df5>] drm_global_item_ref+0x65/0xe0
[    8.181245]  [<ffffffff815dcd90>] mgag200_mm_init+0x50/0x1c0
[    8.187570]  [<ffffffff815d757f>] mgag200_driver_load+0x30f/0x500
[    8.194383]  [<ffffffff815b1491>] drm_dev_register+0xb1/0x100
[    8.200802]  [<ffffffff815b428d>] drm_get_pci_dev+0x8d/0x1e0
[    8.207125]  [<ffffffff818ebf9e>] ? mutex_unlock+0xe/0x10
[    8.213156]  [<ffffffff815dbd3f>] mga_pci_probe+0x9f/0xc0
[    8.219187]  [<ffffffff814bde92>] local_pci_probe+0x42/0xa0
[    8.225412]  [<ffffffff8111db81>] ? __lock_is_held+0x51/0x80
[    8.231736]  [<ffffffff810e54e8>] work_for_cpu_fn+0x18/0x30
[    8.237962]  [<ffffffff810e9d57>] process_one_work+0x1e7/0x7e0
[    8.244477]  [<ffffffff810e9cc6>] ? process_one_work+0x156/0x7e0
[    8.251187]  [<ffffffff810ea518>] worker_thread+0x1c8/0x460
[    8.257410]  [<ffffffff810ea350>] ? process_one_work+0x7e0/0x7e0
[    8.264120]  [<ffffffff810ea350>] ? process_one_work+0x7e0/0x7e0
[    8.270829]  [<ffffffff810f05b6>] kthread+0xf6/0x110
[    8.276375]  [<ffffffff818ee230>] ? _raw_spin_unlock_irq+0x30/0x60
[    8.283282]  [<ffffffff810f04c0>] ? kthread_create_on_node+0x220/0x220
[    8.290566]  [<ffffffff818eefdf>] ret_from_fork+0x3f/0x70
[    8.296597]  [<ffffffff810f04c0>] ? kthread_create_on_node+0x220/0x220
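
Stripped of the workqueue and DRM specifics, the "Possible unsafe
locking scenario" box above is the standard circular-wait pattern: the
probe path ends up holding &item->mutex while asking for
cpu_hotplug.rwsem (get_online_cpus() from apply_workqueue_attrs()),
while the previously recorded #1..#3 chain orders cpu_hotplug.rwsem in
front of &item->mutex. Collapsed onto two plain mutexes, the bare shape
looks like the userspace sketch below; the names only stand in for the
kernel locks and none of this is the actual kernel code.

/*
 * Bare-bones AB-BA illustration of the inversion reported above; the
 * mutex names merely stand in for &item->mutex and cpu_hotplug.rwsem.
 * Build with: gcc -pthread abba.c   (it wedges; that is the point.)
 */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t item_mutex   = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* probe-side ordering: item_mutex first, then the hotplug lock */
static void *probe_path(void *arg)
{
        pthread_mutex_lock(&item_mutex);
        sleep(1);
        pthread_mutex_lock(&hotplug_lock);
        pthread_mutex_unlock(&hotplug_lock);
        pthread_mutex_unlock(&item_mutex);
        return arg;
}

/* the recorded dependency chain implies the opposite ordering */
static void *other_path(void *arg)
{
        pthread_mutex_lock(&hotplug_lock);
        sleep(1);
        pthread_mutex_lock(&item_mutex);
        pthread_mutex_unlock(&item_mutex);
        pthread_mutex_unlock(&hotplug_lock);
        return arg;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, probe_path, NULL);
        pthread_create(&b, NULL, other_path, NULL);
        pthread_join(a, NULL);  /* never returns: each side waits on the other */
        pthread_join(b, NULL);
        return 0;
}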
