Message-ID: <4AF7ACCC.2050208@kernel.org>
Date: Mon, 09 Nov 2009 14:46:52 +0900
From: Tejun Heo <tj@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
CC: Christoph Lameter <cl@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>, Nick Piggin <npiggin@...e.de>,
Jiri Kosina <jkosina@...e.cz>,
Yinghai Lu <yhlu.kernel@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org
Subject: [PATCH percpu#for-linus] percpu: fix possible deadlock via irq lock
inversion
Lockdep caught the following irq lock inversion, which can lead to a
deadlock.
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.32-rc5-tip-04815-g12f0f93-dirty #745
---------------------------------------------------------
hub 1-3:1.0: state 7 ports 2 chg 0000 evt 0004
ksoftirqd/65/199 just changed the state of lock:
(pcpu_lock){..-...}, at: [<ffffffff81130e04>] free_percpu+0x38/0x104
but this lock took another, SOFTIRQ-unsafe lock in the past:
(vmap_area_lock){+.+...}
and interrupts could create inverse lock ordering between them.
This happens because pcpu_lock may be acquired from IRQ context in the
free_percpu() path, while the alloc_percpu() path may call vfree() with
pcpu_lock held. As vmap_area_lock isn't IRQ-safe, if an IRQ occurs
while vmap_area_lock is held and the IRQ handler calls free_percpu(),
the locking order is inverted.
As the nesting occurs only in the alloc path, which may not be called
from IRQ context, an A->B->A deadlock won't occur, but an A->B / B->A
deadlock between two CPUs is still possible.
The only place where vmap_area_lock is nested under pcpu_lock is where
pcpu_extend_area_map() frees the old map after extending area_map.
Break the locking order by temporarily releasing pcpu_lock while
freeing the old map. This is safe because the allocation path is
protected by the outer pcpu_alloc_mutex.
Signed-off-by: Tejun Heo <tj@...nel.org>
Reported-by: Yinghai Lu <yhlu.kernel@...il.com>
Cc: Ingo Molnar <mingo@...e.hu>
---
If nobody objects, I'll push it to Linus tomorrow. Thanks.
mm/percpu.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/mm/percpu.c b/mm/percpu.c
index d907971..30cd343 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -372,7 +372,7 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
 static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
 {
 	int new_alloc;
-	int *new;
+	int *new, *old = NULL;
 	size_t size;
 
 	/* has enough? */
@@ -407,10 +407,23 @@ static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
 	 * one of the first chunks and still using static map.
 	 */
 	if (chunk->map_alloc >= PCPU_DFL_MAP_ALLOC)
-		pcpu_mem_free(chunk->map, size);
+		old = chunk->map;
 
 	chunk->map_alloc = new_alloc;
 	chunk->map = new;
+
+	/*
+	 * pcpu_mem_free() might end up calling vfree() which uses
+	 * IRQ-unsafe lock and thus can't be called with pcpu_lock
+	 * held. Release and reacquire pcpu_lock if old map needs to
+	 * be freed.
+	 */
+	if (old) {
+		spin_unlock_irqrestore(&pcpu_lock, *flags);
+		pcpu_mem_free(old, size);
+		spin_lock_irqsave(&pcpu_lock, *flags);
+	}
+
 	return 0;
 }
 