lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 09 Nov 2009 14:46:52 +0900
From:	Tejun Heo <>
To:	Peter Zijlstra <>
CC:	Christoph Lameter <>,
	Ingo Molnar <>, Nick Piggin <>,
	Jiri Kosina <>,
	Yinghai Lu <>,
	Thomas Gleixner <>,
Subject: [PATCH percpu#for-linus] percpu: fix possible deadlock via irq lock

Lockdep caught the following irq lock inversion which can lead to

 [ INFO: possible irq lock inversion dependency detected ]
 2.6.32-rc5-tip-04815-g12f0f93-dirty #745
 hub 1-3:1.0: state 7 ports 2 chg 0000 evt 0004
 ksoftirqd/65/199 just changed the state of lock:
  (pcpu_lock){..-...}, at: [<ffffffff81130e04>] free_percpu+0x38/0x104
 but this lock took another, SOFTIRQ-unsafe lock in the past:

 and interrupts could create inverse lock ordering between them.

This happens because pcpu_lock is allowed to be acquired from irq
context for free_percpu() path and alloc_percpu() path may call
vfree() with pcpu_lock held.  As vmap_area_lock isn't irq safe, if an
IRQ occurs while vmap_area_lock is held and the irq handler calls
free_percpu(), locking order inversion occurs.

As the nesting only occurs in the alloc path which isn't allowed to be
called from irq context, A->B->A deadlock won't occur but A->B B->A
deadlock is still possible.

The only place where vmap_area_lock is nested under pcpu_lock is while
extending area_map to free old map.  Break the locking order by
temporarily releasing pcpu_lock when freeing old map.  This is safe to
do as allocation path is protected with outer pcpu_alloc_mutex.

Signed-off-by: Tejun Heo <>
Reported-by: Yinghai Lu <>
Cc: Ingo Molnar <>
If nobody objects, I'll push it to Linus tomorrow.  Thanks.

 mm/percpu.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index d907971..30cd343 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -372,7 +372,7 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
 static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
 	int new_alloc;
-	int *new;
+	int *new, *old = NULL;
 	size_t size;

 	/* has enough? */
@@ -407,10 +407,23 @@ static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
 	 * one of the first chunks and still using static map.
 	if (chunk->map_alloc >= PCPU_DFL_MAP_ALLOC)
-		pcpu_mem_free(chunk->map, size);
+		old = chunk->map;

 	chunk->map_alloc = new_alloc;
 	chunk->map = new;
+	/*
+	 * pcpu_mem_free() might end up calling vfree() which uses
+	 * IRQ-unsafe lock and thus can't be called with pcpu_lock
+	 * held.  Release and reacquire pcpu_lock if old map needs to
+	 * be freed.
+	 */
+	if (old) {
+		spin_unlock_irqrestore(&pcpu_lock, *flags);
+		pcpu_mem_free(old, size);
+		spin_lock_irqsave(&pcpu_lock, *flags);
+	}
 	return 0;

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

Powered by blists - more mailing lists