[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1321248853.22361.280.camel@sli10-conroe>
Date: Mon, 14 Nov 2011 13:34:13 +0800
From: Shaohua Li <shaohua.li@...el.com>
To: Christoph Lameter <cl@...ux.com>
Cc: Julie Sullivan <kernelmail.jms@...il.com>,
Yong Zhang <yong.zhang0@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Pekka Enberg <penberg@...nel.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: INFO: possible recursive locking detected: get_partial_node()
on 3.2-rc1
On Fri, 2011-11-11 at 23:02 +0800, Christoph Lameter wrote:
> On Fri, 11 Nov 2011, Shaohua Li wrote:
>
> > Looks this could be a real dead lock. we hold a lock to free a object,
> > but the free need allocate a new object. if the new object and the freed
> > object are from the same slab, there is a deadlock.
>
> unfreeze partials is never called when going through get_partial_node()
> so there is no deadlock AFAICT.
the unfreeze_partial isn't called from get_partial_node(). I thought the
code path is something like this: kmem_cache_free()->put_cpu_partial()
(hold lock) ->unfreeze_partials() ->discard_slab ->debug_object_init()
->kmem_cache_alloc->get_partial_node()(hold lock). Not sure if this will
really happen, but looks like a deadlock.
But anyway, discard_slab() can be move out of unfreeze_partials()
> > discard_slab() doesn't need hold the lock if the slab is already removed
> > from partial list. how about below patch, only compile tested.
>
> In general I think it is good to move the call to discard_slab() out from
> under the list_lock in unfreeze_partials(). Could you fold
> discard_page_list into unfreeze_partials()? __flush_cpu_slab still calls
> discard_page_list with disabled interrupts even after your patch.
I'm afraid there is alloc-in-atomic() error, but Yong & Julie's test
shows this is over thinking. Here is the updated patch. Yong & Julie, I
added your report/test by, because the new patch should be just like the
old one, but since I changed it a little bit, can you please have a
quick check? Thanks!
Subject: slub: move discard_slab out of node lock
Lockdep reports there is potential deadlock for slub node list_lock.
discard_slab() is called with the lock hold in unfreeze_partials(),
which could trigger a slab allocation, which could hold the lock again.
discard_slab() doesn't need hold the lock actually, if the slab is
already removed from partial list.
Reported-and-tested-by: Yong Zhang <yong.zhang0@...il.com>
Reported-and-tested-by: Julie Sullivan <kernelmail.jms@...il.com>
Signed-off-by: Shaohua Li <shaohua.li@...el.com>
---
mm/slub.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c 2011-11-11 16:17:39.000000000 +0800
+++ linux/mm/slub.c 2011-11-14 13:11:11.000000000 +0800
@@ -1862,7 +1862,7 @@ static void unfreeze_partials(struct kme
{
struct kmem_cache_node *n = NULL;
struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
- struct page *page;
+ struct page *page, *discard_page = NULL;
while ((page = c->partial)) {
enum slab_modes { M_PARTIAL, M_FREE };
@@ -1916,14 +1916,22 @@ static void unfreeze_partials(struct kme
"unfreezing slab"));
if (m == M_FREE) {
- stat(s, DEACTIVATE_EMPTY);
- discard_slab(s, page);
- stat(s, FREE_SLAB);
+ page->next = discard_page;
+ discard_page = page;
}
}
if (n)
spin_unlock(&n->list_lock);
+
+ while (discard_page) {
+ page = discard_page;
+ discard_page = discard_page->next;
+
+ stat(s, DEACTIVATE_EMPTY);
+ discard_slab(s, page);
+ stat(s, FREE_SLAB);
+ }
}
/*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists