Message-ID: <20140619202928.GG4904@linux.vnet.ibm.com>
Date: Thu, 19 Jun 2014 13:29:28 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Christoph Lameter <cl@...two.org>,
Sasha Levin <sasha.levin@...cle.com>,
Pekka Enberg <penberg@...nel.org>,
Matt Mackall <mpm@...enic.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Jones <davej@...hat.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: slub/debugobjects: lockup when freeing memory
On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jun 2014, Paul E. McKenney wrote:
>
> > On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
> > > On Thu, 19 Jun 2014, Sasha Levin wrote:
> > >
> > > > [ 690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> > > > [ 690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> > > > [ 690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> > > > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > > [ 690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> > > > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > > [ 690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> > > > [ 690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > > [ 690.770137] debug_object_init (lib/debugobjects.c:365)
> > > > [ 690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> > > > [ 690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> > > > [ 690.770137] ? discard_slab (mm/slub.c:1486)
> > > > [ 690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
> > >
> > > __call_rcu does a slab allocation? This means __call_rcu can no longer be
> > > used in slab allocators? What happened?
> >
> > My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
> > call_rcu_bh(), or call_srcu().
> >
> > Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
> > That would be unfortunate...
>
> Well, no. Look at the callchain:
>
> __call_rcu
> debug_object_activate
> rcuhead_fixup_activate
> debug_object_init
> kmem_cache_alloc
>
> So call_rcu() activates the object, but the object has no reference in
> the debug-objects code, so the fixup code is called, which inits the
> object and allocates a reference ....
OK, got it. And you are right, call_rcu() has done this for a very
long time, so I am not sure what changed. But it seems like the right
approach is to provide a debug-objects-free call_rcu_alloc() for use
by the memory allocators.
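In other words, the slab page's rcu_head is never passed to
debug_object_init(), so its first activation finds no tracking object
and takes the fixup path, which allocates one. Roughly (a simplified
sketch, not the actual lib/debugobjects.c and kernel/rcu/update.c
code):

	void debug_object_activate(void *addr, struct debug_obj_descr *descr)
	{
		if (!lookup_object(addr))	/* head was never tracked, */
			descr->fixup_activate(addr, ODEBUG_STATE_NOTAVAILABLE);
	}

	static int rcuhead_fixup_activate(void *addr, enum debug_obj_state state)
	{
		/* Treat the rcu_head as if statically initialized... */
		debug_object_init(addr, &rcuhead_debug_descr);
		/* ...but debug_object_init() must kmem_cache_alloc() the
		 * tracking object, recursing right back into sl*b. */
		return 1;
	}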
Seem reasonable? If so, please see the following patch.
Thanx, Paul
------------------------------------------------------------------------
rcu: Provide call_rcu_alloc() and call_rcu_sched_alloc() to avoid recursion
The sl*b allocators use call_rcu() to manage object lifetimes, but
call_rcu() can invoke debug-objects, which in turn can call back into
the sl*b allocators. These allocators are not prepared for this sort
of recursion, which can result in lockups.
This commit therefore creates call_rcu_alloc() and call_rcu_sched_alloc(),
which act like their call_rcu() and call_rcu_sched() counterparts but
avoid invoking debug-objects. These new API members are intended only
for use by the sl*b allocators, and this commit makes the sl*b
allocators use call_rcu_alloc(). Why also call_rcu_sched_alloc()?
Because in CONFIG_PREEMPT=n kernels, call_rcu() maps to call_rcu_sched(),
and therefore call_rcu_alloc() must map to call_rcu_sched_alloc().
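The call-site conversions in the sl*b allocators are mechanical
one-liners, for example (from the mm/slub.c hunk below):

	-	call_rcu(head, rcu_free_slab);
	+	call_rcu_alloc(head, rcu_free_slab);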
Reported-by: Sasha Levin <sasha.levin@...cle.com>
Set-straight-by: Thomas Gleixner <tglx@...utronix.de>
Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index d5e40a42cc43..1f708a7f9e7d 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -140,13 +140,24 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
* if CPU A and CPU B are the same CPU (but again only if the system has
* more than one CPU).
*/
-void call_rcu(struct rcu_head *head,
- void (*func)(struct rcu_head *head));
+void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *head));
+
+/**
+ * call_rcu_alloc() - Queue an RCU callback for invocation after grace period.
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * Similar to call_rcu(), but avoids invoking debug-objects. This permits
+ * it to be called from allocators without needing to worry about
+ * recursive calls into those allocators for debug-objects allocations.
+ */
+void call_rcu_alloc(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
#else /* #ifdef CONFIG_PREEMPT_RCU */
/* In classic RCU, call_rcu() is just call_rcu_sched(). */
#define call_rcu call_rcu_sched
+#define call_rcu_alloc call_rcu_sched_alloc
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
@@ -196,6 +207,19 @@ void call_rcu_bh(struct rcu_head *head,
void call_rcu_sched(struct rcu_head *head,
void (*func)(struct rcu_head *rcu));
+/**
+ * call_rcu_sched_alloc() - Queue RCU for invocation after sched grace period.
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * Similar to call_rcu_sched(), but avoids invoking debug-objects.
+ * This permits it to be called from allocators without needing to
+ * worry about recursive calls into those allocators for debug-objects
+ * allocations.
+ */
+void call_rcu_sched_alloc(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu));
+
void synchronize_sched(void);
#ifdef CONFIG_PREEMPT_RCU
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index d9efcc13008c..515e60067c53 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -338,15 +338,14 @@ void synchronize_sched(void)
EXPORT_SYMBOL_GPL(synchronize_sched);
/*
- * Helper function for call_rcu() and call_rcu_bh().
+ * Common code for call_rcu() and friends, but avoiding debug-objects.
*/
-static void __call_rcu(struct rcu_head *head,
- void (*func)(struct rcu_head *rcu),
- struct rcu_ctrlblk *rcp)
+static void __call_rcu_nodo(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu),
+ struct rcu_ctrlblk *rcp)
{
unsigned long flags;
- debug_rcu_head_queue(head);
head->func = func;
head->next = NULL;
@@ -358,6 +357,17 @@ static void __call_rcu(struct rcu_head *head,
}
/*
+ * Helper function for call_rcu() and call_rcu_bh().
+ */
+static void __call_rcu(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu),
+ struct rcu_ctrlblk *rcp)
+{
+ debug_rcu_head_queue(head);
+ __call_rcu_nodo(head, func, rcp);
+}
+
+/*
* Post an RCU callback to be invoked after the end of an RCU-sched grace
* period. But since we have but one CPU, that would be after any
* quiescent state.
@@ -369,6 +379,18 @@ void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
EXPORT_SYMBOL_GPL(call_rcu_sched);
/*
+ * Similar to call_rcu_sched(), but avoids debug-objects and thus
+ * avoids calling into the memory allocators, which don't appreciate
+ * that sort of recursion.
+ */
+void call_rcu_sched_alloc(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu))
+{
+ __call_rcu_nodo(head, func, &rcu_sched_ctrlblk);
+}
+EXPORT_SYMBOL_GPL(call_rcu_sched_alloc);
+
+/*
* Post an RCU bottom-half callback to be invoked after any subsequent
* quiescent state.
*/
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8c47d04ecdea..593195d38850 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2640,25 +2640,16 @@ static void rcu_leak_callback(struct rcu_head *rhp)
}
/*
- * Helper function for call_rcu() and friends. The cpu argument will
- * normally be -1, indicating "currently running CPU". It may specify
- * a CPU only if that CPU is a no-CBs CPU. Currently, only _rcu_barrier()
- * is expected to specify a CPU.
+ * Common code for call_rcu() and friends, but avoiding debug-objects.
*/
static void
-__call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
- struct rcu_state *rsp, int cpu, bool lazy)
+__call_rcu_nodo(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
+ struct rcu_state *rsp, int cpu, bool lazy)
{
unsigned long flags;
struct rcu_data *rdp;
WARN_ON_ONCE((unsigned long)head & 0x1); /* Misaligned rcu_head! */
- if (debug_rcu_head_queue(head)) {
- /* Probable double call_rcu(), so leak the callback. */
- ACCESS_ONCE(head->func) = rcu_leak_callback;
- WARN_ONCE(1, "__call_rcu(): Leaked duplicate callback\n");
- return;
- }
head->func = func;
head->next = NULL;
@@ -2704,6 +2695,25 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
}
/*
+ * Helper function for call_rcu() and friends. The cpu argument will
+ * normally be -1, indicating "currently running CPU". It may specify
+ * a CPU only if that CPU is a no-CBs CPU. Currently, only _rcu_barrier()
+ * is expected to specify a CPU.
+ */
+static void
+__call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
+ struct rcu_state *rsp, int cpu, bool lazy)
+{
+ if (debug_rcu_head_queue(head)) {
+ /* Probable double call_rcu(), so leak the callback. */
+ ACCESS_ONCE(head->func) = rcu_leak_callback;
+ WARN_ONCE(1, "__call_rcu(): Leaked duplicate callback\n");
+ return;
+ }
+ __call_rcu_nodo(head, func, rsp, cpu, lazy);
+}
+
+/*
* Queue an RCU-sched callback for invocation after a grace period.
*/
void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
@@ -2713,6 +2723,18 @@ void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
EXPORT_SYMBOL_GPL(call_rcu_sched);
/*
+ * Similar to call_rcu_sched(), but avoids debug-objects and thus
+ * avoids calling into the memory allocators, which don't appreciate
+ * that sort of recursion.
+ */
+void call_rcu_sched_alloc(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu))
+{
+ __call_rcu_nodo(head, func, &rcu_sched_state, -1, 0);
+}
+EXPORT_SYMBOL_GPL(call_rcu_sched_alloc);
+
+/*
* Queue an RCU callback for invocation after a quicker grace period.
*/
void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 569b390daa15..e9362d7f8328 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -679,6 +679,17 @@ void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
}
EXPORT_SYMBOL_GPL(call_rcu);
+/*
+ * Similar to call_rcu(), but avoids debug-objects and thus avoids
+ * calling into the memory allocators, which don't appreciate that
+ * sort of recursion.
+ */
+void call_rcu_alloc(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+{
+ __call_rcu_nodo(head, func, &rcu_preempt_state, -1, 0);
+}
+EXPORT_SYMBOL_GPL(call_rcu_alloc);
+
/**
* synchronize_rcu - wait until a grace period has elapsed.
*
diff --git a/mm/slab.c b/mm/slab.c
index 9ca3b87edabc..1e5de0d39701 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1994,7 +1994,7 @@ static void slab_destroy(struct kmem_cache *cachep, struct page *page)
* we can use it safely.
*/
head = (void *)&page->rcu_head;
- call_rcu(head, kmem_rcu_free);
+ call_rcu_alloc(head, kmem_rcu_free);
} else {
kmem_freepages(cachep, page);
diff --git a/mm/slob.c b/mm/slob.c
index 21980e0f39a8..47ad4a43521a 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -605,7 +605,7 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
struct slob_rcu *slob_rcu;
slob_rcu = b + (c->size - sizeof(struct slob_rcu));
slob_rcu->size = c->size;
- call_rcu(&slob_rcu->head, kmem_rcu_free);
+ call_rcu_alloc(&slob_rcu->head, kmem_rcu_free);
} else {
__kmem_cache_free(b, c->size);
}
diff --git a/mm/slub.c b/mm/slub.c
index b2b047327d76..7f01e57fd99f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1512,7 +1512,7 @@ static void free_slab(struct kmem_cache *s, struct page *page)
head = (void *)&page->lru;
}
- call_rcu(head, rcu_free_slab);
+ call_rcu_alloc(head, rcu_free_slab);
} else
__free_slab(s, page);
}
--