[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1346350718-30937-18-git-send-email-paulmck@linux.vnet.ibm.com>
Date: Thu, 30 Aug 2012 11:18:33 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: linux-kernel@...r.kernel.org
Cc: mingo@...e.hu, laijs@...fujitsu.com, dipankar@...ibm.com,
akpm@...ux-foundation.org, mathieu.desnoyers@...ymtl.ca,
josh@...htriplett.org, niv@...ibm.com, tglx@...utronix.de,
peterz@...radead.org, rostedt@...dmis.org, Valdis.Kletnieks@...edu,
dhowells@...hat.com, eric.dumazet@...il.com, darren@...art.com,
fweisbec@...il.com, sbw@....edu, patches@...aro.org,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: [PATCH tip/core/rcu 18/23] rcu: Add random PROVE_RCU_DELAY to grace-period initialization
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
There are some additional potential grace-period initialization races
on systems with more than one rcu_node structure, for example:
1. CPU 0 completes a grace period, but needs an additional
grace period, so starts initializing one, initializing all
the non-leaf rcu_node strcutures and the first leaf rcu_node
structure. Because CPU 0 is both completing the old grace
period and starting a new one, it marks the completion of
the old grace period and the start of the new grace period
in a single traversal of the rcu_node structures.
Therefore, CPUs corresponding to the first rcu_node structure
can become aware that the prior grace period has ended, but
CPUs corresponding to the other rcu_node structures cannot
yet become aware of this.
2. CPU 1 passes through a quiescent state, and therefore informs
the RCU core. Because its leaf rcu_node structure has already
been initialized, so this CPU's quiescent state is applied to
the new (and only partially initialized) grace period.
3. CPU 1 enters an RCU read-side critical section and acquires
a reference to data item A. Note that this critical section
will not block the new grace period.
4. CPU 16 exits dyntick-idle mode. Because it was in dyntick-idle
mode, some other CPU informed the RCU core of its extended
quiescent state for the past several grace periods. This means
that CPU 16 is not yet aware that these grace periods have ended.
5. CPU 16 on the second leaf rcu_node structure removes data item A
from its enclosing data structure and passes it to call_rcu(),
which queues a callback in the RCU_NEXT_TAIL segment of the
callback queue.
6. CPU 16 enters the RCU core, possibly because it has taken a
scheduling-clock interrupt, or alternatively because it has
more than 10,000 callbacks queued. It notes that the second
most recent grace period has ended (recall that it cannot yet
become aware that the most recent grace period has completed),
and therefore advances its callbacks. The callback for data
item A is therefore in the RCU_NEXT_READY_TAIL segment of the
callback queue.
7. CPU 0 completes initialization of the remaining leaf rcu_node
structures for the new grace period, including the structure
corresponding to CPU 16.
8. CPU 16 again enters the RCU core, again, possibly because it has
taken a scheduling-clock interrupt, or alternatively because
it now has more than 10,000 callbacks queued. It notes that
the most recent grace period has ended, and therefore advances
its callbacks. The callback for data item A is therefore in
the RCU_NEXT_TAIL segment of the callback queue.
9. All CPUs other than CPU 1 pass through quiescent states, so that
the new grace period completes. Note that CPU 1 is still in
its RCU read-side critical section, still referencing data item A.
10. Suppose that CPU 2 is the last CPU to pass through a quiescent
state for the new grace period, and suppose further that CPU 2
does not have any callbacks queued. It therefore traverses
all of the rcu_node structures, marking the new grace period
as completed, but does not initialize a new grace period.
11. CPU 16 yet again enters the RCU core, yet again possibly because
it has taken a scheduling-clock interrupt, or alternatively
because it now has more than 10,000 callbacks queued. It notes
that the new grace period has ended, and therefore advances
its callbacks. The callback for data item A is therefore in
the RCU_DONE_TAIL segment of the callback queue. This means
that this callback is now considered ready to be invoked.
12. CPU 16 invokes the callback, freeing data item A while CPU 1
is still referencing it.
This sort of scenario represents a day-one bug for TREE_RCU, however,
the recent changes that permit RCU grace-period initialization to
be preempted made it much more probable. Still, it is sufficiently
improbable to make validation lengthy and inconvenient, so this commit
adds an anti-heisenbug to greatly increase the collision cross section,
also known as the probability of occurrence.
Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
---
kernel/rcutree.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 4cfe488..1373388 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -52,6 +52,7 @@
#include <linux/prefetch.h>
#include <linux/delay.h>
#include <linux/stop_machine.h>
+#include <linux/random.h>
#include "rcutree.h"
#include <trace/events/rcu.h>
@@ -1105,6 +1106,10 @@ static int rcu_gp_init(struct rcu_state *rsp)
rnp->level, rnp->grplo,
rnp->grphi, rnp->qsmask);
raw_spin_unlock_irqrestore(&rnp->lock, flags);
+#ifdef CONFIG_PROVE_RCU_DELAY
+ if ((random32() % (rcu_num_nodes * 8)) == 0)
+ schedule_timeout_uninterruptible(2);
+#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
cond_resched();
}
--
1.7.8
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists