[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4F4B383C.1000804@cn.fujitsu.com>
Date: Mon, 27 Feb 2012 16:01:00 +0800
From: Lai Jiangshan <laijs@...fujitsu.com>
To: paulmck@...ux.vnet.ibm.com
CC: linux-kernel@...r.kernel.org, mingo@...e.hu, dipankar@...ibm.com,
akpm@...ux-foundation.org, mathieu.desnoyers@...ymtl.ca,
josh@...htriplett.org, niv@...ibm.com, tglx@...utronix.de,
peterz@...radead.org, rostedt@...dmis.org, Valdis.Kletnieks@...edu,
dhowells@...hat.com, eric.dumazet@...il.com, darren@...art.com,
fweisbec@...il.com, patches@...aro.org
Subject: [PATCH 1/2 RFC] srcu: change the comments of the wait algorithm
Hi, ALL
The call_srcu() will be sent soon(may be in 2 days). I found something is not
good in current sruc when I implement it, so I do more prepare for it.
The second patch is inspired Peter. I had decided to use per-cpu machine,
the the snap array makes me unhappy. If a machine is sleeping/preempted
while checking, the other machine can't not check the same srcu_struct.
It is nothing big, but it also blocks the sychronize_srcu_expedited().
I hope sychronize_srcu_expedited() can't be blocked when it try to do its
fast-checking. So I try to find non-block checking algorithm, and I find
Peter's.
The most things in these two patches are comments, so I bring a lot
troubles to Paul because my poor English.
Thanks,
Lai
>From 77af819872ddab065d3a46758471b80f31b30e5e Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@...fujitsu.com>
Date: Mon, 27 Feb 2012 10:52:00 +0800
Subject: [PATCH 1/2] srcu: change the comments of the wait algorithm
The original comments does not describe the essential of the wait algorithm
well.
The safe of srcu-protected data and srcu critical section is provided by
wait_idx(), not the flipping.
The two index of the active counter array and the flipping are just used to keep
the wait_idx() from starvation.
(the flip also provides "only one srcu_read_lock() at most after flip
for every cpu", this coupling will be remove in future(next patch))
The code will be split as pieces between every machine-states for call_srcu(),
It is very hard to provide the exactly semantics as original comments,
So I have to consider the exactly what the algorithm, and I change this
comments.
The code is not changed, but it is refactored a little.
Signed-off-by: Lai Jiangshan <laijs@...fujitsu.com>
---
kernel/srcu.c | 75 +++++++++++++++++++++++++++++----------------------------
1 files changed, 38 insertions(+), 37 deletions(-)
diff --git a/kernel/srcu.c b/kernel/srcu.c
index b6b9ea2..47ee35d 100644
--- a/kernel/srcu.c
+++ b/kernel/srcu.c
@@ -249,6 +249,10 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
*/
#define SYNCHRONIZE_SRCU_READER_DELAY 5
+/*
+ * Wait until all the readers(which starts before this wait_idx()
+ * with the specified idx) complete.
+ */
static void wait_idx(struct srcu_struct *sp, int idx, bool expedited)
{
int trycount = 0;
@@ -291,24 +295,9 @@ static void wait_idx(struct srcu_struct *sp, int idx, bool expedited)
smp_mb(); /* E */
}
-/*
- * Flip the readers' index by incrementing ->completed, then wait
- * until there are no more readers using the counters referenced by
- * the old index value. (Recall that the index is the bottom bit
- * of ->completed.)
- *
- * Of course, it is possible that a reader might be delayed for the
- * full duration of flip_idx_and_wait() between fetching the
- * index and incrementing its counter. This possibility is handled
- * by the next __synchronize_srcu() invoking wait_idx() for such readers
- * before starting a new grace period.
- */
-static void flip_idx_and_wait(struct srcu_struct *sp, bool expedited)
+static void srcu_flip(struct srcu_struct *sp)
{
- int idx;
-
- idx = sp->completed++ & 0x1;
- wait_idx(sp, idx, expedited);
+ sp->completed++;
}
/*
@@ -316,6 +305,8 @@ static void flip_idx_and_wait(struct srcu_struct *sp, bool expedited)
*/
static void __synchronize_srcu(struct srcu_struct *sp, bool expedited)
{
+ int busy_idx;
+
rcu_lockdep_assert(!lock_is_held(&sp->dep_map) &&
!lock_is_held(&rcu_bh_lock_map) &&
!lock_is_held(&rcu_lock_map) &&
@@ -323,8 +314,31 @@ static void __synchronize_srcu(struct srcu_struct *sp, bool expedited)
"Illegal synchronize_srcu() in same-type SRCU (or RCU) read-side critical section");
mutex_lock(&sp->mutex);
+ busy_idx = sp->completed & 0X1UL;
/*
+ * There are some readers start with idx=0, and some others start
+ * with idx=1. So two wait_idx()s are enough for synchronize:
+ * __synchronize_srcu() {
+ * wait_idx(sp, 0, expedited);
+ * wait_idx(sp, 1, expedited);
+ * }
+ * When it returns, all started readers have complete.
+ *
+ * But synchronizer may be starved by the readers, example,
+ * if sp->complete & 0x1L == 1, wait_idx(sp, 1, expedited)
+ * may not returns if there are continuous readers start
+ * with idx=1.
+ *
+ * So we need to flip the busy index to keep synchronizer
+ * from starvation.
+ */
+
+ /*
+ * The above comments assume we have readers with all the
+ * 2 idx. It does have this probability, some readers may
+ * hold the reader lock with idx=1-busy_idx:
+ *
* Suppose that during the previous grace period, a reader
* picked up the old value of the index, but did not increment
* its counter until after the previous instance of
@@ -333,31 +347,18 @@ static void __synchronize_srcu(struct srcu_struct *sp, bool expedited)
* not start until after the grace period started, so the grace
* period was not obligated to wait for that reader.
*
- * However, the current SRCU grace period does have to wait for
- * that reader. This is handled by invoking wait_idx() on the
- * non-active set of counters (hence sp->completed - 1). Once
- * wait_idx() returns, we know that all readers that picked up
- * the old value of ->completed and that already incremented their
- * counter will have completed.
- *
- * But what about readers that picked up the old value of
- * ->completed, but -still- have not managed to increment their
- * counter? We do not need to wait for those readers, because
- * they will have started their SRCU read-side critical section
- * after the current grace period starts.
- *
- * Because it is unlikely that readers will be preempted between
- * fetching ->completed and incrementing their counter, wait_idx()
+ * Because this probability is not high, wait_idx()
* will normally not need to wait.
*/
- wait_idx(sp, (sp->completed - 1) & 0x1, expedited);
+ wait_idx(sp, 1 - busy_idx, expedited);
+
+ /* flip the index to ensure the return of the next wait_idx() */
+ srcu_flip(sp);
/*
- * Now that wait_idx() has waited for the really old readers,
- * invoke flip_idx_and_wait() to flip the counter and wait
- * for current SRCU readers.
+ * Now that wait_idx() has waited for the really old readers.
*/
- flip_idx_and_wait(sp, expedited);
+ wait_idx(sp, busy_idx, expedited);
mutex_unlock(&sp->mutex);
}
--
1.7.4.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists