[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1492018825-25634-40-git-send-email-paulmck@linux.vnet.ibm.com>
Date: Wed, 12 Apr 2017 10:40:25 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: linux-kernel@...r.kernel.org
Cc: mingo@...nel.org, jiangshanlai@...il.com, dipankar@...ibm.com,
akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
josh@...htriplett.org, tglx@...utronix.de, peterz@...radead.org,
rostedt@...dmis.org, dhowells@...hat.com, edumazet@...gle.com,
fweisbec@...il.com, oleg@...hat.com, bobby.prani@...il.com,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: [PATCH tip/core/rcu 40/40] srcu: Parallelize callback handling
Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1],
however, there are workloads that could result in a high volume of
concurrent invocations of call_srcu(), which with current SRCU would
result in excessive lock contention on the srcu_struct structure's
->queue_lock, which protects SRCU's callback lists. This commit therefore
moves SRCU to per-CPU callback lists, thus greatly reducing contention.
Because a given SRCU instance no longer has a single centralized callback
list, starting grace periods and invoking callbacks each require a bit
more work. These are handled using an srcu_node tree that is in some ways
similar to the rcu_node trees used by RCU-bh, RCU-preempt, and RCU-sched
(for example, the srcu_node tree shape is controlled by exactly the
same Kconfig options and boot parameters that control the shape of the
rcu_node tree).
In addition, the old per-CPU srcu_array structure is now named srcu_data
and contains an rcu_segcblist structure named ->srcu_cblist for its
callbacks (and a spinlock to protect this). The srcu_struct gets
an srcu_gp_seq that is used to associate callback segments with the
corresponding completion-time grace-period number. These completion-time
grace-period numbers are propagated up the srcu_node tree so that the
grace-period workqueue handler can determine whether additional grace
periods are needed on the one hand and where to look for callbacks that
are ready to be invoked.
The srcu_barrier() function must now wait on all instances of the
per-CPU ->srcu_cblist. Because each ->srcu_cblist is protected
by ->lock, srcu_barrier() can remotely add the needed callbacks.
In theory, it could also remotely start grace periods, but this gets
complex and racy. And interestingly enough, it is never necessary to
start a grace period in this case because srcu_barrier() only enqueues
a callback when a callback is already present. And a grace period has
to have already been started for this pre-existing callback. And it is
only the callback that srcu_barrier() needs to wait on, not any particular
grace period. Therefore, a new rcu_segcblist_entrain() function enqueues
the srcu_barrier() function's callback into the same segment occupied by
the pre-existing callback. The special case where all the pre-existing
callbacks are on a different list being invoked is handled by enqueuing
srcu_barrier()'s callback into the RCU_DONE_TAIL segment, relying on
the done-callbacks check that takes place after all callbacks are inovked.
Note that the readers use the same algorithm as before. Note that there
is a separate srcu_idx that tells the readers what counter to increment.
This unfortunately cannot be combined with srcu_gp_seq because they
need to be incremented at different times.
This commit introduces some ugly #ifdefs in rcutorture. These will go
away when I feel good enough about Tree SRCU to ditch Classic SRCU.
[1] https://lwn.net/Articles/309030/
Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
---
include/linux/rcu_segcblist.h | 42 ++-
include/linux/srcutree.h | 80 ++++--
kernel/rcu/rcutorture.c | 20 +-
kernel/rcu/srcutree.c | 639 +++++++++++++++++++++++++++++++++---------
kernel/rcu/tree.c | 6 +
kernel/rcu/tree.h | 8 +
6 files changed, 645 insertions(+), 150 deletions(-)
diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 74b1e7243955..ced8f313fd05 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -402,6 +402,37 @@ static inline void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
}
/*
+ * Entrain the specified callback onto the specified rcu_segcblist at
+ * the end of the last non-empty segment. If the entire rcu_segcblist
+ * is empty, make no change, but return false.
+ *
+ * This is intended for use by rcu_barrier()-like primitives, -not-
+ * for normal grace-period use. IMPORTANT: The callback you enqueue
+ * will wait for all prior callbacks, NOT necessarily for a grace
+ * period. You have been warned.
+ */
+static inline bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
+ struct rcu_head *rhp, bool lazy)
+{
+ int i;
+
+ if (rcu_segcblist_n_cbs(rsclp) == 0)
+ return false;
+ WRITE_ONCE(rsclp->len, rsclp->len + 1);
+ if (lazy)
+ rsclp->len_lazy++;
+ smp_mb(); /* Ensure counts are updated before callback is entrained. */
+ rhp->next = NULL;
+ for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
+ if (rsclp->tails[i] != rsclp->tails[i - 1])
+ break;
+ *rsclp->tails[i] = rhp;
+ for (; i <= RCU_NEXT_TAIL; i++)
+ rsclp->tails[i] = &rhp->next;
+ return true;
+}
+
+/*
* Extract only the counts from the specified rcu_segcblist structure,
* and place them in the specified rcu_cblist structure. This function
* supports both callback orphaning and invocation, hence the separation
@@ -537,7 +568,8 @@ static inline void rcu_segcblist_advance(struct rcu_segcblist *rsclp,
int i, j;
WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
- WARN_ON_ONCE(rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL));
+ if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
+ return;
/*
* Find all callbacks whose ->gp_seq numbers indicate that they
@@ -582,8 +614,9 @@ static inline void rcu_segcblist_advance(struct rcu_segcblist *rsclp,
* them to complete at the end of the earlier grace period.
*
* This function operates on an rcu_segcblist structure, and also the
- * grace-period sequence number at which new callbacks would become
- * ready to invoke.
+ * grace-period sequence number seq at which new callbacks would become
+ * ready to invoke. Returns true if there are callbacks that won't be
+ * ready to invoke until seq, false otherwise.
*/
static inline bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp,
unsigned long seq)
@@ -591,7 +624,8 @@ static inline bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp,
int i;
WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
- WARN_ON_ONCE(rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL));
+ if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
+ return false;
/*
* Find the segment preceding the oldest segment of callbacks
diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index f2b3bd6c6bc2..0400e211aa44 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -24,25 +24,75 @@
#ifndef _LINUX_SRCU_TREE_H
#define _LINUX_SRCU_TREE_H
-struct srcu_array {
- unsigned long lock_count[2];
- unsigned long unlock_count[2];
+#include <linux/rcu_node_tree.h>
+#include <linux/completion.h>
+
+struct srcu_node;
+struct srcu_struct;
+
+/*
+ * Per-CPU structure feeding into leaf srcu_node, similar in function
+ * to rcu_node.
+ */
+struct srcu_data {
+ /* Read-side state. */
+ unsigned long srcu_lock_count[2]; /* Locks per CPU. */
+ unsigned long srcu_unlock_count[2]; /* Unlocks per CPU. */
+
+ /* Update-side state. */
+ spinlock_t lock ____cacheline_internodealigned_in_smp;
+ struct rcu_segcblist srcu_cblist; /* List of callbacks.*/
+ unsigned long srcu_gp_seq_needed; /* Furthest future GP needed. */
+ bool srcu_cblist_invoking; /* Invoking these CBs? */
+ struct delayed_work work; /* Context for CB invoking. */
+ struct rcu_head srcu_barrier_head; /* For srcu_barrier() use. */
+ struct srcu_node *mynode; /* Leaf srcu_node. */
+ int cpu;
+ struct srcu_struct *sp;
};
+/*
+ * Node in SRCU combining tree, similar in function to rcu_data.
+ */
+struct srcu_node {
+ spinlock_t lock;
+ unsigned long srcu_have_cbs[4]; /* GP seq for children */
+ /* having CBs, but only */
+ /* is > ->srcu_gq_seq. */
+ struct srcu_node *srcu_parent; /* Next up in tree. */
+ int grplo; /* Least CPU for node. */
+ int grphi; /* Biggest CPU for node. */
+};
+
+/*
+ * Per-SRCU-domain structure, similar in function to rcu_state.
+ */
struct srcu_struct {
- unsigned long completed;
- unsigned long srcu_gp_seq;
- atomic_t srcu_exp_cnt;
- struct srcu_array __percpu *per_cpu_ref;
- spinlock_t queue_lock; /* protect ->srcu_cblist */
- struct rcu_segcblist srcu_cblist;
+ struct srcu_node node[NUM_RCU_NODES]; /* Combining tree. */
+ struct srcu_node *level[RCU_NUM_LVLS + 1];
+ /* First node at each level. */
+ struct mutex srcu_cb_mutex; /* Serialize CB preparation. */
+ spinlock_t gp_lock; /* protect ->srcu_cblist */
+ struct mutex srcu_gp_mutex; /* Serialize GP work. */
+ unsigned int srcu_idx; /* Current rdr array element. */
+ unsigned long srcu_gp_seq; /* Grace-period seq #. */
+ unsigned long srcu_gp_seq_needed; /* Latest gp_seq needed. */
+ atomic_t srcu_exp_cnt; /* # ongoing expedited GPs. */
+ struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */
+ unsigned long srcu_barrier_seq; /* srcu_barrier seq #. */
+ struct mutex srcu_barrier_mutex; /* Serialize barrier ops. */
+ struct completion srcu_barrier_completion;
+ /* Awaken barrier rq at end. */
+ atomic_t srcu_barrier_cpu_cnt; /* # CPUs not yet posting a */
+ /* callback for the barrier */
+ /* operation. */
struct delayed_work work;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
};
-/* Values for -> state variable. */
+/* Values for state variable (bottom bits of ->srcu_gp_seq). */
#define SRCU_STATE_IDLE 0
#define SRCU_STATE_SCAN1 1
#define SRCU_STATE_SCAN2 2
@@ -51,11 +101,9 @@ void process_srcu(struct work_struct *work);
#define __SRCU_STRUCT_INIT(name) \
{ \
- .completed = -300, \
- .per_cpu_ref = &name##_srcu_array, \
- .queue_lock = __SPIN_LOCK_UNLOCKED(name.queue_lock), \
- .srcu_cblist = RCU_SEGCBLIST_INITIALIZER(name.srcu_cblist),\
- .work = __DELAYED_WORK_INITIALIZER(name.work, process_srcu, 0),\
+ .sda = &name##_srcu_data, \
+ .gp_lock = __SPIN_LOCK_UNLOCKED(name.gp_lock), \
+ .srcu_gp_seq_needed = 0 - 1, \
__SRCU_DEP_MAP_INIT(name) \
}
@@ -79,7 +127,7 @@ void process_srcu(struct work_struct *work);
* See include/linux/percpu-defs.h for the rules on per-CPU variables.
*/
#define __DEFINE_SRCU(name, is_static) \
- static DEFINE_PER_CPU(struct srcu_array, name##_srcu_array);\
+ static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data);\
is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name)
#define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
#define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 6f344b6748a8..e9d4527cdd43 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -563,17 +563,30 @@ static void srcu_torture_stats(void)
int idx;
#if defined(CONFIG_TREE_SRCU) || defined(CONFIG_CLASSIC_SRCU)
+#ifdef CONFIG_TREE_SRCU
+ idx = srcu_ctlp->srcu_idx & 0x1;
+#else /* #ifdef CONFIG_TREE_SRCU */
idx = srcu_ctlp->completed & 0x1;
+#endif /* #else #ifdef CONFIG_TREE_SRCU */
pr_alert("%s%s Tree SRCU per-CPU(idx=%d):",
torture_type, TORTURE_FLAG, idx);
for_each_possible_cpu(cpu) {
unsigned long l0, l1;
unsigned long u0, u1;
long c0, c1;
- struct srcu_array *counts = per_cpu_ptr(srcu_ctlp->per_cpu_ref, cpu);
+#ifdef CONFIG_TREE_SRCU
+ struct srcu_data *counts;
+ counts = per_cpu_ptr(srcu_ctlp->sda, cpu);
+ u0 = counts->srcu_unlock_count[!idx];
+ u1 = counts->srcu_unlock_count[idx];
+#else /* #ifdef CONFIG_TREE_SRCU */
+ struct srcu_array *counts;
+
+ counts = per_cpu_ptr(srcu_ctlp->per_cpu_ref, cpu);
u0 = counts->unlock_count[!idx];
u1 = counts->unlock_count[idx];
+#endif /* #else #ifdef CONFIG_TREE_SRCU */
/*
* Make sure that a lock is always counted if the corresponding
@@ -581,8 +594,13 @@ static void srcu_torture_stats(void)
*/
smp_rmb();
+#ifdef CONFIG_TREE_SRCU
+ l0 = counts->srcu_lock_count[!idx];
+ l1 = counts->srcu_lock_count[idx];
+#else /* #ifdef CONFIG_TREE_SRCU */
l0 = counts->lock_count[!idx];
l1 = counts->lock_count[idx];
+#endif /* #else #ifdef CONFIG_TREE_SRCU */
c0 = l0 - u0;
c1 = l1 - u1;
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index da676b0d016b..f9c684d79faa 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -36,19 +36,110 @@
#include <linux/delay.h>
#include <linux/srcu.h>
-#include <linux/rcu_node_tree.h>
#include "rcu.h"
-static int init_srcu_struct_fields(struct srcu_struct *sp)
+static void srcu_invoke_callbacks(struct work_struct *work);
+static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay);
+
+/*
+ * Initialize SRCU combining tree. Note that statically allocated
+ * srcu_struct structures might already have srcu_read_lock() and
+ * srcu_read_unlock() running against them. So if the is_static parameter
+ * is set, don't initialize ->srcu_lock_count[] and ->srcu_unlock_count[].
+ */
+static void init_srcu_struct_nodes(struct srcu_struct *sp, bool is_static)
{
- sp->completed = 0;
+ int cpu;
+ int i;
+ int level = 0;
+ int levelspread[RCU_NUM_LVLS];
+ struct srcu_data *sdp;
+ struct srcu_node *snp;
+ struct srcu_node *snp_first;
+
+ /* Work out the overall tree geometry. */
+ sp->level[0] = &sp->node[0];
+ for (i = 1; i < rcu_num_lvls; i++)
+ sp->level[i] = sp->level[i - 1] + num_rcu_lvl[i - 1];
+ rcu_init_levelspread(levelspread, num_rcu_lvl);
+
+ /* Each pass through this loop initializes one srcu_node structure. */
+ rcu_for_each_node_breadth_first(sp, snp) {
+ spin_lock_init(&snp->lock);
+ for (i = 0; i < ARRAY_SIZE(snp->srcu_have_cbs); i++)
+ snp->srcu_have_cbs[i] = 0;
+ snp->grplo = -1;
+ snp->grphi = -1;
+ if (snp == &sp->node[0]) {
+ /* Root node, special case. */
+ snp->srcu_parent = NULL;
+ continue;
+ }
+
+ /* Non-root node. */
+ if (snp == sp->level[level + 1])
+ level++;
+ snp->srcu_parent = sp->level[level - 1] +
+ (snp - sp->level[level]) /
+ levelspread[level - 1];
+ }
+
+ /*
+ * Initialize the per-CPU srcu_data array, which feeds into the
+ * leaves of the srcu_node tree.
+ */
+ WARN_ON_ONCE(ARRAY_SIZE(sdp->srcu_lock_count) !=
+ ARRAY_SIZE(sdp->srcu_unlock_count));
+ level = rcu_num_lvls - 1;
+ snp_first = sp->level[level];
+ for_each_possible_cpu(cpu) {
+ sdp = per_cpu_ptr(sp->sda, cpu);
+ spin_lock_init(&sdp->lock);
+ rcu_segcblist_init(&sdp->srcu_cblist);
+ sdp->srcu_cblist_invoking = false;
+ sdp->srcu_gp_seq_needed = sp->srcu_gp_seq;
+ sdp->mynode = &snp_first[cpu / levelspread[level]];
+ for (snp = sdp->mynode; snp != NULL; snp = snp->srcu_parent) {
+ if (snp->grplo < 0)
+ snp->grplo = cpu;
+ snp->grphi = cpu;
+ }
+ sdp->cpu = cpu;
+ INIT_DELAYED_WORK(&sdp->work, srcu_invoke_callbacks);
+ sdp->sp = sp;
+ if (is_static)
+ continue;
+
+ /* Dynamically allocated, better be no srcu_read_locks()! */
+ for (i = 0; i < ARRAY_SIZE(sdp->srcu_lock_count); i++) {
+ sdp->srcu_lock_count[i] = 0;
+ sdp->srcu_unlock_count[i] = 0;
+ }
+ }
+}
+
+/*
+ * Initialize non-compile-time initialized fields, including the
+ * associated srcu_node and srcu_data structures. The is_static
+ * parameter is passed through to init_srcu_struct_nodes(), and
+ * also tells us that ->sda has already been wired up to srcu_data.
+ */
+static int init_srcu_struct_fields(struct srcu_struct *sp, bool is_static)
+{
+ mutex_init(&sp->srcu_cb_mutex);
+ mutex_init(&sp->srcu_gp_mutex);
+ sp->srcu_idx = 0;
sp->srcu_gp_seq = 0;
atomic_set(&sp->srcu_exp_cnt, 0);
- spin_lock_init(&sp->queue_lock);
- rcu_segcblist_init(&sp->srcu_cblist);
+ sp->srcu_barrier_seq = 0;
+ mutex_init(&sp->srcu_barrier_mutex);
+ atomic_set(&sp->srcu_barrier_cpu_cnt, 0);
INIT_DELAYED_WORK(&sp->work, process_srcu);
- sp->per_cpu_ref = alloc_percpu(struct srcu_array);
- return sp->per_cpu_ref ? 0 : -ENOMEM;
+ if (!is_static)
+ sp->sda = alloc_percpu(struct srcu_data);
+ init_srcu_struct_nodes(sp, is_static);
+ smp_store_release(&sp->srcu_gp_seq_needed, 0); /* Init done. */
+ return sp->sda ? 0 : -ENOMEM;
}
#ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -59,7 +150,8 @@ int __init_srcu_struct(struct srcu_struct *sp, const char *name,
/* Don't re-initialize a lock while it is held. */
debug_check_no_locks_freed((void *)sp, sizeof(*sp));
lockdep_init_map(&sp->dep_map, name, key, 0);
- return init_srcu_struct_fields(sp);
+ spin_lock_init(&sp->gp_lock);
+ return init_srcu_struct_fields(sp, false);
}
EXPORT_SYMBOL_GPL(__init_srcu_struct);
@@ -75,15 +167,41 @@ EXPORT_SYMBOL_GPL(__init_srcu_struct);
*/
int init_srcu_struct(struct srcu_struct *sp)
{
- return init_srcu_struct_fields(sp);
+ spin_lock_init(&sp->gp_lock);
+ return init_srcu_struct_fields(sp, false);
}
EXPORT_SYMBOL_GPL(init_srcu_struct);
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
/*
- * Returns approximate total of the readers' ->lock_count[] values for the
- * rank of per-CPU counters specified by idx.
+ * First-use initialization of statically allocated srcu_struct
+ * structure. Wiring up the combining tree is more than can be
+ * done with compile-time initialization, so this check is added
+ * to each update-side SRCU primitive. Use ->gp_lock, which -is-
+ * compile-time initialized, to resolve races involving multiple
+ * CPUs trying to garner first-use privileges.
+ */
+static void check_init_srcu_struct(struct srcu_struct *sp)
+{
+ unsigned long flags;
+
+ WARN_ON_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INIT);
+ /* The smp_load_acquire() pairs with the smp_store_release(). */
+ if (!rcu_seq_state(smp_load_acquire(&sp->srcu_gp_seq_needed))) /*^^^*/
+ return; /* Already initialized. */
+ spin_lock_irqsave(&sp->gp_lock, flags);
+ if (!rcu_seq_state(sp->srcu_gp_seq_needed)) {
+ spin_unlock_irqrestore(&sp->gp_lock, flags);
+ return;
+ }
+ init_srcu_struct_fields(sp, true);
+ spin_unlock_irqrestore(&sp->gp_lock, flags);
+}
+
+/*
+ * Returns approximate total of the readers' ->srcu_lock_count[] values
+ * for the rank of per-CPU counters specified by idx.
*/
static unsigned long srcu_readers_lock_idx(struct srcu_struct *sp, int idx)
{
@@ -91,16 +209,16 @@ static unsigned long srcu_readers_lock_idx(struct srcu_struct *sp, int idx)
unsigned long sum = 0;
for_each_possible_cpu(cpu) {
- struct srcu_array *cpuc = per_cpu_ptr(sp->per_cpu_ref, cpu);
+ struct srcu_data *cpuc = per_cpu_ptr(sp->sda, cpu);
- sum += READ_ONCE(cpuc->lock_count[idx]);
+ sum += READ_ONCE(cpuc->srcu_lock_count[idx]);
}
return sum;
}
/*
- * Returns approximate total of the readers' ->unlock_count[] values for the
- * rank of per-CPU counters specified by idx.
+ * Returns approximate total of the readers' ->srcu_unlock_count[] values
+ * for the rank of per-CPU counters specified by idx.
*/
static unsigned long srcu_readers_unlock_idx(struct srcu_struct *sp, int idx)
{
@@ -108,9 +226,9 @@ static unsigned long srcu_readers_unlock_idx(struct srcu_struct *sp, int idx)
unsigned long sum = 0;
for_each_possible_cpu(cpu) {
- struct srcu_array *cpuc = per_cpu_ptr(sp->per_cpu_ref, cpu);
+ struct srcu_data *cpuc = per_cpu_ptr(sp->sda, cpu);
- sum += READ_ONCE(cpuc->unlock_count[idx]);
+ sum += READ_ONCE(cpuc->srcu_unlock_count[idx]);
}
return sum;
}
@@ -145,14 +263,14 @@ static bool srcu_readers_active_idx_check(struct srcu_struct *sp, int idx)
* the current index but not have incremented the lock counter yet.
*
* Possible bug: There is no guarantee that there haven't been
- * ULONG_MAX increments of ->lock_count[] since the unlocks were
+ * ULONG_MAX increments of ->srcu_lock_count[] since the unlocks were
* counted, meaning that this could return true even if there are
* still active readers. Since there are no memory barriers around
- * srcu_flip(), the CPU is not required to increment ->completed
+ * srcu_flip(), the CPU is not required to increment ->srcu_idx
* before running srcu_readers_unlock_idx(), which means that there
* could be an arbitrarily large number of critical sections that
* execute after srcu_readers_unlock_idx() but use the old value
- * of ->completed.
+ * of ->srcu_idx.
*/
return srcu_readers_lock_idx(sp, idx) == unlocks;
}
@@ -172,12 +290,12 @@ static bool srcu_readers_active(struct srcu_struct *sp)
unsigned long sum = 0;
for_each_possible_cpu(cpu) {
- struct srcu_array *cpuc = per_cpu_ptr(sp->per_cpu_ref, cpu);
+ struct srcu_data *cpuc = per_cpu_ptr(sp->sda, cpu);
- sum += READ_ONCE(cpuc->lock_count[0]);
- sum += READ_ONCE(cpuc->lock_count[1]);
- sum -= READ_ONCE(cpuc->unlock_count[0]);
- sum -= READ_ONCE(cpuc->unlock_count[1]);
+ sum += READ_ONCE(cpuc->srcu_lock_count[0]);
+ sum += READ_ONCE(cpuc->srcu_lock_count[1]);
+ sum -= READ_ONCE(cpuc->srcu_unlock_count[0]);
+ sum -= READ_ONCE(cpuc->srcu_unlock_count[1]);
}
return sum;
}
@@ -193,18 +311,21 @@ static bool srcu_readers_active(struct srcu_struct *sp)
*/
void cleanup_srcu_struct(struct srcu_struct *sp)
{
+ int cpu;
+
WARN_ON_ONCE(atomic_read(&sp->srcu_exp_cnt));
if (WARN_ON(srcu_readers_active(sp)))
return; /* Leakage unless caller handles error. */
- if (WARN_ON(!rcu_segcblist_empty(&sp->srcu_cblist)))
- return; /* Leakage unless caller handles error. */
flush_delayed_work(&sp->work);
- if (WARN_ON(rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)) != SRCU_STATE_IDLE)) {
- pr_info("cleanup_srcu_struct: Active srcu_struct %lu CBs %c state: %d\n", rcu_segcblist_n_cbs(&sp->srcu_cblist), ".E"[rcu_segcblist_empty(&sp->srcu_cblist)], rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)));
+ for_each_possible_cpu(cpu)
+ flush_delayed_work(&per_cpu_ptr(sp->sda, cpu)->work);
+ if (WARN_ON(rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)) != SRCU_STATE_IDLE) ||
+ WARN_ON(srcu_readers_active(sp))) {
+ pr_info("cleanup_srcu_struct: Active srcu_struct %p state: %d\n", sp, rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)));
return; /* Caller forgot to stop doing call_srcu()? */
}
- free_percpu(sp->per_cpu_ref);
- sp->per_cpu_ref = NULL;
+ free_percpu(sp->sda);
+ sp->sda = NULL;
}
EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
@@ -217,8 +338,8 @@ int __srcu_read_lock(struct srcu_struct *sp)
{
int idx;
- idx = READ_ONCE(sp->completed) & 0x1;
- __this_cpu_inc(sp->per_cpu_ref->lock_count[idx]);
+ idx = READ_ONCE(sp->srcu_idx) & 0x1;
+ __this_cpu_inc(sp->sda->srcu_lock_count[idx]);
smp_mb(); /* B */ /* Avoid leaking the critical section. */
return idx;
}
@@ -233,7 +354,7 @@ EXPORT_SYMBOL_GPL(__srcu_read_lock);
void __srcu_read_unlock(struct srcu_struct *sp, int idx)
{
smp_mb(); /* C */ /* Avoid leaking the critical section. */
- this_cpu_inc(sp->per_cpu_ref->unlock_count[idx]);
+ this_cpu_inc(sp->sda->srcu_unlock_count[idx]);
}
EXPORT_SYMBOL_GPL(__srcu_read_unlock);
@@ -251,19 +372,207 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
*/
static void srcu_gp_start(struct srcu_struct *sp)
{
+ struct srcu_data *sdp = this_cpu_ptr(sp->sda);
int state;
- rcu_segcblist_accelerate(&sp->srcu_cblist,
- rcu_seq_snap(&sp->srcu_gp_seq));
+ RCU_LOCKDEP_WARN(!lockdep_is_held(&sp->gp_lock),
+ "Invoked srcu_gp_start() without ->gp_lock!");
+ WARN_ON_ONCE(ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed));
+ rcu_segcblist_advance(&sdp->srcu_cblist,
+ rcu_seq_current(&sp->srcu_gp_seq));
+ (void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
+ rcu_seq_snap(&sp->srcu_gp_seq));
rcu_seq_start(&sp->srcu_gp_seq);
state = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq));
WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
}
/*
+ * Track online CPUs to guide callback workqueue placement.
+ */
+DEFINE_PER_CPU(bool, srcu_online);
+
+void srcu_online_cpu(unsigned int cpu)
+{
+ WRITE_ONCE(per_cpu(srcu_online, cpu), true);
+}
+
+void srcu_offline_cpu(unsigned int cpu)
+{
+ WRITE_ONCE(per_cpu(srcu_online, cpu), false);
+}
+
+/*
+ * Place the workqueue handler on the specified CPU if online, otherwise
+ * just run it whereever. This is useful for placing workqueue handlers
+ * that are to invoke the specified CPU's callbacks.
+ */
+static bool srcu_queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+ struct delayed_work *dwork,
+ unsigned long delay)
+{
+ bool ret;
+
+ preempt_disable();
+ if (READ_ONCE(per_cpu(srcu_online, cpu)))
+ ret = queue_delayed_work_on(cpu, wq, dwork, delay);
+ else
+ ret = queue_delayed_work(wq, dwork, delay);
+ preempt_enable();
+ return ret;
+}
+
+/*
+ * Schedule callback invocation for the specified srcu_data structure,
+ * if possible, on the corresponding CPU.
+ */
+static void srcu_schedule_cbs_sdp(struct srcu_data *sdp, unsigned long delay)
+{
+ srcu_queue_delayed_work_on(sdp->cpu, system_power_efficient_wq,
+ &sdp->work, delay);
+}
+
+/*
+ * Schedule callback invocation for all srcu_data structures associated
+ * with the specified srcu_node structure, if possible, on the corresponding
+ * CPUs.
+ */
+static void srcu_schedule_cbs_snp(struct srcu_struct *sp, struct srcu_node *snp)
+{
+ int cpu;
+
+ for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
+ srcu_schedule_cbs_sdp(per_cpu_ptr(sp->sda, cpu), SRCU_INTERVAL);
+}
+
+/*
+ * Note the end of an SRCU grace period. Initiates callback invocation
+ * and starts a new grace period if needed.
+ *
+ * The ->srcu_cb_mutex acquisition does not protect any data, but
+ * instead prevents more than one grace period from starting while we
+ * are initiating callback invocation. This allows the ->srcu_have_cbs[]
+ * array to have a finite number of elements.
+ */
+static void srcu_gp_end(struct srcu_struct *sp)
+{
+ bool cbs;
+ unsigned long gpseq;
+ int idx;
+ int idxnext;
+ struct srcu_node *snp;
+
+ /* Prevent more than one additional grace period. */
+ mutex_lock(&sp->srcu_cb_mutex);
+
+ /* End the current grace period. */
+ spin_lock_irq(&sp->gp_lock);
+ idx = rcu_seq_state(sp->srcu_gp_seq);
+ WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
+ rcu_seq_end(&sp->srcu_gp_seq);
+ gpseq = rcu_seq_current(&sp->srcu_gp_seq);
+ spin_unlock_irq(&sp->gp_lock);
+ mutex_unlock(&sp->srcu_gp_mutex);
+ /* A new grace period can start at this point. But only one. */
+
+ /* Initiate callback invocation as needed. */
+ idx = rcu_seq_ctr(gpseq) % ARRAY_SIZE(snp->srcu_have_cbs);
+ idxnext = (idx + 1) % ARRAY_SIZE(snp->srcu_have_cbs);
+ rcu_for_each_node_breadth_first(sp, snp) {
+ spin_lock_irq(&snp->lock);
+ cbs = false;
+ if (snp >= sp->level[rcu_num_lvls - 1])
+ cbs = snp->srcu_have_cbs[idx] == gpseq;
+ snp->srcu_have_cbs[idx] = gpseq;
+ rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
+ spin_unlock_irq(&snp->lock);
+ if (cbs) {
+ smp_mb(); /* GP end before CB invocation. */
+ srcu_schedule_cbs_snp(sp, snp);
+ }
+ }
+
+ /* Callback initiation done, allow grace periods after next. */
+ mutex_unlock(&sp->srcu_cb_mutex);
+
+ /* Start a new grace period if needed. */
+ spin_lock_irq(&sp->gp_lock);
+ gpseq = rcu_seq_current(&sp->srcu_gp_seq);
+ if (!rcu_seq_state(gpseq) &&
+ ULONG_CMP_LT(gpseq, sp->srcu_gp_seq_needed)) {
+ srcu_gp_start(sp);
+ spin_unlock_irq(&sp->gp_lock);
+ /* Throttle expedited grace periods: Should be rare! */
+ srcu_reschedule(sp, atomic_read(&sp->srcu_exp_cnt) &&
+ rcu_seq_ctr(gpseq) & 0xf
+ ? 0
+ : SRCU_INTERVAL);
+ } else {
+ spin_unlock_irq(&sp->gp_lock);
+ }
+}
+
+/*
+ * Funnel-locking scheme to scalably mediate many concurrent grace-period
+ * requests. The winner has to do the work of actually starting grace
+ * period s. Losers must either ensure that their desired grace-period
+ * number is recorded on at least their leaf srcu_node structure, or they
+ * must take steps to invoke their own callbacks.
+ */
+static void srcu_funnel_gp_start(struct srcu_struct *sp,
+ struct srcu_data *sdp,
+ unsigned long s)
+{
+ unsigned long flags;
+ int idx = rcu_seq_ctr(s) % ARRAY_SIZE(sdp->mynode->srcu_have_cbs);
+ struct srcu_node *snp = sdp->mynode;
+ unsigned long snp_seq;
+
+ /* Each pass through the loop does one level of the srcu_node tree. */
+ for (; snp != NULL; snp = snp->srcu_parent) {
+ if (rcu_seq_done(&sp->srcu_gp_seq, s) && snp != sdp->mynode)
+ return; /* GP already done and CBs recorded. */
+ spin_lock_irqsave(&snp->lock, flags);
+ if (ULONG_CMP_GE(snp->srcu_have_cbs[idx], s)) {
+ snp_seq = snp->srcu_have_cbs[idx];
+ spin_unlock_irqrestore(&snp->lock, flags);
+ if (snp == sdp->mynode && snp_seq != s) {
+ smp_mb(); /* CBs after GP! */
+ srcu_schedule_cbs_sdp(sdp, 0);
+ }
+ return;
+ }
+ snp->srcu_have_cbs[idx] = s;
+ spin_unlock_irqrestore(&snp->lock, flags);
+ }
+
+ /* Top of tree, must ensure the grace period will be started. */
+ spin_lock_irqsave(&sp->gp_lock, flags);
+ if (ULONG_CMP_LT(sp->srcu_gp_seq_needed, s)) {
+ /*
+ * Record need for grace period s. Pair with load
+ * acquire setting up for initialization.
+ */
+ smp_store_release(&sp->srcu_gp_seq_needed, s); /*^^^*/
+ }
+
+ /* If grace period not already done and none in progress, start it. */
+ if (!rcu_seq_done(&sp->srcu_gp_seq, s) &&
+ rcu_seq_state(sp->srcu_gp_seq) == SRCU_STATE_IDLE) {
+ WARN_ON_ONCE(ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed));
+ srcu_gp_start(sp);
+ queue_delayed_work(system_power_efficient_wq, &sp->work,
+ atomic_read(&sp->srcu_exp_cnt)
+ ? 0
+ : SRCU_INTERVAL);
+ }
+ spin_unlock_irqrestore(&sp->gp_lock, flags);
+}
+
+/*
* Wait until all readers counted by array index idx complete, but
* loop an additional time if there is an expedited grace period pending.
- * The caller must ensure that ->completed is not changed while checking.
+ * The caller must ensure that ->srcu_idx is not changed while checking.
*/
static bool try_check_zero(struct srcu_struct *sp, int idx, int trycount)
{
@@ -277,13 +586,13 @@ static bool try_check_zero(struct srcu_struct *sp, int idx, int trycount)
}
/*
- * Increment the ->completed counter so that future SRCU readers will
- * use the other rank of the ->(un)lock_count[] arrays. This allows
+ * Increment the ->srcu_idx counter so that future SRCU readers will
+ * use the other rank of the ->srcu_(un)lock_count[] arrays. This allows
* us to wait for pre-existing readers in a starvation-free manner.
*/
static void srcu_flip(struct srcu_struct *sp)
{
- WRITE_ONCE(sp->completed, sp->completed + 1);
+ WRITE_ONCE(sp->srcu_idx, sp->srcu_idx + 1);
/*
* Ensure that if the updater misses an __srcu_read_unlock()
@@ -296,21 +605,9 @@ static void srcu_flip(struct srcu_struct *sp)
}
/*
- * End an SRCU grace period.
- */
-static void srcu_gp_end(struct srcu_struct *sp)
-{
- rcu_seq_end(&sp->srcu_gp_seq);
-
- spin_lock_irq(&sp->queue_lock);
- rcu_segcblist_advance(&sp->srcu_cblist,
- rcu_seq_current(&sp->srcu_gp_seq));
- spin_unlock_irq(&sp->queue_lock);
-}
-
-/*
- * Enqueue an SRCU callback on the specified srcu_struct structure,
- * initiating grace-period processing if it is not already running.
+ * Enqueue an SRCU callback on the srcu_data structure associated with
+ * the current CPU and the specified srcu_struct structure, initiating
+ * grace-period processing if it is not already running.
*
* Note that all CPUs must agree that the grace period extended beyond
* all pre-existing SRCU read-side critical section. On systems with
@@ -335,33 +632,40 @@ static void srcu_gp_end(struct srcu_struct *sp)
* srcu_read_lock(), and srcu_read_unlock() that are all passed the same
* srcu_struct structure.
*/
-void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
+void call_srcu(struct srcu_struct *sp, struct rcu_head *rhp,
rcu_callback_t func)
{
unsigned long flags;
-
- head->next = NULL;
- head->func = func;
- spin_lock_irqsave(&sp->queue_lock, flags);
- smp_mb__after_unlock_lock(); /* Caller's prior accesses before GP. */
- rcu_segcblist_enqueue(&sp->srcu_cblist, head, false);
- if (rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)) == SRCU_STATE_IDLE) {
- srcu_gp_start(sp);
- queue_delayed_work(system_power_efficient_wq, &sp->work, 0);
+ bool needgp = false;
+ unsigned long s;
+ struct srcu_data *sdp;
+
+ check_init_srcu_struct(sp);
+ rhp->func = func;
+ local_irq_save(flags);
+ sdp = this_cpu_ptr(sp->sda);
+ spin_lock(&sdp->lock);
+ rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp, false);
+ rcu_segcblist_advance(&sdp->srcu_cblist,
+ rcu_seq_current(&sp->srcu_gp_seq));
+ s = rcu_seq_snap(&sp->srcu_gp_seq);
+ (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
+ if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
+ sdp->srcu_gp_seq_needed = s;
+ needgp = true;
}
- spin_unlock_irqrestore(&sp->queue_lock, flags);
+ spin_unlock_irqrestore(&sdp->lock, flags);
+ if (needgp)
+ srcu_funnel_gp_start(sp, sdp, s);
}
EXPORT_SYMBOL_GPL(call_srcu);
-static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay);
-
/*
* Helper function for synchronize_srcu() and synchronize_srcu_expedited().
*/
static void __synchronize_srcu(struct srcu_struct *sp)
{
struct rcu_synchronize rcu;
- struct rcu_head *head = &rcu.head;
RCU_LOCKDEP_WARN(lock_is_held(&sp->dep_map) ||
lock_is_held(&rcu_bh_lock_map) ||
@@ -372,26 +676,12 @@ static void __synchronize_srcu(struct srcu_struct *sp)
if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
return;
might_sleep();
+ check_init_srcu_struct(sp);
init_completion(&rcu.completion);
-
- head->next = NULL;
- head->func = wakeme_after_rcu;
- spin_lock_irq(&sp->queue_lock);
- smp_mb__after_unlock_lock(); /* Caller's prior accesses before GP. */
- if (rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)) == SRCU_STATE_IDLE) {
- /* steal the processing owner */
- rcu_segcblist_enqueue(&sp->srcu_cblist, head, false);
- srcu_gp_start(sp);
- spin_unlock_irq(&sp->queue_lock);
- /* give the processing owner to work_struct */
- srcu_reschedule(sp, 0);
- } else {
- rcu_segcblist_enqueue(&sp->srcu_cblist, head, false);
- spin_unlock_irq(&sp->queue_lock);
- }
-
+ init_rcu_head_on_stack(&rcu.head);
+ call_srcu(sp, &rcu.head, wakeme_after_rcu);
wait_for_completion(&rcu.completion);
- smp_mb(); /* Caller's later accesses after GP. */
+ destroy_rcu_head_on_stack(&rcu.head);
}
/**
@@ -426,8 +716,8 @@ EXPORT_SYMBOL_GPL(synchronize_srcu_expedited);
*
* Wait for the count to drain to zero of both indexes. To avoid the
* possible starvation of synchronize_srcu(), it waits for the count of
- * the index=((->completed & 1) ^ 1) to drain to zero at first,
- * and then flip the completed and wait for the count of the other index.
+ * the index=((->srcu_idx & 1) ^ 1) to drain to zero at first,
+ * and then flip the srcu_idx and wait for the count of the other index.
*
* Can block; must be called from process context.
*
@@ -468,13 +758,69 @@ void synchronize_srcu(struct srcu_struct *sp)
}
EXPORT_SYMBOL_GPL(synchronize_srcu);
+/*
+ * Callback function for srcu_barrier() use.
+ */
+static void srcu_barrier_cb(struct rcu_head *rhp)
+{
+ struct srcu_data *sdp;
+ struct srcu_struct *sp;
+
+ sdp = container_of(rhp, struct srcu_data, srcu_barrier_head);
+ sp = sdp->sp;
+ if (atomic_dec_and_test(&sp->srcu_barrier_cpu_cnt))
+ complete(&sp->srcu_barrier_completion);
+}
+
/**
* srcu_barrier - Wait until all in-flight call_srcu() callbacks complete.
* @sp: srcu_struct on which to wait for in-flight callbacks.
*/
void srcu_barrier(struct srcu_struct *sp)
{
- synchronize_srcu(sp);
+ int cpu;
+ struct srcu_data *sdp;
+ unsigned long s = rcu_seq_snap(&sp->srcu_barrier_seq);
+
+ check_init_srcu_struct(sp);
+ mutex_lock(&sp->srcu_barrier_mutex);
+ if (rcu_seq_done(&sp->srcu_barrier_seq, s)) {
+ smp_mb(); /* Force ordering following return. */
+ mutex_unlock(&sp->srcu_barrier_mutex);
+ return; /* Someone else did our work for us. */
+ }
+ rcu_seq_start(&sp->srcu_barrier_seq);
+ init_completion(&sp->srcu_barrier_completion);
+
+ /* Initial count prevents reaching zero until all CBs are posted. */
+ atomic_set(&sp->srcu_barrier_cpu_cnt, 1);
+
+ /*
+ * Each pass through this loop enqueues a callback, but only
+ * on CPUs already having callbacks enqueued. Note that if
+ * a CPU already has callbacks enqueue, it must have already
+ * registered the need for a future grace period, so all we
+ * need do is enqueue a callback that will use the same
+ * grace period as the last callback already in the queue.
+ */
+ for_each_possible_cpu(cpu) {
+ sdp = per_cpu_ptr(sp->sda, cpu);
+ spin_lock_irq(&sdp->lock);
+ atomic_inc(&sp->srcu_barrier_cpu_cnt);
+ sdp->srcu_barrier_head.func = srcu_barrier_cb;
+ if (!rcu_segcblist_entrain(&sdp->srcu_cblist,
+ &sdp->srcu_barrier_head, 0))
+ atomic_dec(&sp->srcu_barrier_cpu_cnt);
+ spin_unlock_irq(&sdp->lock);
+ }
+
+ /* Remove the initial count, at which point reaching zero can happen. */
+ if (atomic_dec_and_test(&sp->srcu_barrier_cpu_cnt))
+ complete(&sp->srcu_barrier_completion);
+ wait_for_completion(&sp->srcu_barrier_completion);
+
+ rcu_seq_end(&sp->srcu_barrier_seq);
+ mutex_unlock(&sp->srcu_barrier_mutex);
}
EXPORT_SYMBOL_GPL(srcu_barrier);
@@ -487,21 +833,24 @@ EXPORT_SYMBOL_GPL(srcu_barrier);
*/
unsigned long srcu_batches_completed(struct srcu_struct *sp)
{
- return sp->completed;
+ return sp->srcu_idx;
}
EXPORT_SYMBOL_GPL(srcu_batches_completed);
/*
- * Core SRCU state machine. Advance callbacks from ->batch_check0 to
- * ->batch_check1 and then to ->batch_done as readers drain.
+ * Core SRCU state machine. Push state bits of ->srcu_gp_seq
+ * to SRCU_STATE_SCAN2, and invoke srcu_gp_end() when scan has
+ * completed in that state.
*/
-static void srcu_advance_batches(struct srcu_struct *sp)
+static void srcu_advance_state(struct srcu_struct *sp)
{
int idx;
+ mutex_lock(&sp->srcu_gp_mutex);
+
/*
* Because readers might be delayed for an extended period after
- * fetching ->completed for their index, at any point in time there
+ * fetching ->srcu_idx for their index, at any point in time there
* might well be readers using both idx=0 and idx=1. We therefore
* need to wait for readers to clear from both index values before
* invoking a callback.
@@ -511,23 +860,29 @@ static void srcu_advance_batches(struct srcu_struct *sp)
*/
idx = rcu_seq_state(smp_load_acquire(&sp->srcu_gp_seq)); /* ^^^ */
if (idx == SRCU_STATE_IDLE) {
- spin_lock_irq(&sp->queue_lock);
- if (rcu_segcblist_empty(&sp->srcu_cblist)) {
- spin_unlock_irq(&sp->queue_lock);
+ spin_lock_irq(&sp->gp_lock);
+ if (ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed)) {
+ WARN_ON_ONCE(rcu_seq_state(sp->srcu_gp_seq));
+ spin_unlock_irq(&sp->gp_lock);
+ mutex_unlock(&sp->srcu_gp_mutex);
return;
}
idx = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq));
if (idx == SRCU_STATE_IDLE)
srcu_gp_start(sp);
- spin_unlock_irq(&sp->queue_lock);
- if (idx != SRCU_STATE_IDLE)
+ spin_unlock_irq(&sp->gp_lock);
+ if (idx != SRCU_STATE_IDLE) {
+ mutex_unlock(&sp->srcu_gp_mutex);
return; /* Someone else started the grace period. */
+ }
}
if (rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)) == SRCU_STATE_SCAN1) {
- idx = 1 ^ (sp->completed & 1);
- if (!try_check_zero(sp, idx, 1))
+ idx = 1 ^ (sp->srcu_idx & 1);
+ if (!try_check_zero(sp, idx, 1)) {
+ mutex_unlock(&sp->srcu_gp_mutex);
return; /* readers present, retry later. */
+ }
srcu_flip(sp);
rcu_seq_set_state(&sp->srcu_gp_seq, SRCU_STATE_SCAN2);
}
@@ -538,10 +893,12 @@ static void srcu_advance_batches(struct srcu_struct *sp)
* SRCU read-side critical sections are normally short,
* so check at least twice in quick succession after a flip.
*/
- idx = 1 ^ (sp->completed & 1);
- if (!try_check_zero(sp, idx, 2))
- return; /* readers present, retry after later. */
- srcu_gp_end(sp);
+ idx = 1 ^ (sp->srcu_idx & 1);
+ if (!try_check_zero(sp, idx, 2)) {
+ mutex_unlock(&sp->srcu_gp_mutex);
+ return; /* readers present, retry later. */
+ }
+ srcu_gp_end(sp); /* Releases ->srcu_gp_mutex. */
}
}
@@ -551,28 +908,51 @@ static void srcu_advance_batches(struct srcu_struct *sp)
* the workqueue. Note that needed memory barriers have been executed
* in this task's context by srcu_readers_active_idx_check().
*/
-static void srcu_invoke_callbacks(struct srcu_struct *sp)
+static void srcu_invoke_callbacks(struct work_struct *work)
{
+ bool more;
struct rcu_cblist ready_cbs;
struct rcu_head *rhp;
+ struct srcu_data *sdp;
+ struct srcu_struct *sp;
- spin_lock_irq(&sp->queue_lock);
- if (!rcu_segcblist_ready_cbs(&sp->srcu_cblist)) {
- spin_unlock_irq(&sp->queue_lock);
- return;
- }
+ sdp = container_of(work, struct srcu_data, work.work);
+ sp = sdp->sp;
rcu_cblist_init(&ready_cbs);
- rcu_segcblist_extract_done_cbs(&sp->srcu_cblist, &ready_cbs);
- spin_unlock_irq(&sp->queue_lock);
+ spin_lock_irq(&sdp->lock);
+ smp_mb(); /* Old grace periods before callback invocation! */
+ rcu_segcblist_advance(&sdp->srcu_cblist,
+ rcu_seq_current(&sp->srcu_gp_seq));
+ if (sdp->srcu_cblist_invoking ||
+ !rcu_segcblist_ready_cbs(&sdp->srcu_cblist)) {
+ spin_unlock_irq(&sdp->lock);
+ return; /* Someone else on the job or nothing to do. */
+ }
+
+ /* We are on the job! Extract and invoke ready callbacks. */
+ sdp->srcu_cblist_invoking = true;
+ rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs);
+ spin_unlock_irq(&sdp->lock);
rhp = rcu_cblist_dequeue(&ready_cbs);
for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
local_bh_disable();
rhp->func(rhp);
local_bh_enable();
}
- spin_lock_irq(&sp->queue_lock);
- rcu_segcblist_insert_count(&sp->srcu_cblist, &ready_cbs);
- spin_unlock_irq(&sp->queue_lock);
+
+ /*
+ * Update counts, accelerate new callbacks, and if needed,
+ * schedule another round of callback invocation.
+ */
+ spin_lock_irq(&sdp->lock);
+ rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
+ (void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
+ rcu_seq_snap(&sp->srcu_gp_seq));
+ sdp->srcu_cblist_invoking = false;
+ more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist);
+ spin_unlock_irq(&sdp->lock);
+ if (more)
+ srcu_schedule_cbs_sdp(sdp, 0);
}
/*
@@ -581,19 +961,21 @@ static void srcu_invoke_callbacks(struct srcu_struct *sp)
*/
static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay)
{
- bool pending = true;
- int state;
+ bool pushgp = true;
- if (rcu_segcblist_empty(&sp->srcu_cblist)) {
- spin_lock_irq(&sp->queue_lock);
- state = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq));
- if (rcu_segcblist_empty(&sp->srcu_cblist) &&
- state == SRCU_STATE_IDLE)
- pending = false;
- spin_unlock_irq(&sp->queue_lock);
+ spin_lock_irq(&sp->gp_lock);
+ if (ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed)) {
+ if (!WARN_ON_ONCE(rcu_seq_state(sp->srcu_gp_seq))) {
+ /* All requests fulfilled, time to go idle. */
+ pushgp = false;
+ }
+ } else if (!rcu_seq_state(sp->srcu_gp_seq)) {
+ /* Outstanding request and no GP. Start one. */
+ srcu_gp_start(sp);
}
+ spin_unlock_irq(&sp->gp_lock);
- if (pending)
+ if (pushgp)
queue_delayed_work(system_power_efficient_wq, &sp->work, delay);
}
@@ -606,8 +988,7 @@ void process_srcu(struct work_struct *work)
sp = container_of(work, struct srcu_struct, work.work);
- srcu_advance_batches(sp);
- srcu_invoke_callbacks(sp);
+ srcu_advance_state(sp);
srcu_reschedule(sp, atomic_read(&sp->srcu_exp_cnt) ? 0 : SRCU_INTERVAL);
}
EXPORT_SYMBOL_GPL(process_srcu);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e0eb798181bc..75ca946734e6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3783,12 +3783,16 @@ int rcutree_online_cpu(unsigned int cpu)
{
sync_sched_exp_online_cleanup(cpu);
rcutree_affinity_setting(cpu, -1);
+ if (IS_ENABLED(CONFIG_TREE_SRCU))
+ srcu_online_cpu(cpu);
return 0;
}
int rcutree_offline_cpu(unsigned int cpu)
{
rcutree_affinity_setting(cpu, cpu);
+ if (IS_ENABLED(CONFIG_TREE_SRCU))
+ srcu_offline_cpu(cpu);
return 0;
}
@@ -4164,6 +4168,8 @@ void __init rcu_init(void)
for_each_online_cpu(cpu) {
rcutree_prepare_cpu(cpu);
rcu_cpu_starting(cpu);
+ if (IS_ENABLED(CONFIG_TREE_SRCU))
+ srcu_online_cpu(cpu);
}
}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index a2a45cb629d6..0e598ab08fea 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -541,6 +541,14 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
static void rcu_dynticks_task_enter(void);
static void rcu_dynticks_task_exit(void);
+#ifdef CONFIG_SRCU
+void srcu_online_cpu(unsigned int cpu);
+void srcu_offline_cpu(unsigned int cpu);
+#else /* #ifdef CONFIG_SRCU */
+void srcu_online_cpu(unsigned int cpu) { }
+void srcu_offline_cpu(unsigned int cpu) { }
+#endif /* #else #ifdef CONFIG_SRCU */
+
#endif /* #ifndef RCU_TREE_NONCORE */
#ifdef CONFIG_RCU_TRACE
--
2.5.2
Powered by blists - more mailing lists