[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20231017121905.1156166-3-frederic@kernel.org>
Date: Tue, 17 Oct 2023 14:19:04 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: LKML <linux-kernel@...r.kernel.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Neeraj Upadhyay <neeraj.upadhyay@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Uladzislau Rezki <urezki@...il.com>, rcu <rcu@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Frederic Weisbecker <frederic@...nel.org>
Subject: [PATCH 2/3] rcu-tasks: Pull sampling of ->percpu_dequeue_lim out of loop
From: "Paul E. McKenney" <paulmck@...nel.org>
The rcu_tasks_need_gpcb() samples ->percpu_dequeue_lim as part of the
condition clause of a "for" loop, which is a bit confusing. This commit
therefore hoists this sampling out of the loop, using the result loaded
in the condition clause.
So why does this work in the face of a concurrent switch from single-CPU
queueing to per-CPU queueing?
o The call_rcu_tasks_generic() that makes the change has already
enqueued its callback, which means that all of the other CPU's
callback queues are empty.
o For the call_rcu_tasks_generic() that first notices
the switch to per-CPU queues, the smp_store_release()
used to update ->percpu_enqueue_lim pairs with the
raw_spin_trylock_rcu_node()'s full barrier that is
between the READ_ONCE(rtp->percpu_enqueue_shift) and the
rcu_segcblist_enqueue() that enqueues the callback.
o Because this CPU's queue is empty (unless it happens to
be the original single queue, in which case there is no
need for synchronization), this call_rcu_tasks_generic()
will do an irq_work_queue() to schedule a handler for the
needed rcuwait_wake_up() call. This call will be ordered
after the first call_rcu_tasks_generic() function's change to
->percpu_dequeue_lim.
o This rcuwait_wake_up() will either happen before or after the
set_current_state() in rcuwait_wait_event(). If it happens
before, the "condition" argument's call to rcu_tasks_need_gpcb()
will be ordered after the original change, and all callbacks on
all CPUs will be visible. Otherwise, if it happens after, then
the grace-period kthread's state will be set back to running,
which will result in a later call to rcuwait_wait_event() and
thus to rcu_tasks_need_gpcb(), which will again see the change.
So it all works out.
Suggested-by: Linus Torvalds <torvalds@...ux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
---
kernel/rcu/tasks.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 83049a893de5..94bb5abdbb37 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -432,6 +432,7 @@ static void rcu_barrier_tasks_generic(struct rcu_tasks *rtp)
static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
{
int cpu;
+ int dequeue_limit;
unsigned long flags;
bool gpdone = poll_state_synchronize_rcu(rtp->percpu_dequeue_gpseq);
long n;
@@ -439,7 +440,8 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
long ncbsnz = 0;
int needgpcb = 0;
- for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) {
+ dequeue_limit = smp_load_acquire(&rtp->percpu_dequeue_lim);
+ for (cpu = 0; cpu < dequeue_limit; cpu++) {
struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
/* Advance and accelerate any new callbacks. */
--
2.34.1
Powered by blists - more mailing lists