Message-ID: <20221218232628.GT4001@paulmck-ThinkPad-P17-Gen-1>
Date: Sun, 18 Dec 2022 15:26:28 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
linux-kernel@...r.kernel.org,
Josh Triplett <josh@...htriplett.org>,
Lai Jiangshan <jiangshanlai@...il.com>, rcu@...r.kernel.org,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 0/2] srcu: Remove pre-flip memory barrier
On Sun, Dec 18, 2022 at 04:30:33PM -0500, Joel Fernandes wrote:
> Hi Mathieu,
>
> On Sun, Dec 18, 2022 at 3:56 PM Mathieu Desnoyers
> <mathieu.desnoyers@...icios.com> wrote:
> >
> > On 2022-12-18 14:13, Joel Fernandes (Google) wrote:
> > > Hello, I believe the pre-flip memory barrier is not required. The only reason I
> > > can say to remove it, other than the possibility that it is unnecessary, is to
> > > > not have extra code that does not help. However, since we are issuing a full
> > > > memory barrier after the flip, I cannot say that it hurts to do it anyway.
> > > >
> > > > For this reason, please consider these patches as "informational" rather than a
> > > > "please merge". :-) Though, feel free to consider merging if you agree!
> > >
> > > All SRCU scenarios pass with these, with 6 hours of testing.
> >
> > Hi Joel,
> >
> > Please have a look at the comments in my side-rcu implementation [1, 2].
> > It is similar to what SRCU does (per-cpu counter based grace period
> > tracking), but implemented for userspace. The comments explain why this
> > works without the memory barrier you identify as useless in SRCU.
> >
> > Following my implementation of side-rcu, I reviewed the SRCU comments
> > and identified that the barrier "/* E */" appears to be useless. I even
> > discussed this privately with Paul E. McKenney.
> >
> > My implementation and comments go further though, and skip the period
> > "flip" entirely if the first pass observes that all readers (in both
> > periods) are quiescent.
>
> Actually in SRCU, the first pass scans only 1 index, then does the
> flip, and the second pass scans the second index. Without doing a
> flip, an index cannot be scanned for forward progress reasons because
> it is still "active". So I am curious how you can skip flip and still
> scan both indexes? I will dig more into your implementation to learn more.
>
> > The most relevant comment in side-rcu is:
> >
> > * The grace period completes when it observes that there are no active
> > * readers within each of the periods.
> > *
> > * The active_readers state is initially true for each period, until the
> > * grace period observes that no readers are present for each given
> > * period, at which point the active_readers state becomes false.
> >
> > So I agree with the clarifications you propose here, but I think we can
> > improve the grace period implementation further by clarifying the SRCU
> > grace period model.
>
> Thanks a lot, I am curious how you do the "detection of no new
> readers" part without globally doing some kind of synchronization. I
> will dig more into your implementation to learn more.

It is very good to see the interest in SRCU internals!

Just out of an abundance of caution, I restate the requirements from
the synchronize_srcu() header comment:

* There are memory-ordering constraints implied by synchronize_srcu().
* On systems with more than one CPU, when synchronize_srcu() returns,
* each CPU is guaranteed to have executed a full memory barrier since
* the end of its last corresponding SRCU read-side critical section
* whose beginning preceded the call to synchronize_srcu(). In addition,
* each CPU having an SRCU read-side critical section that extends beyond
* the return from synchronize_srcu() is guaranteed to have executed a
* full memory barrier after the beginning of synchronize_srcu() and before
* the beginning of that SRCU read-side critical section. Note that these
* guarantees include CPUs that are offline, idle, or executing in user mode,
* as well as CPUs that are executing in the kernel.
*
* Furthermore, if CPU A invoked synchronize_srcu(), which returned
* to its caller on CPU B, then both CPU A and CPU B are guaranteed
* to have executed a full memory barrier during the execution of
* synchronize_srcu(). This guarantee applies even if CPU A and CPU B
* are the same CPU, but again only if the system has more than one CPU.
*
* Of course, these memory-ordering guarantees apply only when
* synchronize_srcu(), srcu_read_lock(), and srcu_read_unlock() are
* passed the same srcu_struct structure.
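
To make the shape of these requirements concrete, here is a toy userspace
model of a two-index, counter-based grace period of the kind discussed in
this thread (scan the inactive index, flip, scan the other index). It is
only a sketch under stated assumptions: every name in it (toy_srcu,
toy_read_lock(), toy_flip(), and so on) is hypothetical, it uses C11
atomics rather than kernel primitives, and it is not the kernel's SRCU
code. The fences show where full barriers could sit; they are not a
statement about which of them SRCU actually needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4	/* hypothetical fixed CPU count for the model */

struct toy_srcu {
	atomic_uint idx;	/* low bit selects the index new readers use */
	atomic_ulong lock_count[TOY_NR_CPUS][2];
	atomic_ulong unlock_count[TOY_NR_CPUS][2];
};

/* Reader entry: count in on the current index, then a full barrier so
 * that the critical section is ordered after the increment. */
static inline int toy_read_lock(struct toy_srcu *sp, int cpu)
{
	int idx = atomic_load_explicit(&sp->idx, memory_order_relaxed) & 1;

	atomic_fetch_add_explicit(&sp->lock_count[cpu][idx], 1,
				  memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return idx;
}

/* Reader exit: full barrier, then count out on the index used at entry. */
static inline void toy_read_unlock(struct toy_srcu *sp, int cpu, int idx)
{
	assert(idx == 0 || idx == 1);
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_add_explicit(&sp->unlock_count[cpu][idx], 1,
				  memory_order_relaxed);
}

/* Sum the unlocks, full barrier, sum the locks.  The readers on idx are
 * treated as drained only if the two sums match. */
static inline bool toy_readers_done(struct toy_srcu *sp, int idx)
{
	unsigned long unlocks = 0, locks = 0;
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		unlocks += atomic_load_explicit(&sp->unlock_count[cpu][idx],
						memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		locks += atomic_load_explicit(&sp->lock_count[cpu][idx],
					      memory_order_relaxed);
	return locks == unlocks;
}

/* Flip the index that new readers use, followed by a full barrier. */
static inline void toy_flip(struct toy_srcu *sp)
{
	atomic_fetch_add_explicit(&sp->idx, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Busy-waiting grace period: scan the inactive index, flip, then scan
 * the index that was active before the flip. */
static inline void toy_synchronize(struct toy_srcu *sp)
{
	int idx;

	atomic_thread_fence(memory_order_seq_cst);
	idx = atomic_load_explicit(&sp->idx, memory_order_relaxed) & 1;
	while (!toy_readers_done(sp, idx ^ 1))
		;
	toy_flip(sp);
	while (!toy_readers_done(sp, idx))
		;
	atomic_thread_fence(memory_order_seq_cst);
}
```

In this model, a reader that entered on index 0 keeps toy_readers_done()
false for index 0 until it calls toy_read_unlock(), while a reader that
enters after toy_flip() lands on index 1 and does not hold up the scan of
index 0.
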
And from the __call_srcu() header comment:
* Note that all CPUs must agree that the grace period extended beyond
* all pre-existing SRCU read-side critical sections. On systems with
* more than one CPU, this means that when "func()" is invoked, each CPU
* is guaranteed to have executed a full memory barrier since the end of
* its last corresponding SRCU read-side critical section whose beginning
* preceded the call to call_srcu(). It also means that each CPU executing
* an SRCU read-side critical section that continues beyond the start of
* "func()" must have executed a memory barrier after the call_srcu()
* but before the beginning of that SRCU read-side critical section.
* Note that these guarantees include CPUs that are offline, idle, or
* executing in user mode, as well as CPUs that are executing in the kernel.
*
* Furthermore, if CPU A invoked call_srcu() and CPU B invoked the
* resulting SRCU callback function "func()", then both CPU A and CPU
* B are guaranteed to execute a full memory barrier during the time
* interval between the call to call_srcu() and the invocation of "func()".
* This guarantee applies even if CPU A and CPU B are the same CPU (but
* again only if the system has more than one CPU).
*
* Of course, these guarantees apply only for invocations of call_srcu(),
* srcu_read_lock(), and srcu_read_unlock() that are all passed the same
* srcu_struct structure.
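
The temporal half of that requirement (the callback runs only after every
read-side critical section that began before the call has ended) can be
pictured with a second toy model. This one is deliberately single-threaded
and contains no memory barriers at all, so it says nothing about the
ordering guarantees quoted above; it only shows callbacks waiting for one
scan-flip-scan cycle. All names (toy_domain, toy_call(), toy_poll(),
toy_count()) are hypothetical and are not kernel APIs.

```c
#include <stddef.h>

struct toy_cb {
	struct toy_cb *next;
	void (*func)(struct toy_cb *cb);
};

enum toy_phase { TOY_IDLE, TOY_SCAN_INACTIVE, TOY_SCAN_OLD };

struct toy_domain {
	unsigned int idx;		/* low bit: index used by new readers */
	unsigned long readers[2];	/* readers currently inside, per index */
	enum toy_phase phase;
	struct toy_cb *next_gp;		/* queued, waiting for a future grace period */
	struct toy_cb *this_gp;		/* covered by the grace period in progress */
};

static int toy_invoked;			/* bumped by the sample callback below */

static inline void toy_count(struct toy_cb *cb)
{
	(void)cb;
	toy_invoked++;
}

static inline int toy_lock(struct toy_domain *d)
{
	int i = d->idx & 1;

	d->readers[i]++;
	return i;
}

static inline void toy_unlock(struct toy_domain *d, int i)
{
	d->readers[i]--;
}

/* Queue cb; it becomes eligible only after a grace period that starts
 * after this call. */
static inline void toy_call(struct toy_domain *d, struct toy_cb *cb,
			    void (*func)(struct toy_cb *cb))
{
	cb->func = func;
	cb->next = d->next_gp;
	d->next_gp = cb;
}

/* Advance the grace-period state machine as far as possible without
 * waiting.  Returns the number of callbacks invoked by this call. */
static inline int toy_poll(struct toy_domain *d)
{
	int invoked = 0;

	if (d->phase == TOY_IDLE) {
		if (!d->next_gp)
			return 0;
		d->this_gp = d->next_gp;
		d->next_gp = NULL;
		d->phase = TOY_SCAN_INACTIVE;
	}
	if (d->phase == TOY_SCAN_INACTIVE) {
		if (d->readers[(d->idx & 1) ^ 1])
			return 0;	/* stragglers on the inactive index */
		d->idx++;		/* flip */
		d->phase = TOY_SCAN_OLD;
	}
	if (d->readers[(d->idx & 1) ^ 1])
		return 0;		/* pre-flip readers still running */
	while (d->this_gp) {
		struct toy_cb *cb = d->this_gp;

		d->this_gp = cb->next;
		cb->func(cb);		/* newest-queued first in this toy */
		invoked++;
	}
	d->phase = TOY_IDLE;
	return invoked;
}
```

A reader that called toy_lock() before toy_call() keeps toy_poll() from
invoking the callback until it calls toy_unlock(), whereas a reader that
starts after the flip does not delay it.
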
Thanx, Paul