[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEXW_YTK1ejOcuNyAqJ3ibKKyfR21UYHRNa7jpWS_H71xpnTpg@mail.gmail.com>
Date: Sat, 28 Jan 2023 16:16:34 -0500
From: Joel Fernandes <joel@...lfernandes.org>
To: paulmck@...nel.org
Cc: linux-kernel@...r.kernel.org,
Frederic Weisbecker <frederic@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Boqun Feng <boqun.feng@...il.com>,
Josh Triplett <josh@...htriplett.org>,
Lai Jiangshan <jiangshanlai@...il.com>, rcu@...r.kernel.org,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH v4] srcu: Clarify comments on memory barrier "E"
On Sat, Jan 28, 2023 at 1:24 PM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Sat, Jan 28, 2023 at 03:59:01AM +0000, Joel Fernandes (Google) wrote:
> > During a flip, we have a full memory barrier before srcu_idx is incremented.
> >
> > The idea is we intend to order the first phase scan's read of lock
> > counters with the flipping of the index.
> >
> > However, such ordering is already enforced because of the
> > control-dependency between the 2 scans. We would be flipping the index
> > only if lock and unlock counts matched.
> >
> > But such match will not happen if there was a pending reader before the flip
> > in the first place (observation courtesy Mathieu Desnoyers).
> >
> > The litmus test below shows this:
> > (test courtesy Frederic Weisbecker, Changes for ctrldep by Boqun/me):
>
> Much better, thank you!
>
> I of course did the usual wordsmithing, as shown below. Does this
> version capture your intent and understanding?
>
It looks good to me.
According to [1] , the architecture at least should not be reordering
read-write control dependency. Only read-read is problematic. But I am
not 100% sure, is that not true?
For the compiler, you are saying that read-write control dependency
can be reordered even with *ONCE() accesses? In other words, the
flipping of idx can happen in ->po order before the locks and unlock
counts match? That sounds sort of like a broken compiler.
[1] https://lpc.events/event/7/contributions/821/attachments/598/1075/LPC_2020_--_Dependency_ordering.pdf
More comments below:
> ------------------------------------------------------------------------
>
> commit 963f34624beb2af1ec08527e637d16ab6a1dacbd
> Author: Joel Fernandes (Google) <joel@...lfernandes.org>
> Date: Sat Jan 28 03:59:01 2023 +0000
>
> srcu: Clarify comments on memory barrier "E"
>
> There is an smp_mb() named "E" in srcu_flip() immediately before the
> increment (flip) of the srcu_struct structure's ->srcu_idx.
>
> The purpose of E is to order the preceding scan's read of lock counters
> against the flipping of the ->srcu_idx, in order to prevent new readers
> from continuing to use the old ->srcu_idx value, which might needlessly
> extend the grace period.
>
> However, this ordering is already enforced because of the control
> dependency between the preceding scan and the ->srcu_idx flip.
> This control dependency exists because atomic_long_read() is used
> to scan the counts, because WRITE_ONCE() is used to flip ->srcu_idx,
> and because ->srcu_idx is not flipped until the ->srcu_lock_count[] and
> ->srcu_unlock_count[] counts match. And such a match cannot happen when
> there is an in-flight reader that started before the flip (observation
> courtesy Mathieu Desnoyers).
Agreed.
> The litmus test below (courtesy of Frederic Weisbecker, with changes
> for ctrldep by Boqun and Joel) shows this:
>
> C srcu
> (*
> * bad condition: P0's first scan (SCAN1) saw P1's idx=0 LOCK count inc, though P1 saw flip.
> *
> * So basically, the ->po ordering on both P0 and P1 is enforced via ->ppo
> * (control deps) on both sides, and both P0 and P1 are interconnected by ->rf
> * relations. Combining the ->ppo with ->rf, a cycle is impossible.
> *)
>
> {}
>
> // updater
> P0(int *IDX, int *LOCK0, int *UNLOCK0, int *LOCK1, int *UNLOCK1)
> {
> int lock1;
> int unlock1;
> int lock0;
> int unlock0;
>
> // SCAN1
> unlock1 = READ_ONCE(*UNLOCK1);
> smp_mb(); // A
> lock1 = READ_ONCE(*LOCK1);
>
> // FLIP
> if (lock1 == unlock1) { // Control dep
> smp_mb(); // E // Remove E and still passes.
> WRITE_ONCE(*IDX, 1);
> smp_mb(); // D
>
> // SCAN2
> unlock0 = READ_ONCE(*UNLOCK0);
> smp_mb(); // A
> lock0 = READ_ONCE(*LOCK0);
> }
> }
>
> // reader
> P1(int *IDX, int *LOCK0, int *UNLOCK0, int *LOCK1, int *UNLOCK1)
> {
> int tmp;
> int idx1;
> int idx2;
>
> // 1st reader
> idx1 = READ_ONCE(*IDX);
> if (idx1 == 0) { // Control dep
> tmp = READ_ONCE(*LOCK0);
> WRITE_ONCE(*LOCK0, tmp + 1);
> smp_mb(); /* B and C */
> tmp = READ_ONCE(*UNLOCK0);
> WRITE_ONCE(*UNLOCK0, tmp + 1);
> } else {
> tmp = READ_ONCE(*LOCK1);
> WRITE_ONCE(*LOCK1, tmp + 1);
> smp_mb(); /* B and C */
> tmp = READ_ONCE(*UNLOCK1);
> WRITE_ONCE(*UNLOCK1, tmp + 1);
> }
> }
>
> exists (0:lock1=1 /\ 1:idx1=1)
>
> More complicated litmus tests with multiple SRCU readers also show that
> memory barrier E is not needed.
>
> This commit therefore clarifies the comment on memory barrier E.
>
> Why not also remove that redundant smp_mb()?
>
> Because control dependencies are quite fragile due to their not being
> recognized by most compilers and tools. Control dependencies therefore
> exact an ongoing maintenance burden, and such a burden cannot be justified
> in this slowpath. Therefore, that smp_mb() stays until such time as
> its overhead becomes a measurable problem in a real workload running on
> a real production system, or until such time as compilers start paying
> attention to this sort of control dependency.
>
> Co-developed-by: Frederic Weisbecker <frederic@...nel.org>
> Co-developed-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> Co-developed-by: Boqun Feng <boqun.feng@...il.com>
> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index c541b82646b63..cd46fe063e50f 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -1085,16 +1085,36 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
> static void srcu_flip(struct srcu_struct *ssp)
> {
> /*
> - * Ensure that if this updater saw a given reader's increment
> - * from __srcu_read_lock(), that reader was using an old value
> - * of ->srcu_idx. Also ensure that if a given reader sees the
> - * new value of ->srcu_idx, this updater's earlier scans cannot
> - * have seen that reader's increments (which is OK, because this
> - * grace period need not wait on that reader).
> + * Because the flip of ->srcu_idx is executed only if the
> + * preceding call to srcu_readers_active_idx_check() found that
> + * the ->srcu_unlock_count[] and ->srcu_lock_count[] sums matched
> + * and because that summing uses atomic_long_read(), there is
> + * ordering due to a control dependency between that summing and
> + * the WRITE_ONCE() in this call to srcu_flip(). This ordering
> + * ensures that if this updater saw a given reader's increment from
> + * __srcu_read_lock(), that reader was using a value of ->srcu_idx
> + * from before the previous call to srcu_flip(), which should be
> + * quite rare. This ordering thus helps forward progress because
> + * the grace period could otherwise be delayed by additional
> + * calls to __srcu_read_lock() using that old (soon to be new)
> + * value of ->srcu_idx.
> + *
> + * This sum-equality check and ordering also ensures that if
> + * a given call to __srcu_read_lock() uses the new value of
> + * ->srcu_idx, this updater's earlier scans cannot have seen
> + * that reader's increments, which is all to the good, because
> + * this grace period need not wait on that reader. After all,
> + * if those earlier scans had seen that reader, there would have
> + * been a sum mismatch and this code would not be reached.
> + *
> + * This means that the following smp_mb() is redundant, but
> + * it stays until either (1) Compilers learn about this sort of
> + * control dependency or (2) Some production workload running on
> + * a production system is unduly delayed by this slowpath smp_mb().
> */
I agree that a read-write control dependency reordering by the
compiler can cause a reader to observe the flipped srcu_idx too soon,
thus perhaps delaying the grace period from ending (because the second
scan will now end up waiting on that reader..).
Thanks,
- Joel
> smp_mb(); /* E */ /* Pairs with B and C. */
>
> - WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
> + WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1); // Flip the counter.
>
> /*
> * Ensure that if the updater misses an __srcu_read_unlock()
Powered by blists - more mailing lists