Message-ID: <20230126200828.GK2948950@paulmck-ThinkPad-P17-Gen-1>
Date: Thu, 26 Jan 2023 12:08:28 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Alan Stern <stern@...land.harvard.edu>
Cc: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>,
parri.andrea@...il.com, will@...nel.org, peterz@...radead.org,
boqun.feng@...il.com, npiggin@...il.com, dhowells@...hat.com,
j.alglave@....ac.uk, luc.maranget@...ia.fr, akiyks@...il.com,
dlustig@...dia.com, joel@...lfernandes.org, urezki@...il.com,
quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] tools/memory-model: Unify UNLOCK+LOCK pairings to
po-unlock-lock-po

On Thu, Jan 26, 2023 at 11:36:51AM -0500, Alan Stern wrote:
> On Thu, Jan 26, 2023 at 02:46:03PM +0100, Jonas Oberhauser wrote:
> > LKMM uses two relations for talking about UNLOCK+LOCK pairings:
> >
> > 1) po-unlock-lock-po, which handles UNLOCK+LOCK pairings
> > on the same CPU or immediate lock handovers on the same
> > lock variable
> >
> > 2) po;[UL];(co|po);[LKW];po, which handles UNLOCK+LOCK pairs
> > literally as described in rcupdate.h#L1002, i.e., even
> > after a sequence of handovers on the same lock variable.
> >
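> > For reference, relation 1) is defined in linux-kernel.cat roughly as
> > follows (quoting from memory, so modulo exact spelling):
> >
> >         let po-unlock-lock-po = po ; [UL] ; (po|rf) ; [LKR] ; po
> >
> > where the po alternative covers the same-CPU case and the rf
> > alternative covers an immediate handover of the lock to another CPU.
> > Relation 2) has no name of its own; it is written out inline in the
> > definition of mb, as can be seen in the diff below.
> >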
> > The latter relation is used only once, to provide the guarantee
> > defined in rcupdate.h#L1002 by smp_mb__after_unlock_lock(), which
> > makes any UNLOCK+LOCK pair followed by the fence behave like a full
> > barrier.
> >
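> > As a reminder of that guarantee, a minimal litmus test for the
> > simplest case (no handover at all, UNLOCK+LOCK on a single CPU) is
> > the classic store-buffering shape below; this is only an illustrative
> > sketch, not a test taken from the tree:
> >
> > P0(int *x, int *y, spinlock_t *mylock)
> > {
> >         int r0;
> >
> >         spin_lock(mylock);
> >         WRITE_ONCE(*x, 1);
> >         spin_unlock(mylock);
> >         spin_lock(mylock);
> >         smp_mb__after_unlock_lock();
> >         r0 = READ_ONCE(*y);
> >         spin_unlock(mylock);
> > }
> >
> > P1(int *x, int *y)
> > {
> >         int r1;
> >
> >         WRITE_ONCE(*y, 1);
> >         smp_mb();
> >         r1 = READ_ONCE(*x);
> > }
> >
> > exists (0:r0=0 /\ 1:r1=0)
> >
> > If I recall the model's treatment of locks correctly, this outcome is
> > allowed without the smp_mb__after_unlock_lock() (an UNLOCK+LOCK pair
> > by itself is not a full barrier), and it is forbidden with the fence,
> > both before and after this patch.
> >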
> > This patch drops that single use in favor of po-unlock-lock-po
> > everywhere, which unifies the way the model talks about UNLOCK+LOCK
> > pairings. At first glance this seems to weaken the guarantee given
> > by LKMM: in a long sequence of lock handovers such as the one below,
> > where P0 hands the lock to P1, which hands it to P2, which finally
> > executes an smp_mb__after_unlock_lock() fence, the mb relation
> > currently links any stores in the critical section of P0 to
> > instructions P2 executes after its fence, but it no longer does so
> > after the patch.
> >
> > P0(int *x, int *y, spinlock_t *mylock)
> > {
> >         spin_lock(mylock);
> >         WRITE_ONCE(*x, 2);
> >         spin_unlock(mylock);
> >         WRITE_ONCE(*y, 1);
> > }
> >
> > P1(int *y, int *z, spinlock_t *mylock)
> > {
> >         int r0 = READ_ONCE(*y); // reads 1
> >         spin_lock(mylock);
> >         spin_unlock(mylock);
> >         WRITE_ONCE(*z, 1);
> > }
> >
> > P2(int *z, int *d, spinlock_t *mylock)
> > {
> >         int r1 = READ_ONCE(*z); // reads 1
> >         spin_lock(mylock);
> >         spin_unlock(mylock);
> >         smp_mb__after_unlock_lock();
> >         WRITE_ONCE(*d, 1);
> > }
> >
> > P3(int *x, int *d)
> > {
> >         WRITE_ONCE(*d, 2);
> >         smp_mb();
> >         WRITE_ONCE(*x, 1);
> > }
> >
> > exists (1:r0=1 /\ 2:r1=1 /\ x=2 /\ d=2)
> >
> > Nevertheless, the ordering guarantee given in rcupdate.h is actually
> > not weakened. This is because the unlock operations along the
> > sequence of handovers are A-cumulative fences. They ensure that any
> > stores that propagate to the CPU performing the first unlock
> > operation in the sequence before that unlock must also propagate to
> > every CPU that performs a subsequent lock operation in the sequence
> > before that lock. Therefore any such stores will also be ordered
> > correctly by the fence even if only the final handover is considered
> > a full barrier.
> >
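> > Spelled out for the litmus test above (an informal sketch of the
> > argument, not new model text):
> >
> >         W*x=2 propagates to P1 before P1's spin_lock()
> >                 (P0's spin_unlock() is an A-cumulative release and,
> >                 in the handover scenario considered here, P1's
> >                 spin_lock() reads from it),
> >         W*x=2 propagates to P2 before P2's spin_lock()
> >                 (same argument for the P1 -> P2 handover),
> >         W*x=2 propagates to every CPU before W*d=1 does
> >                 (P2's smp_mb__after_unlock_lock() acts as a strong,
> >                 propagation-ordering fence),
> >
> > which together with P3's smp_mb() and the coherence order on x and d
> > still rules out the cycle required by the exists clause.
> >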
> > Indeed this patch does not affect the behaviors allowed by LKMM at
> > all. The mb relation is used to define ordering through:
> > 1) mb/.../ppo/hb, where the ordering is subsumed by hb+, in which
> >    the lock-release, rfe, and unlock-acquire orderings each provide
> >    hb
> > 2) mb/strong-fence/cumul-fence/prop, where the rfe and A-cumulative
> >    lock-release orderings simply add more fine-grained cumul-fence
> >    edges to substitute for the single strong-fence edge provided by
> >    a long lock handover sequence
> > 3) mb/strong-fence/pb and various similar uses in the definition of
> >    data races, where, as discussed above, any long handover sequence
> >    can be turned into a sequence of cumul-fence edges that provide
> >    the same ordering.
> >
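> > As a sanity check of the "no change in behaviors" claim, tests such
> > as the one above can be fed to herd7 from within tools/memory-model
> > with both the old and the new .cat file, e.g.
> >
> >         herd7 -conf linux-kernel.cfg foo.litmus
> >
> > where foo.litmus stands for whatever file the test was saved as; the
> > Observation line should report "Never" for the exists clause both
> > before and after this patch.
> >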
> > Signed-off-by: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
> > ---
>
> Reviewed-by: Alan Stern <stern@...land.harvard.edu>

A quick spot check showed no change in performance, so thank you both!
Queued for review and further testing.

                                                        Thanx, Paul

> > tools/memory-model/linux-kernel.cat | 15 +++++++++++++--
> > 1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
> > index 07f884f9b2bf..6e531457bb73 100644
> > --- a/tools/memory-model/linux-kernel.cat
> > +++ b/tools/memory-model/linux-kernel.cat
> > @@ -37,8 +37,19 @@ let mb = ([M] ; fencerel(Mb) ; [M]) |
> >          ([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) |
> >          ([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) |
> >          ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) |
> > -        ([M] ; po ; [UL] ; (co | po) ; [LKW] ;
> > -                fencerel(After-unlock-lock) ; [M])
> > +(*
> > + * Note: The po-unlock-lock-po relation only passes the lock to the direct
> > + * successor, perhaps giving the impression that the ordering of the
> > + * smp_mb__after_unlock_lock() fence only affects a single lock handover.
> > + * However, in a longer sequence of lock handovers, the implicit
> > + * A-cumulative release fences of lock-release ensure that any stores that
> > + * propagate to one of the involved CPUs before it hands over the lock to
> > + * the next CPU will also propagate to the final CPU handing over the lock
> > + * to the CPU that executes the fence. Therefore, all those stores are
> > + * also affected by the fence.
> > + *)
> > +        ([M] ; po-unlock-lock-po ;
> > +        [After-unlock-lock] ; po ; [M])
> > let gp = po ; [Sync-rcu | Sync-srcu] ; po?
> > let strong-fence = mb | gp
> >
> > --
> > 2.17.1
> >