[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ef82e396-dad0-78fe-1cc7-817163829a77@huaweicloud.com>
Date: Fri, 27 Jan 2023 16:57:43 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: paulmck@...nel.org
Cc: Alan Stern <stern@...land.harvard.edu>, parri.andrea@...il.com,
will@...nel.org, peterz@...radead.org, boqun.feng@...il.com,
npiggin@...il.com, dhowells@...hat.com, j.alglave@....ac.uk,
luc.maranget@...ia.fr, akiyks@...il.com, dlustig@...dia.com,
joel@...lfernandes.org, urezki@...il.com, quic_neeraju@...cinc.com,
frederic@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] tools/memory-model: Unify UNLOCK+LOCK pairings to
po-unlock-lock-po
On 1/27/2023 4:13 PM, Paul E. McKenney wrote:
> On Fri, Jan 27, 2023 at 02:18:41PM +0100, Jonas Oberhauser wrote:
>> On 1/27/2023 12:21 AM, Paul E. McKenney wrote:
>>> On Thu, Jan 26, 2023 at 12:08:28PM -0800, Paul E. McKenney wrote:
>>>> On Thu, Jan 26, 2023 at 11:36:51AM -0500, Alan Stern wrote:
>>>>> On Thu, Jan 26, 2023 at 02:46:03PM +0100, Jonas Oberhauser wrote:
>>>>>> LKMM uses two relations for talking about UNLOCK+LOCK pairings:
>>>>>>
>>>>>> 1) po-unlock-lock-po, which handles UNLOCK+LOCK pairings
>>>>>> on the same CPU or immediate lock handovers on the same
>>>>>> lock variable
>>>>>>
>>>>>> 2) po;[UL];(co|po);[LKW];po, which handles UNLOCK+LOCK pairs
>>>>>> literally as described in rcupdate.h#L1002, i.e., even
>>>>>> after a sequence of handovers on the same lock variable.
>>>>>>
>>>>>> The latter relation is used only once, to provide the guarantee
>>>>>> defined in rcupdate.h#L1002 by smp_mb__after_unlock_lock(), which
>>>>>> makes any UNLOCK+LOCK pair followed by the fence behave like a full
>>>>>> barrier.
>>>>>>
>>>>>> This patch drops this use in favor of using po-unlock-lock-po
>>>>>> everywhere, which unifies the way the model talks about UNLOCK+LOCK
>>>>>> pairings. At first glance this seems to weaken the guarantee given
>>>>>> by LKMM: When considering a long sequence of lock handovers
>>>>>> such as below, where P0 hands the lock to P1, which hands it to P2,
>>>>>> which finally executes such an after_unlock_lock fence, the mb
>>>>>> relation currently links any stores in the critical section of P0
>>>>>> to instructions P2 executes after its fence, but not so after the
>>>>>> patch.
>>>>>>
>>>>>> P0(int *x, int *y, spinlock_t *mylock)
>>>>>> {
>>>>>> spin_lock(mylock);
>>>>>> WRITE_ONCE(*x, 2);
>>>>>> spin_unlock(mylock);
>>>>>> WRITE_ONCE(*y, 1);
>>>>>> }
>>>>>>
>>>>>> P1(int *y, int *z, spinlock_t *mylock)
>>>>>> {
>>>>>> int r0 = READ_ONCE(*y); // reads 1
>>>>>> spin_lock(mylock);
>>>>>> spin_unlock(mylock);
>>>>>> WRITE_ONCE(*z,1);
>>>>>> }
>>>>>>
>>>>>> P2(int *z, int *d, spinlock_t *mylock)
>>>>>> {
>>>>>> int r1 = READ_ONCE(*z); // reads 1
>>>>>> spin_lock(mylock);
>>>>>> spin_unlock(mylock);
>>>>>> smp_mb__after_unlock_lock();
>>>>>> WRITE_ONCE(*d,1);
>>>>>> }
>>>>>>
>>>>>> P3(int *x, int *d)
>>>>>> {
>>>>>> WRITE_ONCE(*d,2);
>>>>>> smp_mb();
>>>>>> WRITE_ONCE(*x,1);
>>>>>> }
>>>>>>
>>>>>> exists (1:r0=1 /\ 2:r1=1 /\ x=2 /\ d=2)
>>>>>>
>>>>>> Nevertheless, the ordering guarantee given in rcupdate.h is actually
>>>>>> not weakened. This is because the unlock operations along the
>>>>>> sequence of handovers are A-cumulative fences. They ensure that any
>>>>>> stores that propagate to the CPU performing the first unlock
>>>>>> operation in the sequence must also propagate to every CPU that
>>>>>> performs a subsequent lock operation in the sequence. Therefore any
>>>>>> such stores will also be ordered correctly by the fence even if only
>>>>>> the final handover is considered a full barrier.
>>>>>>
>>>>>> Indeed this patch does not affect the behaviors allowed by LKMM at
>>>>>> all. The mb relation is used to define ordering through:
>>>>>> 1) mb/.../ppo/hb, where the ordering is subsumed by hb+ where the
>>>>>> lock-release, rfe, and unlock-acquire orderings each provide hb
>>>>>> 2) mb/strong-fence/cumul-fence/prop, where the rfe and A-cumulative
>>>>>> lock-release orderings simply add more fine-grained cumul-fence
>>>>>> edges to substitute a single strong-fence edge provided by a long
>>>>>> lock handover sequence
>>>>>> 3) mb/strong-fence/pb and various similar uses in the definition of
>>>>>> data races, where as discussed above any long handover sequence
>>>>>> can be turned into a sequence of cumul-fence edges that provide
>>>>>> the same ordering.
>>>>>>
>>>>>> Signed-off-by: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
>>>>>> ---
>>>>> Reviewed-by: Alan Stern <stern@...land.harvard.edu>
>>>> A quick spot check showed no change in performance, so thank you both!
>>>>
>>>> Queued for review and further testing.
>>> And testing on https://github.com/paulmckrcu/litmus for litmus tests up
>>> to ten processes and allowing 10 minutes per litmus test got this:
>>>
>>> Exact output matches: 5208
>>> !!! Timed out: 38
>>> !!! Unknown primitive: 7
>>>
>>> This test compared output with and without your patch.
>>>
>>> For the tests with a Results clause, these failed:
>> Gave me a heart attack there for a second!
> Sorry for the scare!!!
>
>>> Also, I am going to be pushing the scripts I use to mainline. They might
>>> not be perfect, but they will be quite useful for this sort of change
>>> to the memory model.
>> I could also provide Coq proofs, although those are ignoring the srcu/data
>> race parts at the moment.
> Can such proofs serve as regression tests for future changes?
>
> Thanx, Paul
So-so. On the upside, it would be easy to make them raise an alarm if
the future change breaks stuff.
On the downside, they will often need maintenance together with any
change. Sometimes a lot, sometimes very little.
I think for the proofs that show the equivalence between two models, the
maintenance is quite a bit higher because every change needs to be
reflected in both versions. So if you do 10 equivalent transformations
and want to show that they remain equivalent with any future changes you
do, you need to keep at least 10 additional models around ("current LKMM
where ppo isn't in po, current LKMM where the unlock fence still relies
on co, ...").
Right now, each equivalence proof I did (e.g., for using
po-unlock-lock-po here, or the ppo<=po patch I originally proposed) is
at average in the ballpark of 500 lines of proof script. And as
evidenced by my discussion with Alan, these proofs only cover the "core
model".
So for this kind of thing, I think it's better to look at them to have
more confidence in the patch, and do the patch more based on which model
is more reasonable (as Alan enforces). Then consider the simplified
version as the more natural one, and not worry about future changes that
break the equivalence (that would usually indicate a problem with the
old model, rather than a problem with the patch).
For regressions, I would rather consider some desirable properties of
LKMM, like "DRF-SC" or "monotonicity of barriers" or "ppo <= po" and try
to prove those. This has the upside of not requiring to carry additional
models around, so much less than half the maintenance effort, and if the
property should be broken this usually would indicate a problem with the
patch. So I think the bang for the buck is much higher there.
Those are my current thoughts anyways : )
have fun, jonas
Powered by blists - more mailing lists