Date:   Fri, 27 Jan 2023 14:18:41 +0100
From:   Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To:     paulmck@...nel.org, Alan Stern <stern@...land.harvard.edu>
Cc:     parri.andrea@...il.com, will@...nel.org, peterz@...radead.org,
        boqun.feng@...il.com, npiggin@...il.com, dhowells@...hat.com,
        j.alglave@....ac.uk, luc.maranget@...ia.fr, akiyks@...il.com,
        dlustig@...dia.com, joel@...lfernandes.org, urezki@...il.com,
        quic_neeraju@...cinc.com, frederic@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] tools/memory-model: Unify UNLOCK+LOCK pairings to
 po-unlock-lock-po



On 1/27/2023 12:21 AM, Paul E. McKenney wrote:
> On Thu, Jan 26, 2023 at 12:08:28PM -0800, Paul E. McKenney wrote:
>> On Thu, Jan 26, 2023 at 11:36:51AM -0500, Alan Stern wrote:
>>> On Thu, Jan 26, 2023 at 02:46:03PM +0100, Jonas Oberhauser wrote:
>>>> LKMM uses two relations for talking about UNLOCK+LOCK pairings:
>>>>
>>>> 	1) po-unlock-lock-po, which handles UNLOCK+LOCK pairings
>>>> 	   on the same CPU or immediate lock handovers on the same
>>>> 	   lock variable
>>>>
>>>> 	2) po;[UL];(co|po);[LKW];po, which handles UNLOCK+LOCK pairs
>>>> 	   literally as described in rcupdate.h#L1002, i.e., even
>>>> 	   after a sequence of handovers on the same lock variable.
>>>>
>>>> The latter relation is used only once, to provide the guarantee
>>>> defined in rcupdate.h#L1002 by smp_mb__after_unlock_lock(), which
>>>> makes any UNLOCK+LOCK pair followed by the fence behave like a full
>>>> barrier.
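
(For readers without the model open: if I remember linux-kernel.cat
correctly, relation 1) is defined roughly as

     let po-unlock-lock-po = po ; [UL] ; (po|rf) ; [LKR] ; po

and relation 2) appears only inside the mb disjunct for
smp_mb__after_unlock_lock(), roughly

     ([M] ; po ; [UL] ; (co | po) ; [LKW] ;
             fencerel(After-unlock-lock) ; [M])

The exact text in the tree may differ slightly.)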
>>>>
>>>> This patch drops this use in favor of using po-unlock-lock-po
>>>> everywhere, which unifies the way the model talks about UNLOCK+LOCK
>>>> pairings.  At first glance this seems to weaken the guarantee given
>>>> by LKMM: when considering a long sequence of lock handovers
>>>> such as the one below, where P0 hands the lock to P1, which hands it
>>>> to P2, which finally executes such an after_unlock_lock fence, the mb
>>>> relation currently links any stores in the critical section of P0
>>>> to instructions P2 executes after its fence, but no longer does so
>>>> after the patch.
>>>>
>>>> P0(int *x, int *y, spinlock_t *mylock)
>>>> {
>>>>          spin_lock(mylock);
>>>>          WRITE_ONCE(*x, 2);
>>>>          spin_unlock(mylock);
>>>>          WRITE_ONCE(*y, 1);
>>>> }
>>>>
>>>> P1(int *y, int *z, spinlock_t *mylock)
>>>> {
>>>>          int r0 = READ_ONCE(*y); // reads 1
>>>>          spin_lock(mylock);
>>>>          spin_unlock(mylock);
>>>>          WRITE_ONCE(*z, 1);
>>>> }
>>>>
>>>> P2(int *z, int *d, spinlock_t *mylock)
>>>> {
>>>>          int r1 = READ_ONCE(*z); // reads 1
>>>>          spin_lock(mylock);
>>>>          spin_unlock(mylock);
>>>>          smp_mb__after_unlock_lock();
>>>>          WRITE_ONCE(*d, 1);
>>>> }
>>>>
>>>> P3(int *x, int *d)
>>>> {
>>>>          WRITE_ONCE(*d, 2);
>>>>          smp_mb();
>>>>          WRITE_ONCE(*x, 1);
>>>> }
>>>>
>>>> exists (1:r0=1 /\ 2:r1=1 /\ x=2 /\ d=2)
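
(Side note in case someone wants to feed the handover example above to
herd7: as far as I know it only needs the usual litmus header and an empty
initial-state block,

     C unlock-lock-handover

     {}

followed by the four processes and the exists clause exactly as quoted,
and can then be run from tools/memory-model with
"herd7 -conf linux-kernel.cfg unlock-lock-handover.litmus".  The file name
is of course arbitrary.)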
>>>>
>>>> Nevertheless, the ordering guarantee given in rcupdate.h is actually
>>>> not weakened.  This is because the unlock operations along the
>>>> sequence of handovers are A-cumulative fences.  They ensure that any
>>>> stores that propagate to the CPU performing the first unlock
>>>> operation in the sequence must also propagate to every CPU that
>>>> performs a subsequent lock operation in the sequence.  Therefore any
>>>> such stores will also be ordered correctly by the fence even if only
>>>> the final handover is considered a full barrier.
>>>>
>>>> Indeed this patch does not affect the behaviors allowed by LKMM at
>>>> all.  The mb relation is used to define ordering through:
>>>> 1) mb/.../ppo/hb, where the ordering is subsumed by hb+ where the
>>>>     lock-release, rfe, and unlock-acquire orderings each provide hb
>>>> 2) mb/strong-fence/cumul-fence/prop, where the rfe and A-cumulative
>>>>     lock-release orderings simply add more fine-grained cumul-fence
>>>>     edges to substitute a single strong-fence edge provided by a long
>>>>     lock handover sequence
>>>> 3) mb/strong-fence/pb and various similar uses in the definition of
>>>>     data races, where as discussed above any long handover sequence
>>>>     can be turned into a sequence of cumul-fence edges that provide
>>>>     the same ordering.
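
(In terms of the model text, if I remember the mb definition correctly,
the change boils down to replacing the disjunct quoted further up with
something along the lines of

     ([M] ; po-unlock-lock-po ;
             [After-unlock-lock] ; po ; [M])

but the patch itself is authoritative.)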
>>>>
>>>> Signed-off-by: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
>>>> ---
>>> Reviewed-by: Alan Stern <stern@...land.harvard.edu>
>> A quick spot check showed no change in performance, so thank you both!
>>
>> Queued for review and further testing.
> And testing on https://github.com/paulmckrcu/litmus for litmus tests up
> to ten processes and allowing 10 minutes per litmus test got this:
>
> Exact output matches: 5208
> !!! Timed out: 38
> !!! Unknown primitive: 7
>
> This test compared output with and without your patch.
>
> For the tests with a Results clause, these failed:

Gave me a heart attack there for a second!

> Also, I am going to be pushing the scripts I use to mainline.  They might
> not be perfect, but they will be quite useful for this sort of change
> to the memory model.

I could also provide Coq proofs, although those ignore the
srcu/data-race parts at the moment.

Have fun, jonas
