[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <143273e9-1243-60bc-4fb0-eea6fb3de355@huaweicloud.com>
Date: Wed, 15 May 2024 08:44:30 +0200
From: Hernan Ponce de Leon <hernan.poncedeleon@...weicloud.com>
To: paulmck@...nel.org, Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
kernel-team@...a.com, mingo@...nel.org, stern@...land.harvard.edu,
parri.andrea@...il.com, will@...nel.org, peterz@...radead.org,
boqun.feng@...il.com, npiggin@...il.com, dhowells@...hat.com,
j.alglave@....ac.uk, luc.maranget@...ia.fr, akiyks@...il.com,
Frederic Weisbecker <frederic@...nel.org>, Daniel Lustig
<dlustig@...dia.com>, Joel Fernandes <joel@...lfernandes.org>,
Mark Rutland <mark.rutland@....com>, Jonathan Corbet <corbet@....net>,
linux-doc@...r.kernel.org
Subject: Re: [PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate
unordered failing cmpxchg
On 5/6/2024 8:00 PM, Paul E. McKenney wrote:
> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
>> Am 5/6/2024 um 12:05 PM schrieb Jonas Oberhauser:
>>> Am 5/2/2024 um 1:21 AM schrieb Paul E. McKenney:
>>>> This commit adds four litmus tests showing that a failing cmpxchg()
>>>> operation is unordered unless followed by an smp_mb__after_atomic()
>>>> operation.
>>>
>>> So far, my understanding was that all RMW operations without suffix
>>> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
>>>
>>> I guess this shows again how important it is to model these full
>>> barriers explicitly inside the cat model, instead of relying on implicit
>>> conversions internal to herd.
>>>
>>> I'd like to propose a patch to this effect.
>>>
>>> What is the intended behavior of a failed cmpxchg()? Is it the same as a
>>> relaxed one?
>
> Yes, and unless I am too confused, LKMM currently does implement this.
> Please let me know if I am missing something.
>
>>> My suggestion would be in the direction of marking read and write events
>>> of these operations as Mb, and then defining
>>>
>>> (* full barrier events that appear in non-failing RMW *)
>>> let RMW_MB = Mb & (dom(rmw) | range(rmw))
>>>
>>>
>>> let mb =
>>> [M] ; fencerel(Mb) ; [M]
>>> | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
>>> | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
>>> | ...
>>>
>>> The po \ rmw is because ordering is not provided internally of the rmw
>>
>> (removed the unnecessary si since LKMM is still non-mixed-accesses)
>
> Addition of mixed-access support would be quite welcome!
>
>> This could also be written with a single rule:
>>
>> | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
>>
>>> I suspect that after we added [rmw] sequences it could perhaps be
>>> simplified [...]
>>
>> No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
>> act like strong fences when they appear in an rmw sequence.
>>
>> if (z==1) || x = 2; || xchg(&y,2) || if (y==2)
>> x = 1; || y =_rel 1; || || z=1;
>>
>>
>> right now, we allow x=2 overwriting x=1 (in case the last thread does not
>> propagate x=2 along with z=1) because on power, the xchg might be
>> implemented with a sync that doesn't get executed until the very end
>> of the program run.
>>
>>
>> Instead of its negative form (everything other than inside the rmw),
>> it could also be rewritten positively. Here's a somewhat short form:
>>
>> let mb =
>> [M] ; fencerel(Mb) ; [M]
>> (* everything across a full barrier RMW is ordered. This includes up to
>> one event inside the RMW. *)
>> | [M] ; po ; [RMW_MB] ; po ; [M]
>> (* full barrier RMW writes are ordered with everything behind the RMW *)
>> | [W & RMW_MB] ; po ; [M]
>> (* full barrier RMW reads are ordered with everything before the RMW *)
>> | [M] ; po ; [R & RMW_MB]
>> | ...
>
> Does this produce the results expected by the litmus tests in the Linux
> kernel source tree and also those at https://github.com/paulmckrcu/litmus?
>
> Thanx, Paul
I implemented in the dartagnan tool the changes proposed by Jonas (i.e.
changing the mb definition in the cat model and removing the fences that
were added programmatically).
I run this using the ~5K litmus test I have (it should include
everything from the source tree + the non-LISA ones from your repo). I
also checked with the version of qspinlock discussed in [1].
I do get the expected results.
Hernan
[1] https://lkml.org/lkml/2022/8/26/597
Powered by blists - more mailing lists