[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <218abf15-b4ed-4d13-9541-aab975bd3835@huaweicloud.com>
Date: Tue, 17 Jun 2025 21:00:30 +0200
From: Hernan Ponce de Leon <hernan.poncedeleon@...weicloud.com>
To: Thomas Haas <t.haas@...bs.de>, Will Deacon <will@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Alan Stern <stern@...land.harvard.edu>, Andrea Parri
<parri.andrea@...il.com>, Boqun Feng <boqun.feng@...il.com>,
Nicholas Piggin <npiggin@...il.com>, David Howells <dhowells@...hat.com>,
Jade Alglave <j.alglave@....ac.uk>, Luc Maranget <luc.maranget@...ia.fr>,
"Paul E. McKenney" <paulmck@...nel.org>, Akira Yokosawa <akiyks@...il.com>,
Daniel Lustig <dlustig@...dia.com>, Joel Fernandes <joelagnelf@...dia.com>,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
lkmm@...ts.linux.dev, jonas.oberhauser@...weicloud.com,
"r.maseli@...bs.de" <r.maseli@...bs.de>
Subject: Re: [RFC] Potential problem in qspinlock due to mixed-size accesses
On 6/17/2025 4:23 PM, Thomas Haas wrote:
>
>
> On 17.06.25 16:17, Will Deacon wrote:
>> On Tue, Jun 17, 2025 at 10:42:04AM +0200, Hernan Ponce de Leon wrote:
>>> On 6/17/2025 8:19 AM, Thomas Haas wrote:
>>>> On 16.06.25 16:23, Will Deacon wrote:
>>>>> I'm half inclined to think that the Arm memory model should be
>>>>> tightened
>>>>> here; I can raise that with Arm and see what they say.
>>>>>
>>>>> Although the cited paper does give examples of store-forwarding from a
>>>>> narrow store to a wider load, the case in qspinlock is further
>>>>> constrained by having the store come from an atomic rmw and the load
>>>>> having acquire semantics. Setting aside the MSA part, that specific
>>>>> case
>>>>> _is_ ordered in the Arm memory model (and C++ release sequences
>>>>> rely on
>>>>> it iirc), so it's fair to say that Arm CPUs don't permit forwarding
>>>>> from
>>>>> an atomic rmw to an acquire load.
>>>>>
>>>>> Given that, I don't see how this is going to occur in practice.
>>>>
>>>> You are probably right. The ARM model's atomic-ordered-before relation
>>>>
>>>> let aob = rmw | [range(rmw)]; lrs; [A | Q]
>>>>
>>>> clearly orders the rmw-store with subsequent acquire loads (lrs =
>>>> local-
>>>> read-successor, A = acquire).
>>>> If we treat this relation (at least the second part) as a "global
>>>> ordering" and extend it by "si" (same-instruction), then the
>>>> problematic
>>>> reordering under MSA should be gone.
>>>> I quickly ran Dartagnan on the MSA litmus tests with this change to the
>>>> ARM model and all the tests still pass.
>>>
>>> Even with this change I still get violations (both safety and
>>> termination)
>>> for qspinlock with dartagnan.
>>
>> Please can you be more specific about the problems you see?
>
> I talked to Hernán personally and it turned out that he used the generic
> implementation of smp_cond_acquire (not sure if the name is correct)
> which uses a relaxed load followed by a barrier. In that case, replacing
> aob by aob;si does not change anything.
> Indeed, even in the reported problem we used the generic implementation
> (I was unaware of this), though it is easy to check that changing the
> relaxed load to acquire does not give sufficient orderings.
Yes, my bad. I was using the generic header rather than the aarch64
specific one and then the changes to the model were having not effect
(as they should).
Now I am using the aarch64 specific ones and I can confirm dartagnan
still reports the violations with the current model and making the
change proposed by Thomas (adding ;si just to the second part seems to
be enough) indeed removes all violations.
Hernan
>
>>
>>> I think the problem is actually with the Internal visibility axiom,
>>> because
>>> only making that one stronger seems to remove the violations.
>>
>> That sounds surprising to me, as there's nothing particularly weird about
>> Arm's coherence requirements when compared to other architectures, as far
>> as I'm aware.
>>
>
> I agree. The internal visibility axiom is not the issue I think.
>
Powered by blists - more mailing lists