[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5e84e9fa-38a9-45fd-a67c-2db399860480@tu-bs.de>
Date: Thu, 19 Jun 2025 16:11:47 +0200
From: Thomas Haas <t.haas@...bs.de>
To: Will Deacon <will@...nel.org>, Hernan Ponce de Leon
<hernan.poncedeleon@...weicloud.com>
CC: Peter Zijlstra <peterz@...radead.org>, Alan Stern
<stern@...land.harvard.edu>, Andrea Parri <parri.andrea@...il.com>, Boqun
Feng <boqun.feng@...il.com>, Nicholas Piggin <npiggin@...il.com>, David
Howells <dhowells@...hat.com>, Jade Alglave <j.alglave@....ac.uk>, Luc
Maranget <luc.maranget@...ia.fr>, "Paul E. McKenney" <paulmck@...nel.org>,
Akira Yokosawa <akiyks@...il.com>, Daniel Lustig <dlustig@...dia.com>, Joel
Fernandes <joelagnelf@...dia.com>, <linux-kernel@...r.kernel.org>,
<linux-arch@...r.kernel.org>, <lkmm@...ts.linux.dev>,
<jonas.oberhauser@...weicloud.com>, "r.maseli@...bs.de" <r.maseli@...bs.de>
Subject: Re: [RFC] Potential problem in qspinlock due to mixed-size accesses
On 19.06.25 14:30, Will Deacon wrote:
> On Tue, Jun 17, 2025 at 09:00:30PM +0200, Hernan Ponce de Leon wrote:
>> On 6/17/2025 4:23 PM, Thomas Haas wrote:
>>>
>>>
>>> On 17.06.25 16:17, Will Deacon wrote:
>>>> On Tue, Jun 17, 2025 at 10:42:04AM +0200, Hernan Ponce de Leon wrote:
>>>>> On 6/17/2025 8:19 AM, Thomas Haas wrote:
>>>>>> On 16.06.25 16:23, Will Deacon wrote:
>>>>>>> I'm half inclined to think that the Arm memory model
>>>>>>> should be tightened
>>>>>>> here; I can raise that with Arm and see what they say.
>>>>>>>
>>>>>>> Although the cited paper does give examples of store-forwarding from a
>>>>>>> narrow store to a wider load, the case in qspinlock is further
>>>>>>> constrained by having the store come from an atomic rmw and the load
>>>>>>> having acquire semantics. Setting aside the MSA part,
>>>>>>> that specific case
>>>>>>> _is_ ordered in the Arm memory model (and C++ release
>>>>>>> sequences rely on
>>>>>>> it iirc), so it's fair to say that Arm CPUs don't permit
>>>>>>> forwarding from
>>>>>>> an atomic rmw to an acquire load.
>>>>>>>
>>>>>>> Given that, I don't see how this is going to occur in practice.
>>>>>>
>>>>>> You are probably right. The ARM model's atomic-ordered-before relation
>>>>>>
>>>>>> let aob = rmw | [range(rmw)]; lrs; [A | Q]
>>>>>>
>>>>>> clearly orders the rmw-store with subsequent acquire loads
>>>>>> (lrs = local-
>>>>>> read-successor, A = acquire).
>>>>>> If we treat this relation (at least the second part) as a "global
>>>>>> ordering" and extend it by "si" (same-instruction), then the
>>>>>> problematic
>>>>>> reordering under MSA should be gone.
>>>>>> I quickly ran Dartagnan on the MSA litmus tests with this change to the
>>>>>> ARM model and all the tests still pass.
>>>>>
>>>>> Even with this change I still get violations (both safety and
>>>>> termination)
>>>>> for qspinlock with dartagnan.
>>>>
>>>> Please can you be more specific about the problems you see?
>>>
>>> I talked to Hernán personally and it turned out that he used the generic
>>> implementation of smp_cond_acquire (not sure if the name is correct)
>>> which uses a relaxed load followed by a barrier. In that case, replacing
>>> aob by aob;si does not change anything.
>>> Indeed, even in the reported problem we used the generic implementation
>>> (I was unaware of this), though it is easy to check that changing the
>>> relaxed load to acquire does not give sufficient orderings.
>>
>> Yes, my bad. I was using the generic header rather than the aarch64 specific
>> one and then the changes to the model were having not effect (as they
>> should).
>>
>> Now I am using the aarch64 specific ones and I can confirm dartagnan still
>> reports the violations with the current model and making the change proposed
>> by Thomas (adding ;si just to the second part seems to be enough) indeed
>> removes all violations.
>
> That's great! Thanks for working together to get to the bottom of it. I
> was worried that this was the tip of the iceberg.
>
> I'll try to follow-up with Arm to see if that ';si' addition is an
> acceptable to atomic-ordered before. If not, we'll absorb the smp_wmb()
> into xchg_release() with a fat comment about RCsc and mixed-size
> accesses.
>
> Will
I have already contacted ARM, but I think it will take some time to get
an answer.
Interestingly, the definition of aob has changed in a newer version of
the ARM cat model:
let aob = [Exp & M]; rmw; [Exp & M]
| [Exp & M]; rmw; lrs; [A | Q]
| [Imp & TTD & R]; rmw; [HU]
Ignoring all the extra Exp/Imp/TTD stuff, you can see that the second
part of the definition only orders the load-part of the rmw with the
following acquire load. This suggests that a form of store forwarding
might indeed be expected, seeing that they deliberately removed the
rmw_store-acq_load ordering they had previously.
Nevertheless, for qspinlock to work we just need the rmw_load-acq_load
ordering that is still provided as long as we extend it with ";si".
--
=====================================
Thomas Haas
Technische Universität Braunschweig
Institut für Theoretische Informatik
Mühlenpfordtstr. 23, Raum IZ 343
38106 Braunschweig | Germany
t.haas@...braunschweig.de
https://www.tu-braunschweig.de/tcs/team/thomas-haas
Powered by blists - more mailing lists