[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7a20b873-3c26-4a52-b118-c816ede7298d@tu-bs.de>
Date: Tue, 17 Jun 2025 16:23:11 +0200
From: Thomas Haas <t.haas@...bs.de>
To: Will Deacon <will@...nel.org>, Hernan Ponce de Leon
<hernan.poncedeleon@...weicloud.com>
CC: Peter Zijlstra <peterz@...radead.org>, Alan Stern
<stern@...land.harvard.edu>, Andrea Parri <parri.andrea@...il.com>, Boqun
Feng <boqun.feng@...il.com>, Nicholas Piggin <npiggin@...il.com>, David
Howells <dhowells@...hat.com>, Jade Alglave <j.alglave@....ac.uk>, Luc
Maranget <luc.maranget@...ia.fr>, "Paul E. McKenney" <paulmck@...nel.org>,
Akira Yokosawa <akiyks@...il.com>, Daniel Lustig <dlustig@...dia.com>, Joel
Fernandes <joelagnelf@...dia.com>, <linux-kernel@...r.kernel.org>,
<linux-arch@...r.kernel.org>, <lkmm@...ts.linux.dev>,
<jonas.oberhauser@...weicloud.com>, "r.maseli@...bs.de" <r.maseli@...bs.de>
Subject: Re: [RFC] Potential problem in qspinlock due to mixed-size accesses
On 17.06.25 16:17, Will Deacon wrote:
> On Tue, Jun 17, 2025 at 10:42:04AM +0200, Hernan Ponce de Leon wrote:
>> On 6/17/2025 8:19 AM, Thomas Haas wrote:
>>> On 16.06.25 16:23, Will Deacon wrote:
>>>> I'm half inclined to think that the Arm memory model should be tightened
>>>> here; I can raise that with Arm and see what they say.
>>>>
>>>> Although the cited paper does give examples of store-forwarding from a
>>>> narrow store to a wider load, the case in qspinlock is further
>>>> constrained by having the store come from an atomic rmw and the load
>>>> having acquire semantics. Setting aside the MSA part, that specific case
>>>> _is_ ordered in the Arm memory model (and C++ release sequences rely on
>>>> it iirc), so it's fair to say that Arm CPUs don't permit forwarding from
>>>> an atomic rmw to an acquire load.
>>>>
>>>> Given that, I don't see how this is going to occur in practice.
>>>
>>> You are probably right. The ARM model's atomic-ordered-before relation
>>>
>>> let aob = rmw | [range(rmw)]; lrs; [A | Q]
>>>
>>> clearly orders the rmw-store with subsequent acquire loads (lrs = local-
>>> read-successor, A = acquire).
>>> If we treat this relation (at least the second part) as a "global
>>> ordering" and extend it by "si" (same-instruction), then the problematic
>>> reordering under MSA should be gone.
>>> I quickly ran Dartagnan on the MSA litmus tests with this change to the
>>> ARM model and all the tests still pass.
>>
>> Even with this change I still get violations (both safety and termination)
>> for qspinlock with dartagnan.
>
> Please can you be more specific about the problems you see?
I talked to Hernán personally and it turned out that he used the generic
implementation of smp_cond_acquire (not sure if the name is correct)
which uses a relaxed load followed by a barrier. In that case, replacing
aob by aob;si does not change anything.
Indeed, even in the reported problem we used the generic implementation
(I was unaware of this), though it is easy to check that changing the
relaxed load to acquire does not give sufficient orderings.
>
>> I think the problem is actually with the Internal visibility axiom, because
>> only making that one stronger seems to remove the violations.
>
> That sounds surprising to me, as there's nothing particularly weird about
> Arm's coherence requirements when compared to other architectures, as far
> as I'm aware.
>
I agree. The internal visibility axiom is not the issue I think.
--
=====================================
Thomas Haas
Technische Universität Braunschweig
Institut für Theoretische Informatik
Mühlenpfordtstr. 23, Raum IZ 343
38106 Braunschweig | Germany
t.haas@...braunschweig.de
https://www.tu-braunschweig.de/tcs/team/thomas-haas
Powered by blists - more mailing lists