linux-kernel - Re: [RFC] Potential problem in qspinlock due to mixed-size accesses

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <218abf15-b4ed-4d13-9541-aab975bd3835@huaweicloud.com>
Date: Tue, 17 Jun 2025 21:00:30 +0200
From: Hernan Ponce de Leon <hernan.poncedeleon@...weicloud.com>
To: Thomas Haas <t.haas@...bs.de>, Will Deacon <will@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
 Alan Stern <stern@...land.harvard.edu>, Andrea Parri
 <parri.andrea@...il.com>, Boqun Feng <boqun.feng@...il.com>,
 Nicholas Piggin <npiggin@...il.com>, David Howells <dhowells@...hat.com>,
 Jade Alglave <j.alglave@....ac.uk>, Luc Maranget <luc.maranget@...ia.fr>,
 "Paul E. McKenney" <paulmck@...nel.org>, Akira Yokosawa <akiyks@...il.com>,
 Daniel Lustig <dlustig@...dia.com>, Joel Fernandes <joelagnelf@...dia.com>,
 linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
 lkmm@...ts.linux.dev, jonas.oberhauser@...weicloud.com,
 "r.maseli@...bs.de" <r.maseli@...bs.de>
Subject: Re: [RFC] Potential problem in qspinlock due to mixed-size accesses

On 6/17/2025 4:23 PM, Thomas Haas wrote:
> 
> 
> On 17.06.25 16:17, Will Deacon wrote:
>> On Tue, Jun 17, 2025 at 10:42:04AM +0200, Hernan Ponce de Leon wrote:
>>> On 6/17/2025 8:19 AM, Thomas Haas wrote:
>>>> On 16.06.25 16:23, Will Deacon wrote:
>>>>> I'm half inclined to think that the Arm memory model should be 
>>>>> tightened
>>>>> here; I can raise that with Arm and see what they say.
>>>>>
>>>>> Although the cited paper does give examples of store-forwarding from a
>>>>> narrow store to a wider load, the case in qspinlock is further
>>>>> constrained by having the store come from an atomic rmw and the load
>>>>> having acquire semantics. Setting aside the MSA part, that specific 
>>>>> case
>>>>> _is_ ordered in the Arm memory model (and C++ release sequences 
>>>>> rely on
>>>>> it iirc), so it's fair to say that Arm CPUs don't permit forwarding 
>>>>> from
>>>>> an atomic rmw to an acquire load.
>>>>>
>>>>> Given that, I don't see how this is going to occur in practice.
>>>>
>>>> You are probably right. The ARM model's atomic-ordered-before relation
>>>>
>>>>        let aob = rmw | [range(rmw)]; lrs; [A | Q]
>>>>
>>>> clearly orders the rmw-store with subsequent acquire loads (lrs = 
>>>> local-
>>>> read-successor, A = acquire).
>>>> If we treat this relation (at least the second part) as a "global
>>>> ordering" and extend it by "si" (same-instruction), then the 
>>>> problematic
>>>> reordering under MSA should be gone.
>>>> I quickly ran Dartagnan on the MSA litmus tests with this change to the
>>>> ARM model and all the tests still pass.
>>>
>>> Even with this change I still get violations (both safety and 
>>> termination)
>>> for qspinlock with dartagnan.
>>
>> Please can you be more specific about the problems you see?
> 
> I talked to Hernán personally and it turned out that he used the generic 
> implementation of smp_cond_acquire (not sure if the name is correct) 
> which uses a relaxed load followed by a barrier. In that case, replacing 
> aob by aob;si does not change anything.
> Indeed, even in the reported problem we used the generic implementation 
> (I was unaware of this), though it is easy to check that changing the 
> relaxed load to acquire does not give sufficient orderings.

Yes, my bad. I was using the generic header rather than the aarch64 
specific one and then the changes to the model were having not effect 
(as they should).

Now I am using the aarch64 specific ones and I can confirm dartagnan 
still reports the violations with the current model and making the 
change proposed by Thomas (adding ;si just to the second part seems to 
be enough) indeed removes all violations.

Hernan

> 
>>
>>> I think the problem is actually with the Internal visibility axiom, 
>>> because
>>> only making that one stronger seems to remove the violations.
>>
>> That sounds surprising to me, as there's nothing particularly weird about
>> Arm's coherence requirements when compared to other architectures, as far
>> as I'm aware.
>>
> 
> I agree. The internal visibility axiom is not the issue I think.
>