[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250619123002.GC21372@willie-the-truck>
Date: Thu, 19 Jun 2025 13:30:02 +0100
From: Will Deacon <will@...nel.org>
To: Hernan Ponce de Leon <hernan.poncedeleon@...weicloud.com>
Cc: Thomas Haas <t.haas@...bs.de>, Peter Zijlstra <peterz@...radead.org>,
Alan Stern <stern@...land.harvard.edu>,
Andrea Parri <parri.andrea@...il.com>,
Boqun Feng <boqun.feng@...il.com>,
Nicholas Piggin <npiggin@...il.com>,
David Howells <dhowells@...hat.com>,
Jade Alglave <j.alglave@....ac.uk>,
Luc Maranget <luc.maranget@...ia.fr>,
"Paul E. McKenney" <paulmck@...nel.org>,
Akira Yokosawa <akiyks@...il.com>,
Daniel Lustig <dlustig@...dia.com>,
Joel Fernandes <joelagnelf@...dia.com>,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
lkmm@...ts.linux.dev, jonas.oberhauser@...weicloud.com,
"r.maseli@...bs.de" <r.maseli@...bs.de>
Subject: Re: [RFC] Potential problem in qspinlock due to mixed-size accesses
On Tue, Jun 17, 2025 at 09:00:30PM +0200, Hernan Ponce de Leon wrote:
> On 6/17/2025 4:23 PM, Thomas Haas wrote:
> >
> >
> > On 17.06.25 16:17, Will Deacon wrote:
> > > On Tue, Jun 17, 2025 at 10:42:04AM +0200, Hernan Ponce de Leon wrote:
> > > > On 6/17/2025 8:19 AM, Thomas Haas wrote:
> > > > > On 16.06.25 16:23, Will Deacon wrote:
> > > > > > I'm half inclined to think that the Arm memory model
> > > > > > should be tightened
> > > > > > here; I can raise that with Arm and see what they say.
> > > > > >
> > > > > > Although the cited paper does give examples of store-forwarding from a
> > > > > > narrow store to a wider load, the case in qspinlock is further
> > > > > > constrained by having the store come from an atomic rmw and the load
> > > > > > having acquire semantics. Setting aside the MSA part,
> > > > > > that specific case
> > > > > > _is_ ordered in the Arm memory model (and C++ release
> > > > > > sequences rely on
> > > > > > it iirc), so it's fair to say that Arm CPUs don't permit
> > > > > > forwarding from
> > > > > > an atomic rmw to an acquire load.
> > > > > >
> > > > > > Given that, I don't see how this is going to occur in practice.
> > > > >
> > > > > You are probably right. The ARM model's atomic-ordered-before relation
> > > > >
> > > > > let aob = rmw | [range(rmw)]; lrs; [A | Q]
> > > > >
> > > > > clearly orders the rmw-store with subsequent acquire loads
> > > > > (lrs = local-
> > > > > read-successor, A = acquire).
> > > > > If we treat this relation (at least the second part) as a "global
> > > > > ordering" and extend it by "si" (same-instruction), then the
> > > > > problematic
> > > > > reordering under MSA should be gone.
> > > > > I quickly ran Dartagnan on the MSA litmus tests with this change to the
> > > > > ARM model and all the tests still pass.
> > > >
> > > > Even with this change I still get violations (both safety and
> > > > termination)
> > > > for qspinlock with dartagnan.
> > >
> > > Please can you be more specific about the problems you see?
> >
> > I talked to Hernán personally and it turned out that he used the generic
> > implementation of smp_cond_acquire (not sure if the name is correct)
> > which uses a relaxed load followed by a barrier. In that case, replacing
> > aob by aob;si does not change anything.
> > Indeed, even in the reported problem we used the generic implementation
> > (I was unaware of this), though it is easy to check that changing the
> > relaxed load to acquire does not give sufficient orderings.
>
> Yes, my bad. I was using the generic header rather than the aarch64 specific
> one and then the changes to the model were having not effect (as they
> should).
>
> Now I am using the aarch64 specific ones and I can confirm dartagnan still
> reports the violations with the current model and making the change proposed
> by Thomas (adding ;si just to the second part seems to be enough) indeed
> removes all violations.
That's great! Thanks for working together to get to the bottom of it. I
was worried that this was the tip of the iceberg.
I'll try to follow-up with Arm to see if that ';si' addition is an
acceptable to atomic-ordered before. If not, we'll absorb the smp_wmb()
into xchg_release() with a fat comment about RCsc and mixed-size
accesses.
Will
Powered by blists - more mailing lists