linux-kernel - Re: [RFC] Potential problem in qspinlock due to mixed-size accesses

Message-ID: <aEwHufdehlQnBX7g@andrea>
Date: Fri, 13 Jun 2025 13:17:37 +0200
From: Andrea Parri <parri.andrea@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Haas <t.haas@...bs.de>, Alan Stern <stern@...land.harvard.edu>,
	Will Deacon <will@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
	Nicholas Piggin <npiggin@...il.com>,
	David Howells <dhowells@...hat.com>,
	Jade Alglave <j.alglave@....ac.uk>,
	Luc Maranget <luc.maranget@...ia.fr>,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Akira Yokosawa <akiyks@...il.com>,
	Daniel Lustig <dlustig@...dia.com>,
	Joel Fernandes <joelagnelf@...dia.com>,
	linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
	lkmm@...ts.linux.dev, hernan.poncedeleon@...weicloud.com,
	jonas.oberhauser@...weicloud.com,
	"r.maseli@...bs.de" <r.maseli@...bs.de>
Subject: Re: [RFC] Potential problem in qspinlock due to mixed-size accesses

> (snip the excellent details)

Indeed, let me join in praising this report.  Great work, Thomas!


> > ### Solutions
> > 
> > The problematic executions rely on the fact that T2 can move half of its
> > load operation (1) to before the xchg_tail (3).
> > Preventing this reordering solves all issues. Possible solutions are:
> >     - make the xchg_tail full-sized (i.e., also touch lock/pending bits).
> >       Note that if the kernel is configured with >= 16k cpus, then the tail
> >       becomes larger than 16 bits and needs to be encoded in parts of the
> >       pending byte as well.
> >       In this case, the kernel makes a full-sized (32-bit) access for the
> >       xchg. So the above bugs are only present in the < 16k cpus setting.
> 
> Right, but that is the more expensive option for some.
> 
> >     - make the xchg_tail an acquire operation.
> >     - make the xchg_tail a release operation (this is an odd solution by
> >       itself but works for aarch64 because it preserves REL->ACQ ordering).
> >       In this case, maybe the preceding "smp_wmb()" can be removed.
> 
> I think I prefer this one; it moves a barrier without really adding
> overhead. Will?
> 
> >     - put some other read-read barrier between the xchg_tail and the load.
> > 
> > 
> > ### Implications for qspinlock executed on non-ARM architectures.
> > 
> > Unfortunately, there are no MSA extensions for other hardware memory models,
> > so we have to speculate based on whether the problematic reordering is
> > permitted if the problematic load was treated as two individual
> > instructions.
> > It seems Power and RISC-V would have no problem reordering the instructions,
> > so qspinlock might also break on those architectures.
> 
> Power (and RISC-V without Zabha) 'emulate' the short XCHG using a full
> word LL/SC and should be good.
> 
> But yes, Zabha might be equally broken.

RISC-V forbids store-forwarding from AMOs or SCs; certain (non-normative)
commentary in the spec clarifies that the same ordering rule applies when
the memory accesses in question overlap only partially.
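
For readers following along, here is a minimal user-space sketch of the
partially overlapping accesses under discussion.  It is not the kernel code:
the names qsl_model, model_xchg_tail() and model_load_val() are hypothetical,
the field layout assumes a little-endian machine and the < 16k cpus
configuration, and the GCC/Clang __atomic builtins stand in for the kernel's
xchg_relaxed() and atomic_read().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the lock word (< 16k cpus, little-endian assumed):
 * one locked byte, one pending byte and a 16-bit tail share a 32-bit word. */
union qsl_model {
	uint32_t val;
	struct {
		uint8_t  locked;
		uint8_t  pending;
		uint16_t tail;
	};
};

static_assert(sizeof(union qsl_model) == 4, "model must be one 32-bit word");

/* Half-word exchange that touches only the tail, as the 16-bit xchg_tail does. */
static inline uint16_t model_xchg_tail(union qsl_model *l, uint16_t tail)
{
	return __atomic_exchange_n(&l->tail, tail, __ATOMIC_RELAXED);
}

/* Full-word load; it overlaps the half-word exchange above only partially. */
static inline uint32_t model_load_val(union qsl_model *l)
{
	return __atomic_load_n(&l->val, __ATOMIC_RELAXED);
}
```

The sketch only makes the access sizes explicit.  Whether the full-word load
may be satisfied in part before the half-word exchange is a property of the
architecture's memory model and cannot be demonstrated from a single thread.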

I am not aware of any RISC-V implementation manifesting the load-load
reordering in question.  In any case, notice that making xchg_tail() a
release operation might not suffice to fix such an implementation, given
that the arch has no plain load-acquire instruction yet and relies on the
generic (fence-based) code for atomic_cond_read_acquire().
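
To illustrate what "fence-based" means here, the following is a rough C11
analogue, not the kernel implementation: model_cond_read_acquire() is a
hypothetical name, and the C11 acquire fence merely approximates the barrier
the generic kernel code issues after its relaxed polling loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical analogue of a fence-based cond-read-acquire: poll with
 * relaxed loads until no bit in mask is set, then issue an acquire fence. */
static inline uint32_t model_cond_read_acquire(_Atomic uint32_t *p,
					       uint32_t mask)
{
	uint32_t val;

	/* The polling loads themselves carry no ordering. */
	while ((val = atomic_load_explicit(p, memory_order_relaxed)) & mask)
		;

	/* The fence comes after the final load: it orders that load against
	 * later accesses, not against anything that precedes the load. */
	atomic_thread_fence(memory_order_acquire);
	return val;
}
```

Since the ordering comes from a fence placed after the load rather than from
a load-acquire instruction, a preceding store-release has no acquire load to
pair with, which is why the REL->ACQ argument made for aarch64 may not carry
over.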

  Andrea