Date: Thu, 9 May 2024 06:31:04 -0600
From: Neal Gompa <neal@...pa.dev>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Ard Biesheuvel <ardb@...nel.org>, Alex Bennée <alex.bennee@...aro.org>, 
	Will Deacon <will@...nel.org>, Hector Martin <marcan@...can.st>, Marc Zyngier <maz@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, Zayd Qumsieh <zayd_qumsieh@...le.com>, 
	Justin Lu <ih_justin@...le.com>, Ryan Houdek <Houdek.Ryan@...-emu.org>, 
	Mark Brown <broonie@...nel.org>, Mateusz Guzik <mjguzik@...il.com>, 
	Anshuman Khandual <anshuman.khandual@....com>, Oliver Upton <oliver.upton@...ux.dev>, 
	Miguel Luis <miguel.luis@...cle.com>, Joey Gouly <joey.gouly@....com>, 
	Christoph Paasch <cpaasch@...le.com>, Kees Cook <keescook@...omium.org>, 
	Sami Tolvanen <samitolvanen@...gle.com>, Baoquan He <bhe@...hat.com>, 
	Joel Granados <j.granados@...sung.com>, Dawei Li <dawei.li@...ngroup.cn>, 
	Andrew Morton <akpm@...ux-foundation.org>, Florent Revest <revest@...omium.org>, 
	David Hildenbrand <david@...hat.com>, Stefan Roesch <shr@...kernel.io>, Andy Chiu <andy.chiu@...ive.com>, 
	Josh Triplett <josh@...htriplett.org>, Oleg Nesterov <oleg@...hat.com>, Helge Deller <deller@....de>, 
	Zev Weiss <zev@...ilderbeest.net>, Ondrej Mosnacek <omosnace@...hat.com>, 
	Miguel Ojeda <ojeda@...nel.org>, linux-arm-kernel@...ts.infradead.org, 
	linux-kernel@...r.kernel.org, Asahi Linux <asahi@...ts.linux.dev>
Subject: Re: [PATCH 0/4] arm64: Support the TSO memory model

On Thu, May 9, 2024 at 5:13 AM Catalin Marinas <catalin.marinas@...com> wrote:
>
> On Tue, May 07, 2024 at 04:52:30PM +0200, Ard Biesheuvel wrote:
> > On Tue, 7 May 2024 at 12:24, Alex Bennée <alex.bennee@...aro.org> wrote:
> > > I think the main use case here is for emulation. When we run x86-on-arm
> > > in QEMU we do currently insert lots of extra barrier instructions on
> > > every load and store. If we can probe and set a TSO mode I can assure
> > > you we'll do the right thing ;-)
> >
> > Without a public specification of what TSO mode actually entails,
> > deciding which of those barriers can be dropped is not going to be as
> > straight-forward as you make it out to be.
> >
> > Apple's TSO mode is vertically integrated with Rosetta, which means
> > that TSO mode provides whatever Rosetta needs to run x86 code
> > correctly, and that it could mean different things on different
> > generations of the micro-architecture. And whether Apple's TSO is the
> > same as Fujitsu's is anyone's guess afaik.
>
> Indeed. Apart from using impdef registers, that's what I think is the
> second biggest problem with this feature (and the corresponding
> patches). We don't know the precise memory model, we can't tell whether
> this TSO bit is stored in the TLB. If it is, is it per ASID/VMID? The
> other problem Marc raised is what memory model is between two CPUs where
> only one has the TSO bit set? Does it only break the TSO model or is
> there a chance that it also breaks the default relaxed model? What other
> TSO flavours are out there, how do they compare with the Apple one?
>
> > Running a game and seeing it perform better is great, but it is not
> > the kind of rigor we usually attempt to apply when adding support for
> > architectural features. Hopefully, there will be some architectural
> > support for this in the future, but without any spec that defines the
> > memory model it implements, I am not convinced we should merge this.
>
> There is FEAT_LRCPC (available on Apple Silicon from M2 onwards). Rather
> than having a big knob to turn TSO on or off, this feature introduces
> instructions that permit a code generator to get the TSO semantics in a
> more efficient way (e.g. using LDAPR+STLR instead of the stricter
> LDAR+STLR; not sure how well these are implemented on the Apple
> Silicon). There are further improvements in FEAT_LRCPC{2,3} (with the
> latter adding support for SIMD but not available in hardware yet). So
> the direction from Arm is pretty clear, acknowledging that there is a
> need for such TSO emulation but not in the way of undocumented impdef
> registers. Whether more is needed here, I guess people working on
> emulators could reach out to Arm or CPU vendors with suggestions (the
> path to the architects is not straightforward, usually legal has a say,
> but it's doable, there are formal channels already).
>
> I see the impdef hardware TSO options as temporary until CPU
> implementations catch up to architected FEAT_LRCPC*. Given the problems
> already stated in this thread, I think such hacks should be carried
> downstream and (hopefully) will eventually vanish. Maybe those TSO knobs
> currently make an emulation faster than FEAT_LRCPC* but that's feedback
> to go to the microarchitects on the implementation (or architects on
> what other instructions should be covered).
>

These knobs can never "vanish", because we are supporting every Mx
platform back to the first one. The M1 series will never have FEAT_LRCPC.

I do not think it is unreasonable to support the impdef TSO method when
we know what the CPU platform is and FEAT_LRCPC does not exist.
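
To make that concrete, here is a rough sketch of the selection policy as
an emulator might apply it. This is only an illustration, not code from
the patch series: the enum, the function names and the
platform_has_impdef_tso flag are hypothetical. The only real-world
constant is the bit position of HWCAP_LRCPC from the Linux arm64 uapi
header <asm/hwcap.h>, copied here so the sketch builds on any host.

```c
#include <stdbool.h>

/* Same value as HWCAP_LRCPC in the Linux arm64 <asm/hwcap.h>; redefined
 * under a different name so this sketch compiles on non-arm64 hosts. */
#define SKETCH_HWCAP_LRCPC (1UL << 15)

/* Hypothetical names for the ways an emulator could get x86-like ordering. */
enum ordering_strategy {
    ORDER_LDAPR_STLR,  /* FEAT_LRCPC present: architected instructions */
    ORDER_IMPDEF_TSO,  /* known platform with an impdef TSO knob, no FEAT_LRCPC */
    ORDER_LDAR_STLR    /* generic fallback: stricter LDAR+STLR or explicit barriers */
};

/* Prefer the architected feature; use the impdef knob only when the
 * platform is known to have one and FEAT_LRCPC is absent. */
enum ordering_strategy pick_ordering_strategy(unsigned long hwcap,
                                              bool platform_has_impdef_tso)
{
    if (hwcap & SKETCH_HWCAP_LRCPC)
        return ORDER_LDAPR_STLR;
    if (platform_has_impdef_tso)
        return ORDER_IMPDEF_TSO;
    return ORDER_LDAR_STLR;
}

/* Human-readable label, e.g. for logging the chosen strategy. */
const char *ordering_strategy_name(enum ordering_strategy s)
{
    switch (s) {
    case ORDER_LDAPR_STLR: return "ldapr+stlr";
    case ORDER_IMPDEF_TSO: return "impdef-tso";
    case ORDER_LDAR_STLR:  return "ldar+stlr";
    }
    return "unknown";
}
```

On Linux/arm64 the hwcap argument would typically come from
getauxval(AT_HWCAP) in <sys/auxv.h>. How a process would learn that the
platform has a usable TSO knob is exactly what this series is about, so
it is left as a plain boolean input here.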



--
真実はいつも一つ!/ Always, there's only one truth!
