Message-ID: <CAEg-Je8Y8+n5N-vLh0icELgWm+L1eRchqxX28Hg_mi9Bam+xRA@mail.gmail.com>
Date: Wed, 10 Apr 2024 21:37:47 -0400
From: Neal Gompa <neal@...pa.dev>
To: Hector Martin <marcan@...can.st>
Cc: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Marc Zyngier <maz@...nel.org>, Mark Rutland <mark.rutland@....com>,
Zayd Qumsieh <zayd_qumsieh@...le.com>, Justin Lu <ih_justin@...le.com>,
Ryan Houdek <Houdek.Ryan@...-emu.org>, Mark Brown <broonie@...nel.org>,
Ard Biesheuvel <ardb@...nel.org>, Mateusz Guzik <mjguzik@...il.com>,
Anshuman Khandual <anshuman.khandual@....com>, Oliver Upton <oliver.upton@...ux.dev>,
Miguel Luis <miguel.luis@...cle.com>, Joey Gouly <joey.gouly@....com>,
Christoph Paasch <cpaasch@...le.com>, Kees Cook <keescook@...omium.org>,
Sami Tolvanen <samitolvanen@...gle.com>, Baoquan He <bhe@...hat.com>,
Joel Granados <j.granados@...sung.com>, Dawei Li <dawei.li@...ngroup.cn>,
Andrew Morton <akpm@...ux-foundation.org>, Florent Revest <revest@...omium.org>,
David Hildenbrand <david@...hat.com>, Stefan Roesch <shr@...kernel.io>, Andy Chiu <andy.chiu@...ive.com>,
Josh Triplett <josh@...htriplett.org>, Oleg Nesterov <oleg@...hat.com>, Helge Deller <deller@....de>,
Zev Weiss <zev@...ilderbeest.net>, Ondrej Mosnacek <omosnace@...hat.com>,
Miguel Ojeda <ojeda@...nel.org>, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, Asahi Linux <asahi@...ts.linux.dev>
Subject: Re: [PATCH 0/4] arm64: Support the TSO memory model
On Wed, Apr 10, 2024 at 8:51 PM Hector Martin <marcan@...can.st> wrote:
>
> x86 CPUs implement a stricter memory model (TSO) than ARM64. For this
> reason, x86 emulation on baseline ARM64 systems requires very expensive
> memory model emulation. Having hardware that supports this natively is
> therefore very attractive. Such hardware, in fact, exists. This series
> adds support for userspace to identify when TSO is available and
> toggle it on, if supported.
>
> Some ARM64 CPUs intrinsically implement the TSO memory model, while
> others expose it as an IMPDEF control. Apple Silicon SoCs are in the
> latter category. Using TSO for x86 emulation on chips that support it
> has been shown to provide a massive performance boost [1].
>
> Patch 1 introduces the PR_{SET,GET}_MEM_MODEL userspace control, which
> is initially not implemented for any architectures.
>
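For anyone skimming the thread, here is a rough sketch of how userspace (an emulator, say) might drive the control from patch 1. The numeric values and the convention that the GET call returns the model as its return value are my assumptions, not something stated in the cover letter; check include/uapi/linux/prctl.h in the series before relying on them. try_enable_tso() is a hypothetical helper name.

```c
#include <sys/prctl.h>

/*
 * ASSUMPTION: these constants are not given in the cover letter.
 * They are placeholders for whatever patch 1 adds to
 * include/uapi/linux/prctl.h and must be checked against the series.
 */
#ifndef PR_SET_MEM_MODEL
#define PR_GET_MEM_MODEL          0x6d4d444c
#define PR_SET_MEM_MODEL          0x4d4d444c
#define PR_SET_MEM_MODEL_DEFAULT  0
#define PR_SET_MEM_MODEL_TSO      1
#endif

/*
 * Hypothetical helper: ask the kernel to switch this thread to TSO.
 * Returns 1 if TSO is active afterwards, 0 otherwise. On a kernel or
 * CPU without support the prctl fails and the caller should fall back
 * to emulating the memory model in software.
 */
static int try_enable_tso(void)
{
	if (prctl(PR_SET_MEM_MODEL, (unsigned long)PR_SET_MEM_MODEL_TSO,
		  0UL, 0UL, 0UL) != 0)
		return 0;

	/* ASSUMPTION: GET reports the current model as the return value. */
	return prctl(PR_GET_MEM_MODEL, 0UL, 0UL, 0UL, 0UL) ==
	       PR_SET_MEM_MODEL_TSO;
}
```

On an unpatched kernel the unknown option should simply be rejected, so a caller can probe for the feature at startup without any other version check.
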
> Patch 2 implements it for CPUs which are known, to the best of my
> knowledge, to implement the TSO memory model unconditionally.
> This uses the cpufeature mechanism to only enable this if *all* cores in
> the system meet the requirements.
>
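The "all cores" rule matters because a task can migrate between cores, so a single non-TSO core would break the guarantee. A toy model of that check is below; struct core_info and system_has_tso() are hypothetical names for illustration, and the kernel itself does this through its cpufeature machinery rather than a loop like this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-core record; not a kernel structure. */
struct core_info {
	bool always_tso;	/* core is known to implement TSO unconditionally */
};

/*
 * The system-wide capability is the logical AND of the per-core
 * property: one core without it disables the feature for everyone.
 */
static bool system_has_tso(const struct core_info *cores, size_t n)
{
	if (n == 0)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (!cores[i].always_tso)
			return false;
	}
	return true;
}
```
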
> Patch 3 adds the scaffolding necessary to save/restore the ACTLR_EL1
> register across context switches. This register contains IMPDEF flags
> related to CPU execution, and on Apple CPUs this is where the runtime
> TSO toggle bit is implemented. Other CPUs could conceivably benefit from
> this scaffolding if they also use ACTLR_EL1 for things that could
> ostensibly be runtime controlled and context-switched. For this to work,
> ACTLR_EL1 must have a uniform layout across all cores in the system.
>
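To make the save/restore idea concrete, here is a userspace toy model of carrying a register value in per-thread state across a context switch. Everything in it is a stand-in: fake_actlr_el1 replaces the real system register, and struct task and switch_actlr() are hypothetical names, not the code from patch 3.

```c
#include <stdint.h>

/* Stand-in for the hardware register; real code uses sysreg accessors. */
static uint64_t fake_actlr_el1;
/* Counts simulated register writes, only to show redundant ones are skipped. */
static unsigned int fake_actlr_writes;

/* Hypothetical per-thread state holding that thread's register value. */
struct task {
	uint64_t actlr;
};

/*
 * Save the outgoing thread's value, then install the incoming one.
 * The write is skipped when the value would not change.
 */
static void switch_actlr(struct task *prev, const struct task *next)
{
	prev->actlr = fake_actlr_el1;

	if (next->actlr != fake_actlr_el1) {
		fake_actlr_el1 = next->actlr;
		fake_actlr_writes++;
	}
}
```

This only works if the same bits mean the same thing on every core, which is the uniform-layout requirement mentioned above.
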
> Finally, patch 4 implements PR_{SET,GET}_MEM_MODEL for Apple CPUs by
> hooking it up to flip the appropriate ACTLR_EL1 bit when the Apple TSO
> feature is detected (on all CPUs, which also implies the uniform
> ACTLR_EL1 layout).
>
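The set/get pair then reduces to flipping and testing one bit in the saved register value. A minimal sketch follows, with the bit position as a placeholder: HYPOTHETICAL_TSO_BIT is my assumption and the real definition lives in patch 4. The enum and function names are likewise made up for illustration.

```c
#include <stdint.h>

/* ASSUMPTION: placeholder bit position; see patch 4 for the real one. */
#define HYPOTHETICAL_TSO_BIT (UINT64_C(1) << 1)

enum mem_model {
	MEM_MODEL_DEFAULT = 0,
	MEM_MODEL_TSO = 1,
};

/* Return the register value with the TSO bit set or cleared. */
static uint64_t actlr_set_model(uint64_t actlr, enum mem_model model)
{
	if (model == MEM_MODEL_TSO)
		return actlr | HYPOTHETICAL_TSO_BIT;
	return actlr & ~HYPOTHETICAL_TSO_BIT;
}

/* Decode the memory model from a register value. */
static enum mem_model actlr_get_model(uint64_t actlr)
{
	return (actlr & HYPOTHETICAL_TSO_BIT) ? MEM_MODEL_TSO
					      : MEM_MODEL_DEFAULT;
}
```

All other bits are left untouched, since the register carries unrelated IMPDEF flags.
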
> This series has been brewing in the downstream Asahi Linux tree [2] for a
> while now, and ships to thousands of users. A subset have been using it
> with FEX-Emu, which already supports this feature. This rebase on
> v6.9-rc1 is only build-tested (all intermediate commits with and without
> the config enabled, on ARM64) but I'll update the downstream branch soon
> with this version and get it pushed out to users/testers.
>
> The Apple support works on bare metal and *should* work exactly the same
> way on macOS VMs (as alluded to by Zayd in his independent submission [3]),
> though I haven't personally verified this. KVM support for this is left
> for a future patchset.
>
> (Apologies for the large Cc: list; I want to make sure nobody who got
> Cced on Zayd's alternate take is left out of this one.)
>
> [1] https://fex-emu.com/FEX-2306/
> [2] https://github.com/AsahiLinux/linux/tree/bits/220-tso
> [3] https://lore.kernel.org/lkml/20240410211652.16640-1-zayd_qumsieh@apple.com/
>
> Signed-off-by: Hector Martin <marcan@...can.st>
> ---
> Hector Martin (4):
>       prctl: Introduce PR_{SET,GET}_MEM_MODEL
>       arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs
>       arm64: Introduce scaffolding to add ACTLR_EL1 to thread state
>       arm64: Implement Apple IMPDEF TSO memory model control
>
>  arch/arm64/Kconfig                        | 14 ++++++
>  arch/arm64/include/asm/apple_cpufeature.h | 15 +++++++
>  arch/arm64/include/asm/cpufeature.h       | 10 +++++
>  arch/arm64/include/asm/processor.h        |  3 ++
>  arch/arm64/kernel/Makefile                |  3 +-
>  arch/arm64/kernel/cpufeature.c            | 11 ++---
>  arch/arm64/kernel/cpufeature_impdef.c     | 61 ++++++++++++++++++++++++++
>  arch/arm64/kernel/process.c               | 71 +++++++++++++++++++++++++++++++
>  arch/arm64/kernel/setup.c                 |  8 ++++
>  arch/arm64/tools/cpucaps                  |  2 +
>  include/linux/memory_ordering_model.h     | 11 +++++
>  include/uapi/linux/prctl.h                |  5 +++
>  kernel/sys.c                              | 21 +++++++++
>  13 files changed, 229 insertions(+), 6 deletions(-)
> ---
> base-commit: 4cece764965020c22cff7665b18a012006359095
> change-id: 20240411-tso-e86fdceb94b8
>
The series looks good to me.
Reviewed-by: Neal Gompa <neal@...pa.dev>
--
真実はいつも一つ!/ Always, there's only one truth!