Message-ID: <Zd5B6huxqEcYIW6b@arm.com>
Date: Tue, 27 Feb 2024 20:11:22 +0000
From: Catalin Marinas <catalin.marinas@....com>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: Ganapatrao Kulkarni <gankulkarni@...amperecomputing.com>,
	kvmarm@...ts.cs.columbia.edu, kvm@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
	linux-doc@...r.kernel.org, maz@...nel.org, will@...nel.org,
	suzuki.poulose@....com, james.morse@....com, corbet@....net,
	boris.ostrovsky@...cle.com, darren@...amperecomputing.com,
	d.scott.phillips@...erecomputing.com
Subject: Re: [PATCH] arm64: errata: Minimize tlb flush due to vttbr writes on
 AmpereOne

(catching up on emails)

On Wed, Feb 07, 2024 at 09:45:59AM +0000, Oliver Upton wrote:
> On Wed, Feb 07, 2024 at 01:04:58AM -0800, Ganapatrao Kulkarni wrote:
> > The AmpereOne implementation performs a TLB flush whenever there is
> > a write to vttbr_el2. In the KVM implementation, vttbr_el2 is updated
> > with the VM's S2-MMU on return to the VM. This is not necessary when
> > there is no VM context switch, just a return to the same guest.
> > 
> > Add a check to avoid the vttbr_el2 write if the same value
> > already exists, to prevent a needless TLB flush.
> 
> Sorry, zero interest in taking what is really a uarch optimization.
> The errata framework exists to allow the kernel to achieve *correctness*
> on a variety of hardware and is not a collection of party tricks for
> optimizing any given implementation.

Definitely, we should not abuse the errata framework for uarch
optimisations.
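
For reference, the check proposed in the patch amounts to something like
the sketch below. It is only a user-space simulation under stated
assumptions: fake_sysreg, fake_sysreg_write(), fake_sysreg_write_if_changed()
and simulate_entries() are hypothetical names made up for illustration, the
"flush" is just a counter, and none of this is the actual patch or KVM code.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a system register whose write has a costly
 * side effect (here, a counter standing in for a TLB flush).
 */
struct fake_sysreg {
	uint64_t value;
	unsigned long flushes;	/* number of simulated flushes so far */
};

/* Unconditional write: every call triggers the simulated flush. */
static void fake_sysreg_write(struct fake_sysreg *reg, uint64_t val)
{
	reg->value = val;
	reg->flushes++;
}

/* Guarded write: skip the write when the register already holds val. */
static void fake_sysreg_write_if_changed(struct fake_sysreg *reg, uint64_t val)
{
	if (reg->value != val)
		fake_sysreg_write(reg, val);
}

/*
 * Simulate n consecutive entries into the same guest (same base value)
 * and return how many simulated flushes happened. The register starts
 * at 0, so base is assumed to be non-zero.
 */
unsigned long simulate_entries(int n, uint64_t base, int skip_same)
{
	struct fake_sysreg reg = { .value = 0, .flushes = 0 };
	int i;

	for (i = 0; i < n; i++) {
		if (skip_same)
			fake_sysreg_write_if_changed(&reg, base);
		else
			fake_sysreg_write(&reg, base);
	}
	return reg.flushes;
}
```

With five entries into the same guest, the unconditional variant counts
five flushes and the guarded one counts a single flush. The gain only
exists on an implementation where the write itself is expensive, which is
what makes it a uarch optimisation rather than a correctness fix.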

> Think of the precedent this would establish. What would stop
> implementers from, say, changing out our memcpy implementation into
> a hundred different uarch-specific routines? That isn't maintainable,
> nor is it even testable as most folks don't have access to your
> hardware.

I agree. FTR, I'm fine with uarch optimisations if (a) they don't
run-time patch the kernel binary, (b) they don't affect existing
hardware and (c) they show significant gains on the targeted uarch in
some meaningful benchmarks (definitely not a microbenchmark hammering a
certain kernel path).

We did have uarch optimisations in the past that broke rule (a). We
tried to make them somewhat more justifiable by creating optimisation
classes (well, I think it was only ARM64_HAS_NO_HW_PREFETCH). But such
changes don't scale well for maintainers, so I'd rather not go back
there.

So, if one wants an optimisation, it had better benefit the other
implementations, or at least not make them worse. Now, we do have
hardware from mobiles to large enterprise systems, so at some point we
may have to make a call on different kernel behaviours, possibly even at
run-time. We already do this at build-time, e.g. CONFIG_NUMA where it
doesn't make much sense in a mobile (yet). But such choices should not
be seen as uarch-specific tweaks, more like higher-level classes of
optimisations.

-- 
Catalin
