lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 2 Apr 2024 17:44:04 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: Ard Biesheuvel <ardb@...nel.org>
Cc: linux-crypto@...r.kernel.org, x86@...nel.org,
	linux-kernel@...r.kernel.org, Andy Lutomirski <luto@...nel.org>,
	"Chang S . Bae" <chang.seok.bae@...el.com>
Subject: Re: [PATCH v2 0/6] Faster AES-XTS on modern x86_64 CPUs

On Fri, Mar 29, 2024 at 02:31:30AM -0700, Eric Biggers wrote:
> > I wouldn't mind retiring the existing xts(aesni)
> > code entirely, and using the xts() wrapper around ecb-aes-aesni on
> > 32-bit and on non-AVX uarchs with AES-NI.
> 
> Yes, it will need to be benchmarked, but that probably makes sense.  If
> Wikipedia is to be trusted, on the Intel side only Westmere (from 2010) has
> AES-NI but not AVX, and on the AMD side all CPUs with AES-NI have AVX...

It looks like I missed some low-power CPUs.  Intel's Silvermont (2013), Goldmont
(2016), Goldmont Plus (2017), and Tremont (2020) support AES-NI but not AVX.
Their successor, Gracemont (2021), supports AVX.

I don't have any one of those immediately available to run a test on.  But just
doing a quick benchmark on Zen 1, xts-aes-aesni has 62% higher throughput than
xts(ecb-aes-aesni).  The significant difference seems expected, since there's a
lot of API overhead in the xts template, and it computes all the tweaks twice in
C code.

So I'm thinking we'll need to keep xts-aes-aesni around for now, alongside
xts-aes-aesni-avx.

(And with all the SIMD instructions taking a different number of arguments and
having different names for AVX vs non-AVX, I don't see a clean way to unify them
in assembly.  They could be unified if we used C intrinsics instead of assembly
and compiled a C function with and without the "avx" target.  However,
intrinsics bring their own issues and make it hard to control the generated
code.  I don't really want to rely on intrinsics for this code.)

- Eric

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ