Message-ID: <6629b8120807458ab76e1968056f5e10@AcuMS.aculab.com>
Date: Wed, 3 Apr 2024 08:12:09 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Eric Biggers' <ebiggers@...nel.org>, Ard Biesheuvel <ardb@...nel.org>
CC: "linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, Andy Lutomirski <luto@...nel.org>, "Chang S .
Bae" <chang.seok.bae@...el.com>
Subject: RE: [PATCH 0/6] Faster AES-XTS on modern x86_64 CPUs
From: Eric Biggers
> Sent: 26 March 2024 16:48
...
> Consider Intel Ice Lake, for example. These are the AES-256-XTS encryption
> speeds on 4096-byte messages, in MB/s, that I'm seeing:
>
> xts-aes-aesni               5136
> xts-aes-aesni-avx           5366
> xts-aes-vaes-avx2           9337
> xts-aes-vaes-avx10_256      9876
> xts-aes-vaes-avx10_512     10215
>
> So yes, on that CPU the biggest boost comes just from VAES, staying on AVX2.
> But taking advantage of AVX512 does help a bit more, first from the parts other
> than 512-bit registers, then a bit more from 512-bit registers.
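To put the quoted figures in relative terms, here is a minimal Python sketch that computes each implementation's throughput as a ratio to xts-aes-aesni. The dictionary only restates the numbers quoted above; the function name relative_speedups is illustrative and not part of the patch series.

```python
# Throughput figures in MB/s, restated from the quoted Ice Lake benchmark
# (AES-256-XTS encryption, 4096-byte messages).
speeds_mb_s = {
    "xts-aes-aesni": 5136,
    "xts-aes-aesni-avx": 5366,
    "xts-aes-vaes-avx2": 9337,
    "xts-aes-vaes-avx10_256": 9876,
    "xts-aes-vaes-avx10_512": 10215,
}


def relative_speedups(speeds, baseline="xts-aes-aesni"):
    """Return each implementation's throughput divided by the baseline's."""
    base = speeds[baseline]
    return {name: mb_s / base for name, mb_s in speeds.items()}


if __name__ == "__main__":
    for name, ratio in relative_speedups(speeds_mb_s).items():
        print(f"{name:<24} {ratio:.2f}x")
```

On these numbers, VAES on AVX2 is roughly 1.82x the AES-NI baseline and the 512-bit variant is just under 2x, which matches the observation that most of the gain comes from VAES itself.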
How much does kernel_fpu_begin() cost on real workloads?
(i.e. when the registers are live and it forces an extra save/restore)
I've not looked at the code but I often see what looks like
excessive inlining in crypto code.
This will speed up benchmarks but can have a negative effect
on real code both because of the time taken to load the
code and the effect of displacing other code.
It might be that this code is a simple loop....
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)