Message-ID: <20250225225932.GA2975818@google.com>
Date: Tue, 25 Feb 2025 22:59:32 +0000
From: Eric Biggers <ebiggers@...nel.org>
To: David Laight <david.laight.linux@...il.com>
Cc: Xiao Liang <shaw.leon@...il.com>, x86@...nel.org,
	linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org,
	Ard Biesheuvel <ardb@...nel.org>,
	Ben Greear <greearb@...delatech.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	Andy Lutomirski <luto@...nel.org>
Subject: Re: [RFC PATCH 1/2] x86/fpu: make kernel-mode FPU reliably usable in
 softirqs

On Tue, Feb 25, 2025 at 10:21:33PM +0000, David Laight wrote:
> > As for supporting nested kernel-mode FPU if we wanted to go that way: yes, your
> > patch from last year
> > https://lore.kernel.org/lkml/20240403140138.393825-1-shaw.leon@gmail.com/
> > ostensibly did that.  However, I found some bugs in it; e.g., it didn't take
> > into account that struct fpu is variable-length.  So it didn't turn out as
> > simple as that patch made it seem.  Just extending fpregs_{lock,unlock}() to
> > kernel-mode FPU is a simpler solution with fewer edge cases, and it avoids
> > increasing the memory usage of the kernel.  So I thought I'd propose that first.
> 
> Since many kernel users don't want the traditional fpu, they just need to use
> an instruction that requires an AVX register or two, is it possible for code
> to specify a small save area for just two or four registers and then use just
> those registers? (so treating them all as caller-saved).
> I know that won't work with anything that affects the fpu status register,
> but if you want a single wide register for a PCIe read (to generate a big TLP)
> it is more than enough.
> 
> I'm sure there are horrid pitfalls, especially if IPIs are still used for
> deferred save of fpu state.

I'm afraid that's not an accurate summary of what uses the vector registers in
kernel mode.  The main use case is crypto, and most of the crypto code uses a
lot of vector registers.  Some of the older crypto code uses at most 8 vector
registers (xmm0-xmm7) for 32-bit compatibility, but newer code uses 16 or even
up to 32 YMM or ZMM registers.  The new AES-GCM code for example uses all 32
vector registers, and the new AES-XTS code uses 30.

In general, taking full advantage of the vector register set improves
performance, and the trend has very much been towards using more registers --
not fewer.  (And the registers have been getting larger too!)  AES by itself
tends to need about 8 registers to take advantage of the CPU's full AES
throughput, but there are other computations like GHASH or tweak computation
that need to be interleaved with AES, using more registers.  And various
constants and round keys can be cached in registers to improve performance.

If we had to save/restore a large number of vector registers in every crypto
function call (rather than amortizing that cost to one save/restore per return
to userspace), that would be a big performance problem.
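
For a rough sense of scale, the sketch below computes how many bytes of
register contents a per-call save/restore would have to move.  It is
illustrative only: the helper name vec_save_bytes() is made up for this
example, it counts raw register bytes rather than modeling the kernel's actual
save format, and it assumes the register counts mentioned above are taken at
full ZMM width.

```c
#include <assert.h>
#include <stddef.h>

/* Architectural widths of the x86 vector registers, in bytes. */
enum vec_width { XMM_BYTES = 16, YMM_BYTES = 32, ZMM_BYTES = 64 };

/*
 * Bytes of register contents that would have to be written out before a
 * function clobbers `nregs` vector registers, and read back afterwards.
 * Hypothetical helper: it counts raw register contents only and ignores
 * any header or alignment padding a real save format would add.
 */
static size_t vec_save_bytes(size_t nregs, enum vec_width width)
{
	/* x86-64 with AVX-512 exposes at most 32 vector registers. */
	assert(nregs <= 32);
	return nregs * (size_t)width;
}
```

Under those assumptions, vec_save_bytes(4, YMM_BYTES) is 128 bytes, while
vec_save_bytes(30, ZMM_BYTES) is 1920 bytes and vec_save_bytes(32, ZMM_BYTES)
is 2048 bytes, each written once and read back once per call.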

Most of the crypto code certainly could be written to use fewer registers.  But
it would reduce performance, especially if we tried to squeeze it down to use a
really small number of registers like 2-4.  Plus any such efforts would
complicate efforts to port crypto code between the kernel and userspace, as
userspace does not have such constraints on the number of registers.

- Eric
