lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250228020855.GA5588@sol.localdomain>
Date: Thu, 27 Feb 2025 18:08:55 -0800
From: Eric Biggers <ebiggers@...nel.org>
To: Bill Wendling <morbo@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
	"H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
	Nathan Chancellor <nathan@...nel.org>,
	Nick Desaulniers <nick.desaulniers+lkml@...il.com>,
	Justin Stitt <justinstitt@...gle.com>,
	LKML <linux-kernel@...r.kernel.org>, linux-crypto@...r.kernel.org,
	clang-built-linux <llvm@...ts.linux.dev>
Subject: Re: [PATCH] x86/crc32: use builtins to improve code generation

On Wed, Feb 26, 2025 at 11:08:22PM -0800, Bill Wendling wrote:
> > Doesn't this technically allow the compiler to insert CRC32 instructions
> > anywhere in arch/x86/ without the needed runtime CPU feature check?  Normally
> > when using intrinsics it's necessary to limit the scope of the feature
> > enablement to match the runtime CPU feature check that is done, e.g. by using
> > the target function attribute.
> >
> I'm not sure if CRC32 instructions will automatically be inserted when
> not explicitly called, especially since the other vector features are
> disabled. I wanted to limit enabling this flag for only crc32-glue.c,
> but my Makefile-fu failed me. The file appears to be compiled twice.
> But adding __attribute__((target("crc32"))) to the function would be
> much better.

Technically, limiting it to crc32-glue.c still isn't enough, as much of the code
in that file is executed before the crc32 instruction support is checked for.

I also noticed that -mcrc32 support wasn't added to clang until clang 14, by
https://github.com/llvm/llvm-project/commit/12fa608af44a80de8b655a8a984cd095908e7e80
But according to https://docs.kernel.org/process/changes.html the minimum clang
version to build Linux is 13.0.1.  So there's a missing check for support.

> > Do both gcc and clang consider these builtins to be a stable API, or do they
> > only guarantee the stability of _mm_crc32_*() from immintrin.h?  At least for
> > the rest of the SSE and AVX stuff, I thought that only the immintrin.h functions
> > are actually considered stable.
> >
> I don't know the answer for this. In general, once we (Clang) create a
> __builtin_* function it's not going away, because it will break anyone
> who uses them. (I assume the same is true for GCC.)

Here are examples of LLVM commits that removed x86 builtins:

* https://github.com/llvm/llvm-project/commit/09857a4bd166ca62a9610629731dfbf8f62cd955
* https://github.com/llvm/llvm-project/commit/9a14c369c422b244db78f1a9f947a891a75d912f
* https://github.com/llvm/llvm-project/commit/ec6024d0811b3116e0a29481b01179d5081a3b92
* https://github.com/llvm/llvm-project/commit/e4074432d5bf5c295f96eeed27c5b693f5b3bf16
* https://github.com/llvm/llvm-project/commit/9fddc3fd00b3ad5df5a3988e5cc4708254976173

So no, they do not appear to be considered stable.

(The equivalents in immintrin.h are stable, but good luck including immintrin.h
in the Linux kernel, since it depends on stdlib.h.)

Of course, if we really wanted this we could go with "it works in practice"
anyway.  But, given the small benefit of this patch vs. the potential risk I
don't think we should bother with it, unless it's acked by the gcc and clang
folks on the following points:

* The crc32 builtins are stable.

* gcc and clang will never generate crc32 instructions without explicitly using
  the builtins.  (BTW, keep in mind this ongoing work:
  https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=get&target=GCC+CRC+optimization.pdf)

Also note that crc32c_arch() already calls into the assembly code in
arch/x86/lib/crc32c-3way.S to handle lengths >= 512 bytes, and for handling the
tail data that assembly function already has a nice qword-at-a-time loop which
is exactly what we are trying to generate here.  A more promising approach might
be to reorganize things a bit so that we can reuse that assembly code.

- Eric

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ