Message-ID: <CAGG=3QW4jo9RKtVxD2b3ZBjPhnAUYeUf_GPVh13e7gZkLFtuUQ@mail.gmail.com>
Date: Thu, 27 Feb 2025 12:56:05 -0800
From: Bill Wendling <morbo@...gle.com>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Eric Biggers <ebiggers@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, 
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>, Ard Biesheuvel <ardb@...nel.org>, Nathan Chancellor <nathan@...nel.org>, 
	Nick Desaulniers <nick.desaulniers+lkml@...il.com>, Justin Stitt <justinstitt@...gle.com>, 
	LKML <linux-kernel@...r.kernel.org>, linux-crypto@...r.kernel.org, 
	clang-built-linux <llvm@...ts.linux.dev>
Subject: Re: [PATCH] x86/crc32: use builtins to improve code generation

On Thu, Feb 27, 2025 at 4:17 AM Bill Wendling <morbo@...gle.com> wrote:
> On Thu, Feb 27, 2025 at 2:53 AM H. Peter Anvin <hpa@...or.com> wrote:
> > On February 26, 2025 10:28:59 PM PST, Eric Biggers <ebiggers@...nel.org> wrote:
> > >On Wed, Feb 26, 2025 at 10:12:47PM -0800, Bill Wendling wrote:
> > >> For both gcc and clang, crc32 builtins generate better code than the
> > >> inline asm. GCC improves, removing unneeded "mov" instructions. Clang
> > >> does the same and unrolls the loops. GCC has no changes on i386, but
> > >> Clang's code generation is vastly improved, due to Clang's "rm"
> > >> constraint issue.
> > >>
> > >> The number of cycles improved by ~0.1% for GCC and ~1% for Clang, which
> > >> is expected because of the "rm" issue. However, Clang's performance is
> > >> better than GCC's by ~1.5%, most likely due to loop unrolling.
> > >>
> > >> Link: https://github.com/llvm/llvm-project/issues/20571#issuecomment-2649330009
> > >> Cc: Thomas Gleixner <tglx@...utronix.de>
> > >> Cc: Ingo Molnar <mingo@...hat.com>
> > >> Cc: Borislav Petkov <bp@...en8.de>
> > >> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> > >> Cc: x86@...nel.org
> > >> Cc: "H. Peter Anvin" <hpa@...or.com>
> > >> Cc: Eric Biggers <ebiggers@...nel.org>
> > >> Cc: Ard Biesheuvel <ardb@...nel.org>
> > >> Cc: Nathan Chancellor <nathan@...nel.org>
> > >> Cc: Nick Desaulniers <nick.desaulniers+lkml@...il.com>
> > >> Cc: Justin Stitt <justinstitt@...gle.com>
> > >> Cc: linux-kernel@...r.kernel.org
> > >> Cc: linux-crypto@...r.kernel.org
> > >> Cc: llvm@...ts.linux.dev
> > >> Signed-off-by: Bill Wendling <morbo@...gle.com>
> > >> ---
> > >>  arch/x86/Makefile         | 3 +++
> > >>  arch/x86/lib/crc32-glue.c | 8 ++++----
> > >>  2 files changed, 7 insertions(+), 4 deletions(-)
> > >
> > >Thanks!  A couple concerns, though:
> > >
> > >> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> > >> index 5b773b34768d..241436da1473 100644
> > >> --- a/arch/x86/Makefile
> > >> +++ b/arch/x86/Makefile
> > >> @@ -114,6 +114,9 @@ else
> > >>  KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
> > >>  endif
> > >>
> > >> +# Enables the use of CRC32 builtins.
> > >> +KBUILD_CFLAGS += -mcrc32
> > >
> > >Doesn't this technically allow the compiler to insert CRC32 instructions
> > >anywhere in arch/x86/ without the needed runtime CPU feature check?  Normally
> > >when using intrinsics it's necessary to limit the scope of the feature
> > >enablement to match the runtime CPU feature check that is done, e.g. by using
> > >the target function attribute.
> > >
> > >> diff --git a/arch/x86/lib/crc32-glue.c b/arch/x86/lib/crc32-glue.c
> > >> index 2dd18a886ded..fdb94bff25f4 100644
> > >> --- a/arch/x86/lib/crc32-glue.c
> > >> +++ b/arch/x86/lib/crc32-glue.c
> > >> @@ -48,9 +48,9 @@ u32 crc32_le_arch(u32 crc, const u8 *p, size_t len)
> > >>  EXPORT_SYMBOL(crc32_le_arch);
> > >>
> > >>  #ifdef CONFIG_X86_64
> > >> -#define CRC32_INST "crc32q %1, %q0"
> > >> +#define CRC32_INST __builtin_ia32_crc32di
> > >>  #else
> > >> -#define CRC32_INST "crc32l %1, %0"
> > >> +#define CRC32_INST __builtin_ia32_crc32si
> > >>  #endif
> > >
> > >Do both gcc and clang consider these builtins to be a stable API, or do they
> > >only guarantee the stability of _mm_crc32_*() from immintrin.h?  At least for
> > >the rest of the SSE and AVX stuff, I thought that only the immintrin.h functions
> > >are actually considered stable.
> > >
> > >- Eric
> >
> > There is that... also are there compiler versions that we support that do not have -mcrc32 support?
> >
> Checking GCC 5.1.0 and Clang 13.0.1, it seems that both support '-mcrc32'.
>
I just checked, and GCC 5.1.0 no longer appears to be able to compile
the kernel, at least not with "defconfig". For one, it lacks retpoline
support, and it also fails to compile lib/zstd:

lib/zstd/decompress/zstd_decompress_block.c: In function ‘ZSTD_decompressSequences_default’:
lib/zstd/decompress/zstd_decompress_block.c:1539:1: error: inlining failed in call to always_inline ‘ZSTD_decompressSequences_body’: optimization level attribute mismatch
 ZSTD_decompressSequences_body(ZSTD_DCtx* dctx,
 ^
lib/zstd/decompress/zstd_decompress_block.c:1633:12: error: called from here
     return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
            ^

GCC 6.1.0 gets further, but it also lacks retpoline support. Maybe
the minimum supported version should be raised?

Anyway, GCC 5.1.0 doesn't support
__attribute__((__target__("crc32"))), so I'd have to use the flag. I
know I can add the flag for just this one file with:

CFLAGS_crc32-glue.o := -mcrc32

But as I said, the file is compiled twice (why?), and only one of those
compilations goes through arch/x86/lib/Makefile. If anyone has any
suggestions on how to solve this, please let me know.

-bw
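
For reference, the scoping Eric describes (enable the instruction set
only inside the function guarded by the runtime CPU check) might look
roughly like the following in userspace C. This is only a sketch and
not the patch under discussion. It assumes GCC >= 4.9 or Clang, and it
uses the documented _mm_crc32_u8() intrinsic from <nmmintrin.h> together
with target("sse4.2") rather than the __builtin_ia32_crc32* builtins or
-mcrc32. The names crc32c(), crc32c_update_sse42() and
crc32c_update_generic() are made up for illustration, and kernel code
would use its own CPU feature checks instead of
__builtin_cpu_supports().

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>	/* _mm_crc32_u8(): SSE4.2 intrinsic */
#define HAVE_X86_CRC32 1
#endif

/*
 * Portable bitwise CRC32C update (reflected polynomial 0x82F63B78).
 * Like the crc32 instruction, it applies no initial or final inversion.
 */
static uint32_t crc32c_update_generic(uint32_t crc, const unsigned char *p,
				      size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

#ifdef HAVE_X86_CRC32
/*
 * SSE4.2 is enabled for this function only, so the compiler cannot
 * emit crc32 instructions anywhere else in the translation unit.
 */
static __attribute__((target("sse4.2")))
uint32_t crc32c_update_sse42(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--)
		crc = _mm_crc32_u8(crc, *p++);
	return crc;
}
#endif

/* Hypothetical entry point: standard CRC32C with ~0 init and final inversion. */
uint32_t crc32c(const unsigned char *p, size_t len)
{
	uint32_t crc = ~0u;

#ifdef HAVE_X86_CRC32
	/* Runtime check that matches the scope of the target attribute. */
	if (__builtin_cpu_supports("sse4.2"))
		crc = crc32c_update_sse42(crc, p, len);
	else
#endif
		crc = crc32c_update_generic(crc, p, len);

	return ~crc;
}
```

Calling crc32c() on the nine ASCII bytes "123456789" should give the
standard CRC32C check value 0xE3069283 on either path. The byte-at-a-time
loop is deliberately simple; it shows the scoping, not a tuned
implementation.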
