linux-kernel - Re: [PATCH v2] x86/crc32: use builtins to improve code generation

Message-ID: <20250228212048.GA2812743@google.com>
Date: Fri, 28 Feb 2025 21:20:48 +0000
From: Eric Biggers <ebiggers@...nel.org>
To: Bill Wendling <morbo@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
	"H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
	Nathan Chancellor <nathan@...nel.org>,
	Nick Desaulniers <nick.desaulniers+lkml@...il.com>,
	Justin Stitt <justinstitt@...gle.com>,
	LKML <linux-kernel@...r.kernel.org>, linux-crypto@...r.kernel.org,
	clang-built-linux <llvm@...ts.linux.dev>
Subject: Re: [PATCH v2] x86/crc32: use builtins to improve code generation

On Thu, Feb 27, 2025 at 03:47:03PM -0800, Bill Wendling wrote:
> For both gcc and clang, crc32 builtins generate better code than the
> inline asm. GCC improves, removing unneeded "mov" instructions. Clang
> does the same and unrolls the loops. GCC has no changes on i386, but
> Clang's code generation is vastly improved, due to Clang's "rm"
> constraint issue.
> 
> The number of cycles improved by ~0.1% for GCC and ~1% for Clang, which
> is expected because of the "rm" issue. However, Clang's performance is
> better than GCC's by ~1.5%, most likely due to loop unrolling.

Also note that the patch
https://lore.kernel.org/r/20250210210741.471725-1-ebiggers@kernel.org/ (which is
already enqueued in the crc tree for 6.15) changes "rm" to "r" when the compiler
is clang, to improve clang's code generation.  The numbers you quote are against
the original version, right?
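
For reference, here is a minimal userspace sketch of the three variants being
compared: inline asm with a per-compiler source constraint, the compiler
builtin, and a portable bitwise loop used only as a cross-check.  It is not the
kernel code.  The names crc32c_sw, crc32c_asm, crc32c_builtin, crc32c_selfcheck
and the CRC32_SRC macro are made up for illustration, and it assumes x86-64 and
a GCC- or Clang-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC32C reference (reflected polynomial 0x82F63B78). */
static uint32_t crc32c_sw(uint32_t crc, const uint8_t *p, size_t len)
{
	for (; len; len--, p++) {
		crc ^= *p;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

#if defined(__x86_64__)
/* Hypothetical macro: register-only source on clang, "rm" elsewhere. */
#ifdef __clang__
#define CRC32_SRC "r"
#else
#define CRC32_SRC "rm"
#endif

/* Inline asm variant: one crc32b instruction per input byte. */
static uint32_t crc32c_asm(uint32_t crc, const uint8_t *p, size_t len)
{
	for (; len; len--, p++)
		__asm__("crc32b %1, %0" : "+r"(crc) : CRC32_SRC(*p));
	return crc;
}

/* Builtin variant: the target attribute enables it without -msse4.2. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_builtin(uint32_t crc, const uint8_t *p, size_t len)
{
	for (; len; len--, p++)
		crc = __builtin_ia32_crc32qi(crc, *p);
	return crc;
}
#endif

/* Checks that every variant available on this machine agrees. */
static void crc32c_selfcheck(void)
{
	static const uint8_t msg[] = "123456789";
	uint32_t want = crc32c_sw(~0u, msg, 9);

	/* Standard CRC-32C check value after the final inversion. */
	assert((want ^ 0xFFFFFFFFu) == 0xE3069283u);
#if defined(__x86_64__)
	if (__builtin_cpu_supports("sse4.2")) {
		assert(crc32c_asm(~0u, msg, 9) == want);
		assert(crc32c_builtin(~0u, msg, 9) == want);
	}
#endif
}
```

Comparing the disassembly of crc32c_asm and crc32c_builtin at -O2 under each
compiler is one way to see the extra "mov" instructions and the unrolling
discussed above; the sketch itself makes no performance claim.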

- Eric