[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250228212048.GA2812743@google.com>
Date: Fri, 28 Feb 2025 21:20:48 +0000
From: Eric Biggers <ebiggers@...nel.org>
To: Bill Wendling <morbo@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <nick.desaulniers+lkml@...il.com>,
Justin Stitt <justinstitt@...gle.com>,
LKML <linux-kernel@...r.kernel.org>, linux-crypto@...r.kernel.org,
clang-built-linux <llvm@...ts.linux.dev>
Subject: Re: [PATCH v2] x86/crc32: use builtins to improve code generation
On Thu, Feb 27, 2025 at 03:47:03PM -0800, Bill Wendling wrote:
> For both gcc and clang, crc32 builtins generate better code than the
> inline asm. GCC improves, removing unneeded "mov" instructions. Clang
> does the same and unrolls the loops. GCC has no changes on i386, but
> Clang's code generation is vastly improved, due to Clang's "rm"
> constraint issue.
>
> The number of cycles improved by ~0.1% for GCC and ~1% for Clang, which
> is expected because of the "rm" issue. However, Clang's performance is
> better than GCC's by ~1.5%, most likely due to loop unrolling.
Also note that the patch
https://lore.kernel.org/r/20250210210741.471725-1-ebiggers@kernel.org/ (which is
already enqueued in the crc tree for 6.15) changes "rm" to "r" when the compiler
is clang, to improve clang's code generation. The numbers you quote are against
the original version, right?
- Eric
Powered by blists - more mailing lists