Message-ID: <CAMj1kXFROGbn_49njp_rivEidqfgnLymOCRnfSkV_dTX_hAz9w@mail.gmail.com>
Date: Thu, 17 Oct 2024 18:30:19 +0200
From: Ard Biesheuvel <ardb@...nel.org>
To: Ard Biesheuvel <ardb+git@...gle.com>
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-crypto@...r.kernel.org, herbert@...dor.apana.org.au, will@...nel.org,
catalin.marinas@....com, Eric Biggers <ebiggers@...nel.org>, Kees Cook <kees@...nel.org>
Subject: Re: [PATCH v3 0/2] arm64: Speed up CRC-32 using PMULL instructions

On Thu, 17 Oct 2024 at 11:41, Ard Biesheuvel <ardb+git@...gle.com> wrote:
>
> From: Ard Biesheuvel <ardb@...nel.org>
>
> The CRC-32 code is library code, and is not part of the crypto
> subsystem. This means that callers may not generally be aware of the
> kind of implementation that backs it, and so we've refrained from using
> FP/SIMD code in the past, as it disables preemption, and this may incur
> scheduling latencies that the caller did not anticipate.
>
> This was solved a while ago, and on arm64, kernel mode FP/SIMD no longer
> disables preemption.
>
> This means we can happily use PMULL instructions in the CRC-32 library
> code, which permits an optimization to be implemented that results in a
> speedup of 2x to 2.8x for inputs >1k in size (on Apple M2).
>
> Patch #1 implements some prep work to handle the scalar CRC-32
> alternatives patching in C code.
>
> Changes since v2:
> - drop alternatives.h #include (#1)
> - drop unneeded branch (#2)
> - fix comment max -> min (#2)
> - add Eric's Rb
>
> Changes since v1:
> - rename crc32-pmull.S to crc32-4way.S and avoid pmull in the function
>   names to avoid confusion about the nature of the implementation;
> - polish the asm a bit, and add some comments
> - don't return via the scalar code if len dropped to 0 after calling the
>   4-way code.
>
> Cc: Eric Biggers <ebiggers@...nel.org>
> Cc: Kees Cook <kees@...nel.org>
>
> Ard Biesheuvel (2):
>   arm64/lib: Handle CRC-32 alternative in C code
>   arm64/crc32: Implement 4-way interleave using PMULL
>
I'll need to respin this - the crc32_be code doesn't actually work correctly.
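
For readers following along, a bit-at-a-time reference is one way to cross-check an optimized CRC-32 routine in both bit orders. The sketch below is not taken from the patches: the names crc32_le_ref and crc32_be_ref are hypothetical, and it assumes the usual kernel convention that the caller supplies the seed and the helper applies no pre- or post-inversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helpers for illustration; not code from the series. */
#define REF_POLY_LE 0xedb88320u	/* CRC-32 polynomial, bit-reflected form */
#define REF_POLY_BE 0x04c11db7u	/* CRC-32 polynomial, normal form */

/* LSB-first (little-endian bit order) CRC-32, one bit per iteration. */
static uint32_t crc32_le_ref(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? REF_POLY_LE : 0);
	}
	return crc;
}

/* MSB-first (big-endian bit order) CRC-32, one bit per iteration. */
static uint32_t crc32_be_ref(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= (uint32_t)*p++ << 24;
		for (int i = 0; i < 8; i++)
			crc = (crc << 1) ^ ((crc & 0x80000000u) ? REF_POLY_BE : 0);
	}
	return crc;
}
```

Feeding the same buffers and seeds to a reference like this and to the routine under test, across a range of lengths and alignments, is a simple way to spot a mismatch in either variant.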