Message-ID: <20241016030349.GD1138@sol.localdomain>
Date: Tue, 15 Oct 2024 20:03:49 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: Ard Biesheuvel <ardb+git@...gle.com>
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
	linux-crypto@...r.kernel.org, herbert@...dor.apana.org.au,
	will@...nel.org, catalin.marinas@....com,
	Ard Biesheuvel <ardb@...nel.org>, Kees Cook <kees@...nel.org>
Subject: Re: [PATCH 2/2] arm64/crc32: Implement 4-way interleave using PMULL

On Tue, Oct 15, 2024 at 12:41:40PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@...nel.org>
> 
> Now that kernel mode NEON no longer disables preemption, using FP/SIMD
> in library code which is not obviously part of the crypto subsystem is
> no longer problematic, as it will no longer incur unexpected latencies.
> 
> So accelerate the CRC-32 library code on arm64 to use a 4-way
> interleave, using PMULL instructions to implement the folding.
> 
> On Apple M2, this results in a speedup of 2 - 2.8x when using input
> sizes of 1k - 8k. For smaller sizes, the overhead of preserving and
> restoring the FP/SIMD register file may not be worth it, so 1k is used
> as a threshold for choosing this code path.
> 
> The coefficient tables were generated using code provided by Eric. [0]
> 
> [0] https://github.com/ebiggers/libdeflate/blob/master/scripts/gen_crc32_multipliers.c
> 
> Cc: Eric Biggers <ebiggers@...nel.org>
> Signed-off-by: Ard Biesheuvel <ardb@...nel.org>
> ---
>  arch/arm64/lib/Makefile      |   2 +-
>  arch/arm64/lib/crc32-glue.c  |  36 +++
>  arch/arm64/lib/crc32-pmull.S | 240 ++++++++++++++++++++
>  3 files changed, 277 insertions(+), 1 deletion(-)

Thanks for doing this!  The new code looks good to me.  4-way does seem like the
right choice for arm64.

I'd recommend calling the file crc32-4way.S and the functions
crc32*_arm64_4way(), rather than crc32-pmull.S and crc32*_pmull().  This would
avoid confusion with a CRC implementation that is actually based entirely on
pmull (which is possible).  The proposed implementation uses the crc32
instructions to do most of the work and only uses pmull for combining the CRCs.
Yes, crc32c-pcl-intel-asm_64.S made this same mistake, but it is a mistake, IMO.
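
To illustrate the distinction, here is a minimal portable C sketch of the
"split into lanes, then combine" structure. It is not taken from the patch.
The helper names (crc32_le_bitwise, mul_mod_p, x_pow_mod_p, crc32_le_4way) are
hypothetical. The bitwise loop stands in for the crc32 instructions, and the
software multiply stands in for what pmull would do. A real implementation
would interleave the four lanes in a single loop and use precomputed
multipliers; this sketch runs the lanes one after another for clarity.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_LE 0xedb88320u  /* reflected CRC-32 polynomial */

/* Bit-at-a-time CRC-32 update, no pre/post inversion (stand-in for crc32 insns). */
static uint32_t crc32_le_bitwise(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc & 1) ? (crc >> 1) ^ CRC32_POLY_LE : crc >> 1;
	}
	return crc;
}

/* a * b mod P in the reflected representation (bit 31 is x^0); stand-in for pmull. */
static uint32_t mul_mod_p(uint32_t a, uint32_t b)
{
	uint32_t prod = 0;

	for (uint32_t m = 0x80000000u; m != 0; m >>= 1) {
		if (a & m)
			prod ^= b;
		/* b *= x (mod P) */
		b = (b & 1) ? (b >> 1) ^ CRC32_POLY_LE : b >> 1;
	}
	return prod;
}

/* x^bits mod P; real code would take this from a precomputed table. */
static uint32_t x_pow_mod_p(size_t bits)
{
	uint32_t r = 0x80000000u;  /* x^0 */

	while (bits--)
		r = (r & 1) ? (r >> 1) ^ CRC32_POLY_LE : r >> 1;
	return r;
}

/* Four independent lanes of n bytes each, combined by multiplication. */
static uint32_t crc32_le_4way(uint32_t crc, const uint8_t *p, size_t len)
{
	size_t n = len / 4;
	/* Only lane 0 carries the incoming CRC; the others start from 0. */
	uint32_t c0 = crc32_le_bitwise(crc, p, n);
	uint32_t c1 = crc32_le_bitwise(0, p + n, n);
	uint32_t c2 = crc32_le_bitwise(0, p + 2 * n, n);
	uint32_t c3 = crc32_le_bitwise(0, p + 3 * n, n);
	uint32_t k = x_pow_mod_p(8 * n);  /* shift past one lane */
	uint32_t r;

	r = mul_mod_p(c0, k) ^ c1;
	r = mul_mod_p(r, k) ^ c2;
	r = mul_mod_p(r, k) ^ c3;

	/* Leftover bytes that did not fit into the four equal lanes. */
	return crc32_le_bitwise(r, p + 4 * n, len - 4 * n);
}
```

In this structure the multiply runs only three times per call, while the
per-byte work all goes through the CRC update, which is why a name based on
the lane count describes it better than one based on pmull.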

- Eric
