Message-ID: <20241018075347.2821102-5-ardb+git@google.com>
Date: Fri, 18 Oct 2024 09:53:48 +0200
From: Ard Biesheuvel <ardb+git@...gle.com>
To: linux-arm-kernel@...ts.infradead.org
Cc: linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org, 
	herbert@...dor.apana.org.au, will@...nel.org, catalin.marinas@....com, 
	Ard Biesheuvel <ardb@...nel.org>, Eric Biggers <ebiggers@...nel.org>, Kees Cook <kees@...nel.org>
Subject: [PATCH v4 0/3] arm64: Speed up CRC-32 using PMULL instructions

From: Ard Biesheuvel <ardb@...nel.org>

The CRC-32 code is library code, and is not part of the crypto
subsystem. This means that callers may not generally be aware of the
kind of implementation that backs it, and so we've refrained from using
FP/SIMD code in the past, as it disables preemption, and this may incur
scheduling latencies that the caller did not anticipate.

This was solved a while ago, and on arm64, kernel mode FP/SIMD no longer
disables preemption.

This means we can happily use PMULL instructions in the CRC-32 library
code, which permits an optimization that results in a speedup of 2x to
2.8x for inputs larger than 1k in size (measured on Apple M2).
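
As a rough illustration of why a 4-way interleave is possible at all, the
sketch below relies on the fact that the raw CRC update is linear over
GF(2): four lanes can be checksummed independently and then merged by
shifting each partial result past the bytes that follow it. This is a
plain C model and not the code in this series. The function names are
hypothetical, and the shift is done by stepping over zero bytes, where
the real code would presumably use PMULL with precomputed constants.

```c
#include <stddef.h>
#include <stdint.h>

/* One bit-step of the reflected CRC-32 (polynomial 0xEDB88320). */
static uint32_t crc32_step(uint32_t crc)
{
    return (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
}

/*
 * Bitwise CRC-32 using the same convention as the kernel's crc32_le():
 * the seed and the result are not inverted implicitly.
 */
static uint32_t crc32_le_scalar(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = crc32_step(crc);
    }
    return crc;
}

/*
 * Advance a CRC state past 'len' zero bytes, i.e. multiply it by
 * x^(8 * len) modulo the CRC polynomial. Done the slow way here.
 */
static uint32_t crc32_shift(uint32_t crc, size_t len)
{
    while (len--)
        for (int i = 0; i < 8; i++)
            crc = crc32_step(crc);
    return crc;
}

/* Hypothetical name: split the input into four lanes and merge them. */
static uint32_t crc32_le_4way_sketch(uint32_t crc, const uint8_t *p,
                                     size_t len)
{
    size_t q = len / 4; /* bytes per lane */

    /* Only the first lane carries the incoming seed. */
    uint32_t c0 = crc32_le_scalar(crc, p, q);
    uint32_t c1 = crc32_le_scalar(0, p + q, q);
    uint32_t c2 = crc32_le_scalar(0, p + 2 * q, q);
    uint32_t c3 = crc32_le_scalar(0, p + 3 * q, q);

    /* Shift each lane past the lanes that follow it, then XOR. */
    crc = crc32_shift(c0, 3 * q) ^ crc32_shift(c1, 2 * q) ^
          crc32_shift(c2, q) ^ c3;

    /* Up to three trailing bytes are handled by the scalar code. */
    return crc32_le_scalar(crc, p + 4 * q, len - 4 * q);
}
```

In this model the four lane computations have no data dependency on one
another, which is what allows them to be interleaved in hardware.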

Patch #1 implements some prep work so that the scalar CRC-32
alternatives patching is handled in C code.
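
A minimal user-space sketch of what selecting the implementation in C
can look like is given below. It is not the glue code from this series.
The capability flags, the 1024-byte threshold, the 64-byte block size
and all function names are assumptions made for illustration, and the
"fast" routines simply call the generic one so that the control flow can
be checked on any machine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for runtime CPU capability checks. */
static bool cpu_has_crc32 = true;
static bool cpu_has_pmull = true;

#define FOURWAY_MIN_LEN 1024u /* assumed cut-over point */
#define FOURWAY_BLOCK   64u   /* assumed block size of the 4-way code */

/* Generic bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_generic(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
    }
    return crc;
}

/* Stand-in for a routine built on the scalar CRC32 instructions. */
static uint32_t crc32_scalar_insn(uint32_t crc, const uint8_t *p,
                                  size_t len)
{
    return crc32_generic(crc, p, len);
}

/* Stand-in for the 4-way routine: consumes whole blocks only. */
static uint32_t crc32_4way(uint32_t crc, const uint8_t *p, size_t len)
{
    return crc32_generic(crc, p, len - len % FOURWAY_BLOCK);
}

uint32_t crc32_le_dispatch(uint32_t crc, const uint8_t *p, size_t len)
{
    /* No CRC32 instructions: fall back to the generic code. */
    if (!cpu_has_crc32)
        return crc32_generic(crc, p, len);

    if (len >= FOURWAY_MIN_LEN && cpu_has_pmull) {
        crc = crc32_4way(crc, p, len);

        /* Skip the blocks the 4-way code consumed. */
        p += len - len % FOURWAY_BLOCK;
        len %= FOURWAY_BLOCK;

        /* Nothing left: no need to go through the scalar code. */
        if (!len)
            return crc;
    }

    return crc32_scalar_insn(crc, p, len);
}
```

Keeping the selection in C means the assembly routines only need to
implement the checksum itself, and the tail left over by the block-based
routine is handed to the scalar code in one place.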

Changes since v3:
- fix broken crc32be version
- add patch to tidy up existing code for reuse
- add 4-way code to existing .S file

Changes since v2:
- drop alternatives.h #include (#1)
- drop unneeded branch (#2)
- fix comment max -> min (#2)
- add Eric's Rb

Changes since v1:
- rename crc32-pmull.S to crc32-4way.S and avoid pmull in the function
  names to avoid confusion about the nature of the implementation;
- polish the asm a bit, and add some comments
- don't return via the scalar code if len dropped to 0 after calling the
  4-way code.

Cc: Eric Biggers <ebiggers@...nel.org>
Cc: Kees Cook <kees@...nel.org>

Ard Biesheuvel (3):
  arm64/lib: Handle CRC-32 alternative in C code
  arm64/crc32: Reorganize bit/byte ordering macros
  arm64/crc32: Implement 4-way interleave using PMULL

 arch/arm64/lib/Makefile     |   2 +-
 arch/arm64/lib/crc32-glue.c |  82 +++++
 arch/arm64/lib/crc32.S      | 344 ++++++++++++++++----
 3 files changed, 356 insertions(+), 72 deletions(-)
 create mode 100644 arch/arm64/lib/crc32-glue.c

-- 
2.47.0.rc1.288.g06298d1525-goog

