lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241125041129.192999-1-ebiggers@kernel.org>
Date: Sun, 24 Nov 2024 20:11:23 -0800
From: Eric Biggers <ebiggers@...nel.org>
To: linux-kernel@...r.kernel.org
Cc: linux-crypto@...r.kernel.org,
	x86@...nel.org,
	Ard Biesheuvel <ardb@...nel.org>
Subject: [PATCH 0/6] x86: new optimized CRC functions, with VPCLMULQDQ support

This patchset is also available in git via:

    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git crc-x86-v1

This patchset applies on top of my other recent CRC patchsets
https://lore.kernel.org/r/20241103223154.136127-1-ebiggers@kernel.org/ and
https://lore.kernel.org/r/20241117002244.105200-1-ebiggers@kernel.org/ .
Consider it a preview for what may be coming next, as my priority is
getting those two other patchsets merged first.

This patchset adds a new assembly macro that expands into the body of a
CRC function for x86 for the specified number of bits, bit order, vector
length, and AVX level.  There's also a new script that generates the
constants needed by this function, given a CRC generator polynomial.

This approach allows easily wiring up an x86-optimized implementation of
any variant of CRC-8, CRC-16, CRC-32, or CRC-64, including full support
for VPCLMULQDQ.  On long messages the resulting functions are up to 4x
faster than the existing PCLMULQDQ optimized functions when they exist,
or up to 29x faster than the existing table-based functions.

This patchset starts by wiring up the new macro for crc32_le,
crc_t10dif, and crc32_be.  Later I'd also like to wire up crc64_be and
crc64_rocksoft, once the design of the library functions for those has
been fixed to be like what I'm doing for crc32* and crc_t10dif.

A similar approach of sharing code between CRC variants, and vector
lengths when applicable, should work for other architectures.  The CRC
constant generation script should be mostly reusable.

Eric Biggers (6):
  x86: move zmm exclusion list into CPU feature flag
  scripts/crc: add gen-crc-consts.py
  x86/crc: add "template" for [V]PCLMULQDQ based CRC functions
  x86/crc32: implement crc32_le using new template
  x86/crc-t10dif: implement crc_t10dif using new template
  x86/crc32: implement crc32_be using new template

 arch/x86/Kconfig                        |   2 +-
 arch/x86/crypto/aesni-intel_glue.c      |  22 +-
 arch/x86/include/asm/cpufeatures.h      |   1 +
 arch/x86/kernel/cpu/intel.c             |  22 +
 arch/x86/lib/Makefile                   |   2 +-
 arch/x86/lib/crc-pclmul-consts.h        | 148 ++++++
 arch/x86/lib/crc-pclmul-template-glue.h |  84 ++++
 arch/x86/lib/crc-pclmul-template.S      | 588 ++++++++++++++++++++++++
 arch/x86/lib/crc-t10dif-glue.c          |  22 +-
 arch/x86/lib/crc16-msb-pclmul.S         |   6 +
 arch/x86/lib/crc32-glue.c               |  38 +-
 arch/x86/lib/crc32-pclmul.S             | 220 +--------
 arch/x86/lib/crct10dif-pcl-asm_64.S     | 332 -------------
 scripts/crc/gen-crc-consts.py           | 207 +++++++++
 14 files changed, 1087 insertions(+), 607 deletions(-)
 create mode 100644 arch/x86/lib/crc-pclmul-consts.h
 create mode 100644 arch/x86/lib/crc-pclmul-template-glue.h
 create mode 100644 arch/x86/lib/crc-pclmul-template.S
 create mode 100644 arch/x86/lib/crc16-msb-pclmul.S
 delete mode 100644 arch/x86/lib/crct10dif-pcl-asm_64.S
 create mode 100755 scripts/crc/gen-crc-consts.py

-- 
2.47.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ