Message-ID: <20251102234209.62133-7-ebiggers@kernel.org>
Date: Sun,  2 Nov 2025 15:42:09 -0800
From: Eric Biggers <ebiggers@...nel.org>
To: linux-crypto@...r.kernel.org
Cc: linux-kernel@...r.kernel.org,
	Ard Biesheuvel <ardb@...nel.org>,
	"Jason A . Donenfeld" <Jason@...c4.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	x86@...nel.org,
	Samuel Neves <sneves@....uc.pt>,
	Eric Biggers <ebiggers@...nel.org>
Subject: [PATCH 6/6] lib/crypto: x86/blake2s: Use vpternlogd for 3-input XORs

AVX-512 supports 3-input XORs via the vpternlogd (or vpternlogq)
instruction with immediate 0x96.  This approach, vs. the alternative of
two vpxor instructions, is already used in the CRC, AES-GCM, and AES-XTS
code, since it reduces the instruction count and is faster on some CPUs.
Make blake2s_compress_avx512() take advantage of it too.
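
As a rough illustration of why immediate 0x96 gives a 3-input XOR, here
is a small C model of one 32-bit lane of vpternlogd.  It is only a
sketch: ternlog32() and xor3() are hypothetical helper names that are
not part of this patch or of the kernel.  For each bit position the
instruction uses the three input bits as a 3-bit index into the 8-bit
immediate.  0x96 is 10010110 in binary, so the result bit is set exactly
for indices 1, 2, 4 and 7, i.e. when an odd number of inputs are set.

```c
#include <stdint.h>

/*
 * Illustrative model of one 32-bit lane of vpternlogd (hypothetical
 * helper, not kernel code).  'a' plays the role of the destination
 * register, 'b' and 'c' the two source operands.
 */
static inline uint32_t ternlog32(uint8_t imm, uint32_t a, uint32_t b,
				 uint32_t c)
{
	uint32_t out = 0;

	for (int bit = 0; bit < 32; bit++) {
		/* Form the truth-table index from the three input bits. */
		unsigned int idx = (((a >> bit) & 1) << 2) |
				   (((b >> bit) & 1) << 1) |
				   ((c >> bit) & 1);

		/* Look up the result bit in the immediate. */
		out |= (uint32_t)((imm >> idx) & 1) << bit;
	}
	return out;
}

/* Immediate 0x96 selects the odd-parity rows: a ^ b ^ c. */
static inline uint32_t xor3(uint32_t a, uint32_t b, uint32_t c)
{
	return ternlog32(0x96, a, b, c);
}
```

Because XOR is commutative, the order of the two source operands does
not matter here, so a single vpternlogd can stand in for a pair of vpxor
instructions that accumulate into the same register.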

Signed-off-by: Eric Biggers <ebiggers@...nel.org>
---
 lib/crypto/x86/blake2s-core.S | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/crypto/x86/blake2s-core.S b/lib/crypto/x86/blake2s-core.S
index 869064f6ac16..7b1d98ca7482 100644
--- a/lib/crypto/x86/blake2s-core.S
+++ b/lib/crypto/x86/blake2s-core.S
@@ -276,14 +276,12 @@ SYM_FUNC_START(blake2s_compress_avx512)
 	vpshufd		$0x93,%xmm2,%xmm2
 	decb		%cl
 	jne		.Lavx512_roundloop
 
 	// Compute the new h: h[0..7] ^= v[0..7] ^ v[8..15]
-	vpxor		%xmm10,%xmm0,%xmm0
-	vpxor		%xmm11,%xmm1,%xmm1
-	vpxor		%xmm2,%xmm0,%xmm0
-	vpxor		%xmm3,%xmm1,%xmm1
+	vpternlogd	$0x96,%xmm10,%xmm2,%xmm0
+	vpternlogd	$0x96,%xmm11,%xmm3,%xmm1
 	decq		NBLOCKS
 	jne		.Lavx512_mainloop
 
 	vmovdqu		%xmm0,(CTX)		// Store new h[0..3]
 	vmovdqu		%xmm1,16(CTX)		// Store new h[4..7]
-- 
2.51.2

