[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251102234209.62133-7-ebiggers@kernel.org>
Date: Sun, 2 Nov 2025 15:42:09 -0800
From: Eric Biggers <ebiggers@...nel.org>
To: linux-crypto@...r.kernel.org
Cc: linux-kernel@...r.kernel.org,
Ard Biesheuvel <ardb@...nel.org>,
"Jason A . Donenfeld" <Jason@...c4.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
x86@...nel.org,
Samuel Neves <sneves@....uc.pt>,
Eric Biggers <ebiggers@...nel.org>
Subject: [PATCH 6/6] lib/crypto: x86/blake2s: Use vpternlogd for 3-input XORs
AVX-512 supports 3-input XORs via the vpternlogd (or vpternlogq)
instruction with immediate 0x96. This approach, vs. the alternative of
two vpxor instructions, is already used in the CRC, AES-GCM, and AES-XTS
code, since it reduces the instruction count and is faster on some CPUs.
Make blake2s_compress_avx512() take advantage of it too.
Signed-off-by: Eric Biggers <ebiggers@...nel.org>
---
lib/crypto/x86/blake2s-core.S | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/lib/crypto/x86/blake2s-core.S b/lib/crypto/x86/blake2s-core.S
index 869064f6ac16..7b1d98ca7482 100644
--- a/lib/crypto/x86/blake2s-core.S
+++ b/lib/crypto/x86/blake2s-core.S
@@ -276,14 +276,12 @@ SYM_FUNC_START(blake2s_compress_avx512)
vpshufd $0x93,%xmm2,%xmm2
decb %cl
jne .Lavx512_roundloop
// Compute the new h: h[0..7] ^= v[0..7] ^ v[8..15]
- vpxor %xmm10,%xmm0,%xmm0
- vpxor %xmm11,%xmm1,%xmm1
- vpxor %xmm2,%xmm0,%xmm0
- vpxor %xmm3,%xmm1,%xmm1
+ vpternlogd $0x96,%xmm10,%xmm2,%xmm0
+ vpternlogd $0x96,%xmm11,%xmm3,%xmm1
decq NBLOCKS
jne .Lavx512_mainloop
vmovdqu %xmm0,(CTX) // Store new h[0..3]
vmovdqu %xmm1,16(CTX) // Store new h[4..7]
--
2.51.2
Powered by blists - more mailing lists