[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251020175736.GC1644@sol>
Date: Mon, 20 Oct 2025 10:57:36 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: Holger Dengler <dengler@...ux.ibm.com>
Cc: David Howells <dhowells@...hat.com>, Ard Biesheuvel <ardb@...nel.org>,
"Jason A . Donenfeld" <Jason@...c4.com>,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-s390@...r.kernel.org, linux-crypto@...r.kernel.org,
Harald Freudenberger <freude@...ux.ibm.com>
Subject: Re: [PATCH 15/17] lib/crypto: s390/sha3: Migrate optimized code into
library
On Mon, Oct 20, 2025 at 04:00:42PM +0200, Holger Dengler wrote:
> On 20/10/2025 02:50, Eric Biggers wrote:
> > Instead of exposing the s390-optimized SHA-3 code via s390-specific
> > crypto_shash algorithms, instead just implement the sha3_absorb_blocks()
> > and sha3_keccakf() library functions. This is much simpler, it makes
> > the SHA-3 library functions be s390-optimized, and it fixes the
> > longstanding issue where the s390-optimized SHA-3 code was disabled by
> > default. SHA-3 still remains available through crypto_shash, but
> > individual architectures no longer need to handle it.
> >
> > Note that the existing code used both CPACF_KIMD_SHA3_224 and
> > CPACF_KIMD_SHA3_256 after checking for just CPACF_KIMD_SHA3_256, and
> > similarly for 384 and 512. I've preserved that behavior.
> >
> > Signed-off-by: Eric Biggers <ebiggers@...nel.org>
> The current code also cover a performance feature, which allows (on
> supported hardware, e.g. z17) to skip the ICV initialization.
I'm not sure if by "ICV" you mean "Integrity Check Value" or "Initial
Chaining Value", but SHA-3 doesn't have either of those. It just starts
with a state of all zeroes. I assume that skipping the
zero-initialization of the state is what you're referring to?
> support has been introduced with 88c02b3f79a6 ("s390/sha3: Support
> sha3 performance enhancements"). Unfortunately, this patch removes
> this support. Was this intended?
For now, yes. I should have explained more in the patch, sorry.
As currently proposed, lib/crypto/sha3.c supports arch-specific
overrides of sha3_absorb_blocks() and sha3_keccakf(). Those cover the
Keccak-f permutation which is by far the most performance critical part.
This strategy is working well in the SHA-2, SHA-1, and MD5 libraries,
which support the same level of arch overrides.
We could update lib/crypto/sha3.c to allow architectures to override
more of the code. But we need to consider the tradeoffs:
- Risk of bugs. QEMU doesn't support the s390 SHA-3 instructions, so no
one except the s390 folks can test the code. I can try to write code
for you, but I can't test it. And the s390 SHA-3 code has had bugs;
see commits 992b7066800f, 68279380266a5, 73c2437109c3.
The first priority should be correctness.
- The proposed change to the init functions would cause the format of
'struct __sha3_ctx' to be architecture-dependent. While we can do
that if really needed, it's something that's best avoided for
simplicity. It opens up more opportunity for error.
- As I mentioned, Keccak-f is by far the most performance critical part
anyway. The initial state is just all zeroes, and initializing it is
very lightweight. Also consider that these contexts are often on the
stack, and people increasingly set the "init all stack variables to
zero" kernel hardening option anyway.
I'll also note that commit 88c02b3f79a6 has no performance data in it.
So it's not clear that it actually helped much.
- The library has an optimization to greatly reduce the size of the
context: instead of buffering data separately, it just XOR's data into
the state. So, if there's a sha3_*_init() followed by a sha3_update()
of less than 1 block, it will have to initialize the state anyway. We
can delay it until that point on s390. But again: complexity.
- These potential additional s390 optimizations would presumably help
the most on short messages. However, on short messages, merely
switching to the library often gives a large performance improvement
due to eliminating the very slow call to crypto_alloc_shash(). That's
actually a lot more important.
I would suggest that we drop the sha3_*_init() optimization from
consideration for now. Providing overrides for the one-shot functions
sha3_{224,256,384,512}() should be simpler as well as possibly a bit
more useful, and I would suggest exploring that.
I guess I can try to write the code for you again. But again, without
QEMU support I cannot test it. The first priority in cryptography code
is correctness, so that's not a great position to be in.
Note that for new optimized code I'm requiring QEMU support for the
instructions it uses. This one would only be allowed because code that
used these instructions already existed in arch/s390/crypto/.
> Please also add me and Harald Freudenberger to the cc: list for this patch.
Will do, thanks.
- Eric
Powered by blists - more mailing lists