Message-ID: <20251104182738.GA2419@sol>
Date: Tue, 4 Nov 2025 10:27:38 -0800
From: Eric Biggers <ebiggers@...nel.org>
To: Harald Freudenberger <freude@...ux.ibm.com>
Cc: linux-crypto@...r.kernel.org, David Howells <dhowells@...hat.com>,
Ard Biesheuvel <ardb@...nel.org>,
"Jason A . Donenfeld" <Jason@...c4.com>,
Holger Dengler <dengler@...ux.ibm.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
linux-arm-kernel@...ts.infradead.org, linux-s390@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 00/15] SHA-3 library
On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
> > Thanks! Is this with the whole series applied? Those numbers are
> > pretty fast, so probably at least the Keccak acceleration part is
> > worthwhile. But just to reiterate what I asked for:
> >
> > Also, it would be helpful to provide the benchmark output from just
> > before "lib/crypto: s390/sha3: Add optimized Keccak function", just
> > after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
> > SHA-3 digest functions".
> >
> > So I'd like to see how much each change helped, which isn't clear if you
> > show only the result at the end.
> >
> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
> > doing the Keccak acceleration, then we should drop it for simplicity.
[...]
> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3
> digest functions:
>
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # module: sha3_kunit
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: 1..21
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 1 test_hash_test_vectors
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 3
> test_hash_incremental_updates
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 4
> test_hash_buffer_overruns
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 5 test_hash_overlaps
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 6
> test_hash_alignment_consistency
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 7
> test_hash_ctx_zeroization
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 8
> test_hash_interrupt_context_1
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 9
> test_hash_interrupt_context_2
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 10 test_sha3_224_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 11 test_sha3_256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 12 test_sha3_384_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 13 test_sha3_512_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 14 test_shake128_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 15 test_shake256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 16 test_shake128_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 17 test_shake256_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 19
> test_shake_multiple_squeezes
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=16: 80
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=64: 785
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=127:
> 812 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=128:
> 1619 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=200:
> 2319 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=256:
> 2176 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=511:
> 4881 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=512:
> 4968 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=1024:
> 7565 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=3173:
> 11909 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=4096:
> 10378 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=16384:
> 12273 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 21 benchmark_hash
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21
>
> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak functions:
>
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: # module: sha3_kunit
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: 1..21
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 1 test_hash_test_vectors
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 3
> test_hash_incremental_updates
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 4
> test_hash_buffer_overruns
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 5 test_hash_overlaps
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 6
> test_hash_alignment_consistency
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 7
> test_hash_ctx_zeroization
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 8
> test_hash_interrupt_context_1
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 9
> test_hash_interrupt_context_2
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 10 test_sha3_224_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 11 test_sha3_256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 12 test_sha3_384_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 13 test_sha3_512_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 14 test_shake128_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 15 test_shake256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 16 test_shake128_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 17 test_shake256_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 19
> test_shake_multiple_squeezes
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=16: 211
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=64: 835
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=127:
> 1557 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=128:
> 1617 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=200:
> 1457 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=256:
> 1830 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=511:
> 3035 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=512:
> 3245 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=1024:
> 5319 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=3173:
> 9969 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=4096:
> 11123 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=16384:
> 12767 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 21 benchmark_hash
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21
Thanks. So the results before and after "lib/crypto: s390/sha3: Add
optimized one-shot SHA-3 digest functions" are:
Length (bytes)    Before        After
==============    ==========    ==========
             1       12 MB/s       12 MB/s
            16      211 MB/s       80 MB/s
            64      835 MB/s      785 MB/s
           127     1557 MB/s      812 MB/s
           128     1617 MB/s     1619 MB/s
           200     1457 MB/s     2319 MB/s
           256     1830 MB/s     2176 MB/s
           511     3035 MB/s     4881 MB/s
           512     3245 MB/s     4968 MB/s
          1024     5319 MB/s     7565 MB/s
          3173     9969 MB/s    11909 MB/s
          4096    11123 MB/s    10378 MB/s
         16384    12767 MB/s    12273 MB/s
Unfortunately, that seems inconclusive. len=200, 256, 511, 512, 1024,
and 3173 improved, but len=16, 64, 127, 4096, and 16384 regressed.
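To make the size of each change easier to see, here is a minimal Python sketch that computes the relative change for every length in the table above. The numbers are copied from the table; the helper names and the 1% "unchanged" tolerance are assumptions made for illustration, not something from the patch series.

```python
# Throughput in MB/s copied from the table above: length -> (before, after).
results = {
    1: (12, 12),
    16: (211, 80),
    64: (835, 785),
    127: (1557, 812),
    128: (1617, 1619),
    200: (1457, 2319),
    256: (1830, 2176),
    511: (3035, 4881),
    512: (3245, 4968),
    1024: (5319, 7565),
    3173: (9969, 11909),
    4096: (11123, 10378),
    16384: (12767, 12273),
}


def percent_change(before, after):
    """Relative change of `after` versus `before`, in percent."""
    return (after - before) / before * 100.0


def classify(before, after, tolerance_pct=1.0):
    """Label a result; the 1% tolerance is an arbitrary illustrative choice."""
    change = percent_change(before, after)
    if change > tolerance_pct:
        return "improved"
    if change < -tolerance_pct:
        return "regressed"
    return "unchanged"


for length, (before, after) in results.items():
    change = percent_change(before, after)
    print(f"{length:>6} {before:>6} {after:>6} {change:+7.1f}%  "
          f"{classify(before, after)}")
```

With this tolerance, len=1 and len=128 count as unchanged. Since these are single-run numbers, the percentages say nothing about run-to-run noise.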
I expected the most improvement on short lengths. The fact that some of
the short lengths actually regressed is concerning.
It's also clear that the Keccak acceleration itself matters far more than
this additional one-shot optimization, as expected. The generic code
maxed out at only 259 MB/s for you.
I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
SHA-3 digest functions" for now, to avoid the extra maintenance cost
and opportunity for bugs.
If you can provide more accurate numbers that show it's worthwhile, we
can reconsider. Maybe set the CPU to a fixed frequency, and run
sha3_kunit multiple times (triggered via KUnit's debugfs interface)?
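One possible way to combine several runs is sketched below in Python: it collects every benchmark_hash line from captured kernel log text and reports the median, minimum, maximum, and sample count per length. It assumes the log lines are not wrapped (unlike the quoted output above), and the function name and output layout are hypothetical. How the log text is captured is left open.

```python
import re
import statistics
from collections import defaultdict

# Matches lines such as "# benchmark_hash: len=16: 211 MB/s".
# Assumes each benchmark line is on a single, unwrapped line.
BENCH_RE = re.compile(r"benchmark_hash: len=(\d+):\s+(\d+) MB/s")


def summarize(log_text):
    """Return {length: (median, min, max, sample_count)} from log text."""
    samples = defaultdict(list)
    for length, mbps in BENCH_RE.findall(log_text):
        samples[int(length)].append(int(mbps))
    return {
        length: (statistics.median(v), min(v), max(v), len(v))
        for length, v in sorted(samples.items())
    }


# Two lines taken from the quoted log (a single run, so each median is
# simply the one measured value).
example = (
    "kernel: # benchmark_hash: len=1: 12 MB/s\n"
    "kernel: # benchmark_hash: len=16: 211 MB/s\n"
)
for length, (median, low, high, count) in summarize(example).items():
    print(f"len={length}: median {median} MB/s "
          f"(min {low}, max {high}, n={count})")
```

Running this on the concatenated logs from several runs of each commit would give a before/after comparison along with the spread of each measurement.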
- Eric