Message-ID: <70461134f12796b1166978c8628b5cf3@linux.ibm.com>
Date: Wed, 05 Nov 2025 09:16:56 +0100
From: Harald Freudenberger <freude@...ux.ibm.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: linux-crypto@...r.kernel.org, David Howells <dhowells@...hat.com>,
Ard Biesheuvel <ardb@...nel.org>,
"Jason A . Donenfeld" <Jason@...c4.com>,
Holger Dengler <dengler@...ux.ibm.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
linux-arm-kernel@...ts.infradead.org, linux-s390@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 00/15] SHA-3 library
On 2025-11-04 19:27, Eric Biggers wrote:
> On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
>> > Thanks! Is this with the whole series applied? Those numbers are
>> > pretty fast, so probably at least the Keccak acceleration part is
>> > worthwhile. But just to reiterate what I asked for:
>> >
>> > Also, it would be helpful to provide the benchmark output from just
>> > before "lib/crypto: s390/sha3: Add optimized Keccak function", just
>> > after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
>> > SHA-3 digest functions".
>> >
>> > So I'd like to see how much each change helped, which isn't clear if you
>> > show only the result at the end.
>> >
>> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
>> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
>> > doing the Keccak acceleration, then we should drop it for simplicity.
> [...]
>> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot
>> SHA-3 digest functions:
>>
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # module: sha3_kunit
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: 1..21
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 1 test_hash_test_vectors
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 2 test_hash_all_lens_up_to_4096
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 3 test_hash_incremental_updates
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 4 test_hash_buffer_overruns
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 5 test_hash_overlaps
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 6 test_hash_alignment_consistency
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 7 test_hash_ctx_zeroization
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 8 test_hash_interrupt_context_1
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 9 test_hash_interrupt_context_2
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 10 test_sha3_224_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 11 test_sha3_256_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 12 test_sha3_384_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 13 test_sha3_512_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 14 test_shake128_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 15 test_shake256_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 16 test_shake128_nist
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 17 test_shake256_nist
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 18 test_shake_all_lens_up_to_4096
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 19 test_shake_multiple_squeezes
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 20 test_shake_with_guarded_bufs
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=1: 12 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=16: 80 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=64: 785 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=127: 812 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=128: 1619 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=200: 2319 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=256: 2176 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=511: 4881 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=512: 4968 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=1024: 7565 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=3173: 11909 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=4096: 10378 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=16384: 12273 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 21 benchmark_hash
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
>>
>> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak
>> functions:
>>
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel: # module: sha3_kunit
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel: 1..21
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 1 test_hash_test_vectors
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 2 test_hash_all_lens_up_to_4096
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 3 test_hash_incremental_updates
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 4 test_hash_buffer_overruns
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 5 test_hash_overlaps
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 6 test_hash_alignment_consistency
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 7 test_hash_ctx_zeroization
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 8 test_hash_interrupt_context_1
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 9 test_hash_interrupt_context_2
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 10 test_sha3_224_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 11 test_sha3_256_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 12 test_sha3_384_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 13 test_sha3_512_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 14 test_shake128_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 15 test_shake256_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 16 test_shake128_nist
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 17 test_shake256_nist
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 18 test_shake_all_lens_up_to_4096
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 19 test_shake_multiple_squeezes
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 20 test_shake_with_guarded_bufs
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=1: 12 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=16: 211 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=64: 835 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=127: 1557 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=128: 1617 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=200: 1457 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=256: 1830 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=511: 3035 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=512: 3245 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=1024: 5319 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=3173: 9969 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=4096: 11123 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=16384: 12767 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 21 benchmark_hash
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
>
> Thanks. So the results before and after "lib/crypto: s390/sha3: Add
> optimized one-shot SHA-3 digest functions" are:
>
> Length (bytes)        Before         After
> ==============    ==========    ==========
>              1       12 MB/s       12 MB/s
>             16      211 MB/s       80 MB/s
>             64      835 MB/s      785 MB/s
>            127     1557 MB/s      812 MB/s
>            128     1617 MB/s     1619 MB/s
>            200     1457 MB/s     2319 MB/s
>            256     1830 MB/s     2176 MB/s
>            511     3035 MB/s     4881 MB/s
>            512     3245 MB/s     4968 MB/s
>           1024     5319 MB/s     7565 MB/s
>           3173     9969 MB/s    11909 MB/s
>           4096    11123 MB/s    10378 MB/s
>          16384    12767 MB/s    12273 MB/s
>
> Unfortunately that seems inconclusive. len=200, 256, 511, 512, 1024,
> 3173 improved. But len=16, 64, 127, 4096, 16384 regressed.
>
> I expected the most improvement on short lengths. The fact that some
> of the short lengths actually regressed is concerning.
>
> It's also clear that the Keccak acceleration itself matters far more
> than this additional one-shot optimization, as expected. The generic
> code maxed out at only 259 MB/s for you.
>
> I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
> SHA-3 digest functions" for now, to avoid the extra maintenance cost
> and opportunity for bugs.
>
> If you can provide more accurate numbers that show it's worthwhile, we
> can reconsider. Maybe set the CPU to a fixed frequency, and run
> sha3_kunit multiple times (triggered via KUnit's debugfs interface)?
>
> - Eric
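To put the before/after table in relative terms, here is a minimal Python sketch that computes the per-length change and sorts the lengths into improved and regressed. The throughput values are copied from the table above; the function names and the 1% noise threshold are assumptions made for illustration and are not part of the benchmark or the patch series.

```python
# Throughput in MB/s, copied from the table above: length -> (before, after).
RESULTS = {
    1: (12, 12),
    16: (211, 80),
    64: (835, 785),
    127: (1557, 812),
    128: (1617, 1619),
    200: (1457, 2319),
    256: (1830, 2176),
    511: (3035, 4881),
    512: (3245, 4968),
    1024: (5319, 7565),
    3173: (9969, 11909),
    4096: (11123, 10378),
    16384: (12767, 12273),
}


def relative_change(before, after):
    """Return the change from 'before' to 'after' in percent."""
    return (after - before) / before * 100.0


def classify(results, threshold_pct=1.0):
    """Split lengths into (improved, regressed) lists.

    Changes smaller than threshold_pct in either direction are treated
    as noise and left out of both lists. The threshold is a hypothetical
    choice, not a measured noise floor.
    """
    improved, regressed = [], []
    for length, (before, after) in sorted(results.items()):
        change = relative_change(before, after)
        if change > threshold_pct:
            improved.append(length)
        elif change < -threshold_pct:
            regressed.append(length)
    return improved, regressed


if __name__ == "__main__":
    for length, (before, after) in sorted(RESULTS.items()):
        print(f"len={length:>5}: {relative_change(before, after):+7.1f}%")
    improved, regressed = classify(RESULTS)
    print("improved: ", improved)
    print("regressed:", regressed)
```

With a single run per commit, such percentages only describe this one measurement and say nothing about run-to-run variance.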
The focus should be on small data sizes. Let me see what I can do ...
I used a z/VM guest for these measurements. Running in an LPAR with some
CPU pinning may be a better option. I would also run more tests so that I
can calculate a Gaussian distribution. However, that will not happen
within the next few days.
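A rough sketch of how repeated runs could be summarized, assuming the kernel log of each sha3_kunit run has been saved as text with every benchmark_hash result on a single line. The helper names and the way the logs are collected are hypothetical; only the "benchmark_hash: len=N: X MB/s" line format is taken from the output quoted above.

```python
import re
import statistics
from collections import defaultdict

# Matches lines such as "# benchmark_hash: len=16: 211 MB/s".
BENCH_LINE = re.compile(r"benchmark_hash:\s*len=(\d+):\s*(\d+)\s*MB/s")


def collect(run_logs):
    """Gather throughput samples per length from several run logs.

    run_logs is an iterable of strings, one string per benchmark run.
    Returns a dict mapping length -> list of MB/s values.
    """
    samples = defaultdict(list)
    for log in run_logs:
        for length, mbps in BENCH_LINE.findall(log):
            samples[int(length)].append(int(mbps))
    return samples


def summarize(samples):
    """Return length -> (mean, sample standard deviation) in MB/s.

    The standard deviation is reported as 0.0 when only one sample
    exists, because it is undefined for a single value.
    """
    summary = {}
    for length, values in sorted(samples.items()):
        mean = statistics.mean(values)
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[length] = (mean, stdev)
    return summary


if __name__ == "__main__":
    # Made-up sample data, only to demonstrate the call sequence.
    runs = [
        "# benchmark_hash: len=16: 200 MB/s\n# benchmark_hash: len=64: 800 MB/s",
        "# benchmark_hash: len=16: 220 MB/s\n# benchmark_hash: len=64: 820 MB/s",
    ]
    for length, (mean, stdev) in summarize(collect(runs)).items():
        print(f"len={length}: mean={mean:.1f} MB/s, stdev={stdev:.1f} MB/s")
```

Comparing the mean and standard deviation per length for both commits would show whether a difference is larger than the run-to-run noise.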
So I agree with you: let's hold back the one-shot optimization.
Harald Freudenberger