Message-ID: <70461134f12796b1166978c8628b5cf3@linux.ibm.com>
Date: Wed, 05 Nov 2025 09:16:56 +0100
From: Harald Freudenberger <freude@...ux.ibm.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: linux-crypto@...r.kernel.org, David Howells <dhowells@...hat.com>,
        Ard Biesheuvel <ardb@...nel.org>,
        "Jason A . Donenfeld" <Jason@...c4.com>,
        Holger Dengler <dengler@...ux.ibm.com>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        linux-arm-kernel@...ts.infradead.org, linux-s390@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 00/15] SHA-3 library

On 2025-11-04 19:27, Eric Biggers wrote:
> On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
>> > Thanks!  Is this with the whole series applied?  Those numbers are
>> > pretty fast, so probably at least the Keccak acceleration part is
>> > worthwhile.  But just to reiterate what I asked for:
>> >
>> >     Also, it would be helpful to provide the benchmark output from just
>> >     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
>> >     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
>> >     SHA-3 digest functions".
>> >
>> > So I'd like to see how much each change helped, which isn't clear if you
>> > show only the result at the end.
>> >
>> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
>> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
>> > doing the Keccak acceleration, then we should drop it for simplicity.
> [...]
>> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3
>> digest functions:
>> 
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     1..21
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 80 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 785 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 812 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1619 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 2319 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 2176 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 4881 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 4968 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 7565 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 11909 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 10378 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12273 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
>> 
>> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak functions:
>> 
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     1..21
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 211 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 835 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1557 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1617 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 1457 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 1830 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 3035 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 3245 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 5319 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 9969 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 11123 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12767 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
> 
> Thanks.  So the results before and after "lib/crypto: s390/sha3: Add
> optimized one-shot SHA-3 digest functions" are:
> 
>     Length (bytes)      Before            After
>     ==============    ==========        ==========
>          1               12 MB/s           12 MB/s
>         16              211 MB/s           80 MB/s
>         64              835 MB/s          785 MB/s
>        127             1557 MB/s          812 MB/s
>        128             1617 MB/s         1619 MB/s
>        200             1457 MB/s         2319 MB/s
>        256             1830 MB/s         2176 MB/s
>        511             3035 MB/s         4881 MB/s
>        512             3245 MB/s         4968 MB/s
>       1024             5319 MB/s         7565 MB/s
>       3173             9969 MB/s        11909 MB/s
>       4096            11123 MB/s        10378 MB/s
>      16384            12767 MB/s        12273 MB/s
> 
> Unfortunately that seems inconclusive.  len=200, 256, 511, 512, 1024,
> 3173 improved.  But len=16, 64, 127, 4096, 16384 regressed.
> 
> I expected the most improvement on short lengths.  The fact that some of
> the short lengths actually regressed is concerning.
> 
> It's also clear that the Keccak acceleration itself matters far more than
> this additional one-shot optimization, as expected.  The generic code
> maxed out at only 259 MB/s for you.
> 
> I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
> SHA-3 digest functions" for now, to avoid the extra maintenance cost
> and opportunity for bugs.
> 
> If you can provide more accurate numbers that show it's worthwhile, we
> can reconsider.  Maybe set the CPU to a fixed frequency, and run
> sha3_kunit multiple times (triggered via KUnit's debugfs interface)?
> 
> - Eric
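
To put the before/after table into relative terms, here is a minimal
Python sketch. The throughput values are copied from the table; the
names RESULTS, percent_change and classify, and the 1% noise tolerance,
are my own assumptions for illustration and not part of the series.

```python
# Throughput in MB/s, copied from the before/after table in this thread:
# message length in bytes -> (before one-shot patch, after one-shot patch)
RESULTS = {
    1: (12, 12),
    16: (211, 80),
    64: (835, 785),
    127: (1557, 812),
    128: (1617, 1619),
    200: (1457, 2319),
    256: (1830, 2176),
    511: (3035, 4881),
    512: (3245, 4968),
    1024: (5319, 7565),
    3173: (9969, 11909),
    4096: (11123, 10378),
    16384: (12767, 12273),
}


def percent_change(before, after):
    """Relative change of 'after' versus 'before', in percent."""
    return (after - before) / before * 100.0


def classify(results, tolerance=1.0):
    """Split lengths into improved, regressed and unchanged.

    'tolerance' is an assumed noise threshold in percent; changes
    smaller than it in either direction count as unchanged.
    """
    improved, regressed, unchanged = [], [], []
    for length, (before, after) in sorted(results.items()):
        change = percent_change(before, after)
        if change > tolerance:
            improved.append(length)
        elif change < -tolerance:
            regressed.append(length)
        else:
            unchanged.append(length)
    return improved, regressed, unchanged


if __name__ == "__main__":
    for length, (before, after) in sorted(RESULTS.items()):
        change = percent_change(before, after)
        print(f"len={length:>5}: {before:>5} -> {after:>5} MB/s ({change:+.1f}%)")
    improved, regressed, unchanged = classify(RESULTS)
    print("improved: ", improved)
    print("regressed:", regressed)
    print("unchanged:", unchanged)
```

With the assumed 1% tolerance, this puts len=1 and len=128 in the
unchanged group and otherwise gives the same split of improved and
regressed lengths as listed above. It says nothing about whether the
differences are noise, since each column is a single run.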

The focus should be on the small data. Let me see what I can do ...
I used a z/VM guest for this. Using an LPAR instead, together with some
CPU pinning, may be an option. I would also run more tests so that a
Gaussian distribution can be calculated. However, this will not happen
within the next few days.
So I agree with you: let's hold back the one-shot optimization.
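
For the repeated runs, something like the following Python sketch could
collect the benchmark_hash lines from several kernel logs and report
mean and sample standard deviation per length. It is only a sketch under
assumptions: it expects each benchmark line unwrapped on one line as
sha3_kunit prints it, and the names BENCH_RE, parse_run and summarize
are hypothetical helpers, not existing tooling.

```python
import re
import statistics
from collections import defaultdict

# Matches lines such as "# benchmark_hash: len=16: 211 MB/s".
BENCH_RE = re.compile(r"benchmark_hash:\s*len=(\d+):\s*(\d+)\s*MB/s")


def parse_run(log_text):
    """Return {length: MB/s} for one sha3_kunit run found in log_text."""
    return {int(length): int(mbps) for length, mbps in BENCH_RE.findall(log_text)}


def summarize(runs):
    """Return {length: (mean, sample stdev)} over a list of parsed runs.

    With only one sample for a length, the stdev is reported as 0.0
    because a sample standard deviation needs at least two values.
    """
    samples = defaultdict(list)
    for run in runs:
        for length, mbps in run.items():
            samples[length].append(mbps)
    summary = {}
    for length, values in sorted(samples.items()):
        mean = statistics.mean(values)
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[length] = (mean, stdev)
    return summary


if __name__ == "__main__":
    # Two lines taken from the Keccak-only run quoted in this thread.
    sample = (
        "kernel:     # benchmark_hash: len=16: 211 MB/s\n"
        "kernel:     # benchmark_hash: len=64: 835 MB/s\n"
    )
    runs = [parse_run(sample)]  # append one parsed log per additional run
    for length, (mean, stdev) in summarize(runs).items():
        print(f"len={length}: mean={mean:.1f} MB/s, stdev={stdev:.1f}")
```

Comparing the per-length means of both kernels against their standard
deviations should then show whether the differences on short lengths
are larger than the run-to-run noise.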

Harald Freudenberger
