Message-ID: <20251104182738.GA2419@sol>
Date: Tue, 4 Nov 2025 10:27:38 -0800
From: Eric Biggers <ebiggers@...nel.org>
To: Harald Freudenberger <freude@...ux.ibm.com>
Cc: linux-crypto@...r.kernel.org, David Howells <dhowells@...hat.com>,
	Ard Biesheuvel <ardb@...nel.org>,
	"Jason A . Donenfeld" <Jason@...c4.com>,
	Holger Dengler <dengler@...ux.ibm.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	linux-arm-kernel@...ts.infradead.org, linux-s390@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 00/15] SHA-3 library

On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
> > Thanks!  Is this with the whole series applied?  Those numbers are
> > pretty fast, so probably at least the Keccak acceleration part is
> > worthwhile.  But just to reiterate what I asked for:
> > 
> >     Also, it would be helpful to provide the benchmark output from just
> >     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
> >     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
> >     SHA-3 digest functions".
> > 
> > So I'd like to see how much each change helped, which isn't clear if you
> > show only the result at the end.
> > 
> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
> > doing the Keccak acceleration, then we should drop it for simplicity.
[...]
> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3
> digest functions:
> 
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # module: sha3_kunit
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     1..21
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 3
> test_hash_incremental_updates
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 4
> test_hash_buffer_overruns
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 6
> test_hash_alignment_consistency
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 7
> test_hash_ctx_zeroization
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 8
> test_hash_interrupt_context_1
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 9
> test_hash_interrupt_context_2
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 19
> test_shake_multiple_squeezes
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 80
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 785
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127:
> 812 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128:
> 1619 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200:
> 2319 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256:
> 2176 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511:
> 4881 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512:
> 4968 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024:
> 7565 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173:
> 11909 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096:
> 10378 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384:
> 12273 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21
> 
> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak functions:
> 
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     # module: sha3_kunit
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     1..21
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 3
> test_hash_incremental_updates
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 4
> test_hash_buffer_overruns
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 6
> test_hash_alignment_consistency
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 7
> test_hash_ctx_zeroization
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 8
> test_hash_interrupt_context_1
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 9
> test_hash_interrupt_context_2
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 19
> test_shake_multiple_squeezes
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 211
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 835
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127:
> 1557 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128:
> 1617 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200:
> 1457 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256:
> 1830 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511:
> 3035 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512:
> 3245 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024:
> 5319 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173:
> 9969 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096:
> 11123 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384:
> 12767 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21

Thanks.  So the results before and after "lib/crypto: s390/sha3: Add
optimized one-shot SHA-3 digest functions" are:

    Length (bytes)      Before            After
    ==============    ==========        ==========
         1               12 MB/s           12 MB/s
        16              211 MB/s           80 MB/s
        64              835 MB/s          785 MB/s
       127             1557 MB/s          812 MB/s
       128             1617 MB/s         1619 MB/s
       200             1457 MB/s         2319 MB/s
       256             1830 MB/s         2176 MB/s
       511             3035 MB/s         4881 MB/s
       512             3245 MB/s         4968 MB/s
      1024             5319 MB/s         7565 MB/s
      3173             9969 MB/s        11909 MB/s
      4096            11123 MB/s        10378 MB/s
     16384            12767 MB/s        12273 MB/s

Unfortunately that seems inconclusive.  len=200, 256, 511, 512, 1024,
3173 improved.  But len=16, 64, 127, 4096, 16384 regressed.
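
As a rough cross-check of the comparison above, the per-length change can
be computed directly from the table.  This is only a sketch: the numbers
are copied from the table, and the 1% cutoff for calling a length
"unchanged" is an arbitrary assumption, not something established in this
thread.

```python
# Throughput in MB/s copied from the table above: length -> (before, after).
RESULTS = {
    1: (12, 12),
    16: (211, 80),
    64: (835, 785),
    127: (1557, 812),
    128: (1617, 1619),
    200: (1457, 2319),
    256: (1830, 2176),
    511: (3035, 4881),
    512: (3245, 4968),
    1024: (5319, 7565),
    3173: (9969, 11909),
    4096: (11123, 10378),
    16384: (12767, 12273),
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


def classify(results, threshold_pct=1.0):
    """Split lengths into improved / regressed / unchanged lists.

    threshold_pct is a hypothetical noise cutoff chosen for illustration.
    """
    improved, regressed, unchanged = [], [], []
    for length, (before, after) in sorted(results.items()):
        change = percent_change(before, after)
        if change > threshold_pct:
            improved.append(length)
        elif change < -threshold_pct:
            regressed.append(length)
        else:
            unchanged.append(length)
    return improved, regressed, unchanged


improved, regressed, unchanged = classify(RESULTS)
for length, (before, after) in sorted(RESULTS.items()):
    print(f"len={length:>5}: {percent_change(before, after):+7.1f}%")
print("improved: ", improved)
print("regressed:", regressed)
print("unchanged:", unchanged)
```

With a single run per commit, differences of a few percent (for example
at len=64 or len=16384) may well be measurement noise rather than a real
change.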

I expected the most improvement on short lengths.  The fact that some of
the short lengths actually regressed is concerning.

It's also clear that the Keccak acceleration itself matters far more than
this additional one-shot optimization, as expected.  The generic code
maxed out at only 259 MB/s for you.

I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
SHA-3 digest functions" for now, to avoid the extra maintenance cost
and opportunity for bugs.

If you can provide more accurate numbers that show it's worthwhile, we
can reconsider.  Maybe set the CPU to a fixed frequency, and run
sha3_kunit multiple times (triggered via KUnit's debugfs interface)?
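
A minimal sketch of what such repeated runs could look like follows.  It
assumes the kernel has CONFIG_KUNIT_DEBUGFS enabled, that the suite is
exposed under /sys/kernel/debug/kunit/sha3 (the suite name is inferred
from the "# sha3: pass:21" line in the log), and that the benchmark
diagnostics appear in that suite's "results" file.  The helper names are
made up for illustration, and the script needs to run as root.

```python
import re
import statistics
from pathlib import Path

# Assumed debugfs location of the suite; adjust if it differs on your system.
SUITE_DIR = Path("/sys/kernel/debug/kunit/sha3")

# Matches diagnostic lines such as "# benchmark_hash: len=16: 211 MB/s".
BENCH_RE = re.compile(r"benchmark_hash: len=(\d+):\s*(\d+) MB/s")


def parse_benchmark(log):
    """Return {length: MB/s} for every benchmark line found in log text."""
    return {int(length): int(mbps) for length, mbps in BENCH_RE.findall(log)}


def run_suite_once(suite_dir=SUITE_DIR):
    """Trigger one run by writing to 'run', then parse 'results'."""
    (suite_dir / "run").write_text("1\n")
    return parse_benchmark((suite_dir / "results").read_text())


def summarize(runs):
    """Return {length: (min, median, max)} across a list of per-run dicts."""
    summary = {}
    for length in sorted(set().union(*runs)):
        values = [run[length] for run in runs if length in run]
        summary[length] = (min(values), statistics.median(values), max(values))
    return summary


def benchmark(num_runs=10, suite_dir=SUITE_DIR):
    """Run the suite num_runs times and print min/median/max per length."""
    runs = [run_suite_once(suite_dir) for _ in range(num_runs)]
    for length, (lo, med, hi) in summarize(runs).items():
        print(f"len={length:>5}: min={lo} median={med} max={hi} MB/s")
```

Calling benchmark() once on each of the two commits and comparing the
medians, along with the min/max spread, would give a better idea of
whether the differences exceed run-to-run noise.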

- Eric
