Message-ID: <20260116071513.12134-1-AlanSong-oc@zhaoxin.com>
Date: Fri, 16 Jan 2026 15:15:10 +0800
From: AlanSong-oc <AlanSong-oc@...oxin.com>
To: <herbert@...dor.apana.org.au>, <davem@...emloft.net>,
<ebiggers@...nel.org>, <Jason@...c4.com>, <ardb@...nel.org>,
<linux-crypto@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<x86@...nel.org>
CC: <CobeChen@...oxin.com>, <TonyWWang-oc@...oxin.com>, <YunShen@...oxin.com>,
<GeorgeXue@...oxin.com>, <LeoLiu-oc@...oxin.com>, <HansHu@...oxin.com>,
AlanSong-oc <AlanSong-oc@...oxin.com>
Subject: [PATCH v3 0/3] lib/crypto: x86/sha: Add PHE Extensions support
This series adds PHE Extensions optimized SHA1 and SHA256 transform
functions for Zhaoxin processors to lib/crypto, and disables the
padlock-sha driver on Zhaoxin platforms due to self-test failures.

The table below shows benchmark results before and after this patch
series, measured with CRYPTO_LIB_BENCHMARK on a Zhaoxin KX-7000
platform. The speedup is given in parentheses.
+---------+-------------------------+--------------------------+
|         | SHA1                    | SHA256                   |
+---------+--------+----------------+--------+-----------------+
| Len     | Before | After          | Before | After           |
+---------+--------+----------------+--------+-----------------+
| 1*      | 3**    | 8 (2.67x)      | 2      | 7 (3.50x)       |
| 16      | 52     | 125 (2.40x)    | 35     | 119 (3.40x)     |
| 64      | 114    | 318 (2.79x)    | 74     | 280 (3.78x)     |
| 127     | 154    | 440 (2.86x)    | 99     | 387 (3.91x)     |
| 128     | 160    | 492 (3.08x)    | 103    | 427 (4.15x)     |
| 200     | 189    | 605 (3.20x)    | 123    | 537 (4.37x)     |
| 256     | 199    | 676 (3.40x)    | 128    | 582 (4.55x)     |
| 511     | 223    | 794 (3.56x)    | 144    | 679 (4.72x)     |
| 512     | 225    | 833 (3.70x)    | 146    | 714 (4.89x)     |
| 1024    | 243    | 941 (3.87x)    | 157    | 796 (5.07x)     |
| 3173    | 259    | 1044 (4.03x)   | 167    | 883 (5.28x)     |
| 4096    | 257    | 1044 (4.06x)   | 166    | 876 (5.28x)     |
| 16384   | 261    | 1073 (4.11x)   | 169    | 899 (5.32x)     |
+---------+--------+----------------+--------+-----------------+
*: Length, in bytes, of the data processed by one complete SHA
   computation.
**: Throughput, in MB/s.
With this patch series applied, the KUnit test suites for SHA1 and
SHA256 pass on Zhaoxin platforms. The detailed test logs follow:
[ 5.993700] # Subtest: sha1
[ 5.996813] # module: sha1_kunit
[ 5.996814] 1..11
[ 6.003399] ok 1 test_hash_test_vectors
[ 6.012489] ok 2 test_hash_all_lens_up_to_4096
[ 6.028511] ok 3 test_hash_incremental_updates
[ 6.035766] ok 4 test_hash_buffer_overruns
[ 6.043445] ok 5 test_hash_overlaps
[ 6.050315] ok 6 test_hash_alignment_consistency
[ 6.054994] ok 7 test_hash_ctx_zeroization
[ 6.127778] ok 8 test_hash_interrupt_context_1
[ 6.774847] ok 9 test_hash_interrupt_context_2
[ 6.810745] ok 10 test_hmac
[ 6.835169] # benchmark_hash: len=1: 8 MB/s
[ 6.847167] # benchmark_hash: len=16: 125 MB/s
[ 6.862114] # benchmark_hash: len=64: 318 MB/s
[ 6.878173] # benchmark_hash: len=127: 440 MB/s
[ 6.893081] # benchmark_hash: len=128: 492 MB/s
[ 6.907976] # benchmark_hash: len=200: 605 MB/s
[ 6.922658] # benchmark_hash: len=256: 676 MB/s
[ 6.937558] # benchmark_hash: len=511: 794 MB/s
[ 6.951994] # benchmark_hash: len=512: 833 MB/s
[ 6.966262] # benchmark_hash: len=1024: 941 MB/s
[ 6.980295] # benchmark_hash: len=3173: 1044 MB/s
[ 6.994494] # benchmark_hash: len=4096: 1044 MB/s
[ 7.008728] # benchmark_hash: len=16384: 1073 MB/s
[ 7.014515] ok 11 benchmark_hash
[ 7.019628] # sha1: pass:11 fail:0 skip:0 total:11
[ 7.023170] # Totals: pass:11 fail:0 skip:0 total:11
[ 7.027916] ok 5 sha1
[ 7.767257] # Subtest: sha256
[ 7.770542] # module: sha256_kunit
[ 7.770544] 1..15
[ 7.777383] ok 1 test_hash_test_vectors
[ 7.788563] ok 2 test_hash_all_lens_up_to_4096
[ 7.806090] ok 3 test_hash_incremental_updates
[ 7.813553] ok 4 test_hash_buffer_overruns
[ 7.822384] ok 5 test_hash_overlaps
[ 7.829388] ok 6 test_hash_alignment_consistency
[ 7.833843] ok 7 test_hash_ctx_zeroization
[ 7.915191] ok 8 test_hash_interrupt_context_1
[ 8.362312] ok 9 test_hash_interrupt_context_2
[ 8.401607] ok 10 test_hmac
[ 8.415458] ok 11 test_sha256_finup_2x
[ 8.419397] ok 12 test_sha256_finup_2x_defaultctx
[ 8.424107] ok 13 test_sha256_finup_2x_hugelen
[ 8.451289] # benchmark_hash: len=1: 7 MB/s
[ 8.465372] # benchmark_hash: len=16: 119 MB/s
[ 8.481760] # benchmark_hash: len=64: 280 MB/s
[ 8.499344] # benchmark_hash: len=127: 387 MB/s
[ 8.515800] # benchmark_hash: len=128: 427 MB/s
[ 8.531970] # benchmark_hash: len=200: 537 MB/s
[ 8.548241] # benchmark_hash: len=256: 582 MB/s
[ 8.564838] # benchmark_hash: len=511: 679 MB/s
[ 8.580872] # benchmark_hash: len=512: 714 MB/s
[ 8.596858] # benchmark_hash: len=1024: 796 MB/s
[ 8.612567] # benchmark_hash: len=3173: 883 MB/s
[ 8.628546] # benchmark_hash: len=4096: 876 MB/s
[ 8.644482] # benchmark_hash: len=16384: 899 MB/s
[ 8.649773] ok 14 benchmark_hash
[ 8.655505] ok 15 benchmark_sha256_finup_2x # SKIP not relevant
[ 8.659065] # sha256: pass:14 fail:0 skip:1 total:15
[ 8.665276] # Totals: pass:14 fail:0 skip:1 total:15
[ 8.670195] ok 7 sha256

Changes in v3:
- Implement PHE Extensions optimized SHA1 and SHA256 transform functions
  using inline assembly instead of separate assembly files
- Eliminate unnecessary casts
- Add CONFIG_CPU_SUP_ZHAOXIN check to compile out the code when disabled
- Use 'boot_cpu_data.x86' to identify the CPU family instead of
  'cpu_data(0).x86'
- Only check X86_FEATURE_PHE_EN for CPU support, consistent with other
  CPU feature checks
- Disable the padlock-sha driver on Zhaoxin processors with CPU family
  0x07 and newer

Changes in v2:
- Add Zhaoxin support to lib/crypto instead of extending the existing
  padlock-sha driver

AlanSong-oc (3):
  crypto: padlock-sha - Disable for Zhaoxin processor
  lib/crypto: x86/sha1: PHE Extensions optimized SHA1 transform function
  lib/crypto: x86/sha256: PHE Extensions optimized SHA256 transform
    function

 drivers/crypto/padlock-sha.c |  7 +++++++
 lib/crypto/x86/sha1.h        | 25 +++++++++++++++++++++++++
 lib/crypto/x86/sha256.h      | 25 +++++++++++++++++++++++++
 3 files changed, 57 insertions(+)

--
2.34.1