Message-ID: <9e70bf33-bab5-83a3-1eb0-7cae442c2f64@iki.fi>
Date: Mon, 20 Dec 2021 20:03:47 +0200
From: Jussi Kivilinna <jussi.kivilinna@....fi>
To: Tianjia Zhang <tianjia.zhang@...ux.alibaba.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
"David S. Miller" <davem@...emloft.net>,
Vitaly Chikunov <vt@...linux.org>,
Eric Biggers <ebiggers@...gle.com>,
Eric Biggers <ebiggers@...nel.org>,
Gilad Ben-Yossef <gilad@...yossef.com>,
Ard Biesheuvel <ardb@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, linux-crypto@...r.kernel.org,
x86@...nel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/6] crypto: x86/sm3 - add AVX assembly implementation
On 20.12.2021 10.22, Tianjia Zhang wrote:
> This patch adds AVX assembly accelerated implementation of SM3 secure
> hash algorithm. From the benchmark data, compared to pure software
> implementation sm3-generic, the performance increase is up to 38%.
>
> The main algorithm implementation is based on the SM3 AVX/BMI2 accelerated
> work in libgcrypt:
> https://gnupg.org/software/libgcrypt/index.html
>
> Benchmark on Intel i5-6200U 2.30GHz, performance data of two
> implementations, pure software sm3-generic and sm3-avx acceleration.
> The data comes from tcrypt test modes 326 and 422. The columns are
> different per-update lengths in bytes; the unit is Mb/s:
>
> update-size | 16 64 256 1024 2048 4096 8192
> --------------------------------------------------------------------
> sm3-generic | 105.97 129.60 182.12 189.62 188.06 193.66 194.88
> sm3-avx | 119.87 163.05 244.44 260.92 257.60 264.87 265.88
>
> Signed-off-by: Tianjia Zhang <tianjia.zhang@...ux.alibaba.com>
> ---
> arch/x86/crypto/Makefile | 3 +
> arch/x86/crypto/sm3-avx-asm_64.S | 521 +++++++++++++++++++++++++++++++
> arch/x86/crypto/sm3_avx_glue.c | 134 ++++++++
> crypto/Kconfig | 13 +
> 4 files changed, 671 insertions(+)
> create mode 100644 arch/x86/crypto/sm3-avx-asm_64.S
> create mode 100644 arch/x86/crypto/sm3_avx_glue.c
>
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index f307c93fc90a..7cbe860f6201 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -88,6 +88,9 @@ nhpoly1305-avx2-y := nh-avx2-x86_64.o nhpoly1305-avx2-glue.o
>
> obj-$(CONFIG_CRYPTO_CURVE25519_X86) += curve25519-x86_64.o
>
> +obj-$(CONFIG_CRYPTO_SM3_AVX_X86_64) += sm3-avx-x86_64.o
> +sm3-avx-x86_64-y := sm3-avx-asm_64.o sm3_avx_glue.o
> +
> obj-$(CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64) += sm4-aesni-avx-x86_64.o
> sm4-aesni-avx-x86_64-y := sm4-aesni-avx-asm_64.o sm4_aesni_avx_glue.o
>
> diff --git a/arch/x86/crypto/sm3-avx-asm_64.S b/arch/x86/crypto/sm3-avx-asm_64.S
> new file mode 100644
> index 000000000000..e7a9a37f3609
> --- /dev/null
> +++ b/arch/x86/crypto/sm3-avx-asm_64.S
> @@ -0,0 +1,521 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * SM3 AVX accelerated transform.
> + * specified in: https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02
> + *
> + * Copyright (C) 2021 Jussi Kivilinna <jussi.kivilinna@....fi>
> + * Copyright (C) 2021 Tianjia Zhang <tianjia.zhang@...ux.alibaba.com>
> + */
<snip>
> +
> +#define R(i, a, b, c, d, e, f, g, h, round, widx, wtype) \
> + /* rol(a, 12) => t0 */ \
> + roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \
> + /* rol((t0 + e + t), 7) => t1 */ \
> + addl3(t0, e, t1); \
> + addl $K##round, t1; \
It would be better to use "leal K##round(t0, e, 1), t1;" here and fix the
K0-K63 macros instead, as I noted on the libgcrypt mailing list:
https://lists.gnupg.org/pipermail/gcrypt-devel/2021-December/005209.html
-Jussi