[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YMKf93/cnPGGtRW3@gmail.com>
Date: Thu, 10 Jun 2021 16:27:51 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: Tianjia Zhang <tianjia.zhang@...ux.alibaba.com>
Cc: Herbert Xu <herbert@...dor.apana.org.au>,
"David S. Miller" <davem@...emloft.net>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
Gilad Ben-Yossef <gilad@...yossef.com>,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
"Markku-Juhani O . Saarinen" <mjos@....fi>,
Jussi Kivilinna <jussi.kivilinna@....fi>, x86@...nel.org,
linux-crypto@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] crypto: x86/sm4 - add AES-NI/AVX/x86_64 assembler
implementation
On Thu, Jun 10, 2021 at 09:44:59PM +0800, Tianjia Zhang wrote:
> This patch adds AES-NI/AVX/x86_64 assembler implementation of SM4
> block cipher. Through two affine transforms, we can use the AES
> S-Box to simulate the SM4 S-Box to achieve the effect of instruction
> acceleration.
>
Benchmark results, please.
Also, is this passing the self-tests, including the fuzz tests?
> +/*
> + * void sm4_aesni_avx_expand_key(const u8 *key, u32 *rk_enc,
> + * u32 *rk_dec, const u32 *fk, const u32 *ck);
> + */
> +SYM_FUNC_START(sm4_aesni_avx_expand_key)
> + /* input:
> + * %rdi: 128-bit key
> + * %rsi: rkey_enc
> + * %rdx: rkey_dec
> + * %rcx: fk array
> + * %r8: ck array
> + */
> + FRAME_BEGIN
Key expansion isn't performance-critical. Can the C library version be used, or
does the key need to be expanded in a way specific to this x86 implementation?
> +/*
> + * void sm4_aesni_avx_crypt4(const u32 *rk, u8 *dst,
> + * const u8 *src, int nblocks)
> + */
> +SYM_FUNC_START(sm4_aesni_avx_crypt4)
> + /* input:
> + * %rdi: round key array, CTX
> + * %rsi: dst (1..4 blocks)
> + * %rdx: src (1..4 blocks)
> + * %rcx: num blocks (1..4)
> + */
> + FRAME_BEGIN
[...]
> +static void sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
> +{
> + const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> + if (crypto_simd_usable()) {
> + kernel_fpu_begin();
> + sm4_aesni_avx_crypt4(ctx->rkey_enc, out, in, 1);
> + kernel_fpu_end();
> + } else
> + crypto_sm4_do_crypt(ctx->rkey_enc, out, in);
> +}
> +
> +static void sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
> +{
> + const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> + if (crypto_simd_usable()) {
> + kernel_fpu_begin();
> + sm4_aesni_avx_crypt4(ctx->rkey_dec, out, in, 1);
> + kernel_fpu_end();
> + } else
> + crypto_sm4_do_crypt(ctx->rkey_dec, out, in);
> +}
Your assembly code appears to handle encrypting up to 4 blocks at a time.
However you have only wired this up to the "cipher" API which does 1 block at a
time. Is this intentional?
What are your performance results with real-world chaining modes like XTS, and
do you plan to implement any of these modes directly?
> +
> +static struct crypto_alg sm4_asm_alg = {
> + .cra_name = "sm4",
> + .cra_driver_name = "sm4-asm",
In arch/x86/crypto/, "-asm" usually means a vanilla x86 assembly implementation
without any AES-NI, SSE, AVX, etc. instructions. Calling this something like
"sm4-aesni-avx" would make more sense. (Or is it actually avx2, not avx?)
> +config CRYPTO_SM4_AESNI_AVX_X86_64
> + tristate "SM4 cipher algorithm (x86_64/AES-NI/AVX)"
> + depends on X86 && 64BIT
> + select CRYPTO_SKCIPHER
> + select CRYPTO_SIMD
> + select CRYPTO_ALGAPI
> + select CRYPTO_LIB_SM4
As-is, neither CRYPTO_SKCIPHER nor CRYPTO_SIMD needs to be selected here.
> + help
> + SM4 cipher algorithms (OSCCA GB/T 32907-2016) (x86_64/AES-NI/AVX).
> +
> + SM4 (GBT.32907-2016) is a cryptographic standard issued by the
> + Organization of State Commercial Administration of China (OSCCA)
> + as an authorized cryptographic algorithms for the use within China.
> +
> + SMS4 was originally created for use in protecting wireless
> + networks, and is mandated in the Chinese National Standard for
> + Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure)
> + (GB.15629.11-2003).
> +
> + The latest SM4 standard (GBT.32907-2016) was proposed by OSCCA and
> + standardized through TC 260 of the Standardization Administration
> + of the People's Republic of China (SAC).
> +
> + The input, output, and key of SMS4 are each 128 bits.
> +
> + See also: <https://eprint.iacr.org/2008/329.pdf>
> +
> + If unsure, say N.
This is the help text for the x86 implementation specifically. Please don't
have boilerplate text about the algorithm here; that already exists for the
generic implementation. The text should explain about the x86 implementation.
- Eric
Powered by blists - more mailing lists