linux-kernel - Re: [PATCH 16/16] crypto: arm64/sm4 - add ARMv9 SVE cryptography acceleration implementation

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <cb061cbb-5f28-9dde-270e-6d7ccb6d4433@linux.alibaba.com>
Date:   Tue, 27 Sep 2022 12:26:29 +0800
From:   Tianjia Zhang <tianjia.zhang@...ux.alibaba.com>
To:     Ard Biesheuvel <ardb@...nel.org>, Mark Brown <broonie@...nel.org>
Cc:     Herbert Xu <herbert@...dor.apana.org.au>,
        "David S. Miller" <davem@...emloft.net>,
        Jussi Kivilinna <jussi.kivilinna@....fi>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Maxime Coquelin <mcoquelin.stm32@...il.com>,
        Alexandre Torgue <alexandre.torgue@...s.st.com>,
        Eric Biggers <ebiggers@...nel.org>,
        linux-crypto@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org,
        linux-stm32@...md-mailman.stormreply.com
Subject: Re: [PATCH 16/16] crypto: arm64/sm4 - add ARMv9 SVE cryptography
 acceleration implementation

Hi Ard,

On 9/26/22 6:02 PM, Ard Biesheuvel wrote:
> (cc Mark Brown)
> 
> Hello Tianjia,
> 
> On Mon, 26 Sept 2022 at 11:37, Tianjia Zhang
> <tianjia.zhang@...ux.alibaba.com> wrote:
>>
>> Scalable Vector Extension (SVE) is the next-generation SIMD extension for
>> arm64. SVE allows flexible vector length implementations with a range of
>> possible values in CPU implementations. The vector length can vary from a
>> minimum of 128 bits up to a maximum of 2048 bits, at 128-bit increments.
>> The SVE design guarantees that the same application can run on different
>> implementations that support SVE, without the need to recompile the code.
>>
>> SVE was originally introduced in ARMv8, and ARMv9 introduced SVE2 to
>> expand and improve it. Just as NEON has the Crypto Extension
>> instructions for such algorithms, SVE has similar optional
>> instructions, called the cryptography acceleration instructions.
>>
>> This patch uses the SM4 cryptography acceleration instructions and
>> SVE2 instructions to optimize the SM4 algorithm for ECB/CBC/CFB/CTR
>> modes. Since CBC/CFB encryption cannot be parallelized, the Crypto
>> Extension instructions are used for it instead.
>>
> 
> Given that we currently do not support the use of SVE in kernel mode,
> this patch cannot be accepted at this time (but the rest of the series
> looks reasonable to me, although I have only skimmed over the patches).
> 
> In view of the disappointing benchmark results below, I don't think
> this is worth the hassle at the moment. If we can find a case where
> using SVE in kernel mode truly makes a [favorable] difference, we can
> revisit this, but not without a thorough analysis of the impact it
> will have to support SVE in the kernel. Also, the fact that SVE may
> also cover cryptographic extensions does not necessarily imply that a
> micro-architecture will perform those crypto transformations in
> parallel and so the performance may be the same even if VL > 128.
> 
> In summary, please drop this patch for now, and once there are more
> encouraging performance numbers, please resubmit it as part of a
> series that explicitly enables SVE in kernel mode on arm64, and
> documents the requirements and constraints.
> 
> I have cc'ed Mark, who has been working on the SVE support and might
> have something to add here as well.
> 
> Thanks,
> Ard.
> 
> 
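To illustrate the CBC/CFB point: in CBC encryption each block's input depends on the previous block's ciphertext (CFB encryption has the same chain), so there is nothing to spread across vector lanes, while CBC decryption has no such dependency. This is a minimal Python sketch only; toy_encrypt/toy_decrypt are hypothetical stand-ins for SM4 (not a real cipher), and all helper names are made up for this example.

```python
BLOCK = 16  # SM4 block size in bytes (128 bits)


def toy_encrypt(block: bytes, key: int) -> bytes:
    """Hypothetical stand-in for SM4 encryption. NOT a real cipher: just an
    invertible per-byte map (key in 0..255) so the chaining can be run."""
    return bytes(((b ^ key) + 0x5A) & 0xFF for b in block)


def toy_decrypt(block: bytes, key: int) -> bytes:
    """Inverse of toy_encrypt (hypothetical stand-in for SM4 decryption)."""
    return bytes(((b - 0x5A) & 0xFF) ^ key for b in block)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(pt: bytes, iv: bytes, key: int) -> bytes:
    out, prev = [], iv
    for i in range(0, len(pt), BLOCK):
        # C_i = E(P_i ^ C_{i-1}): block i cannot start before block i-1
        # has finished, so the blocks cannot be processed side by side.
        prev = toy_encrypt(xor(pt[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(ct: bytes, iv: bytes, key: int) -> bytes:
    blocks = [ct[i:i + BLOCK] for i in range(0, len(ct), BLOCK)]
    # P_i = D(C_i) ^ C_{i-1}: every D(C_i) needs only ciphertext that is
    # already available, so these calls are independent of each other.
    decrypted = [toy_decrypt(c, key) for c in blocks]
    prevs = [iv] + blocks[:-1]
    return b"".join(xor(d, p) for d, p in zip(decrypted, prevs))


pt = bytes(range(4 * BLOCK))
iv = bytes(BLOCK)
assert cbc_decrypt(cbc_encrypt(pt, iv, 0x3C), iv, 0x3C) == pt
```

Only the list comprehension in cbc_decrypt is a candidate for wide vectors; the loop in cbc_encrypt is inherently one block at a time.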

Thanks for your reply. The current performance of SVE is indeed
unsatisfactory. One reason is that the SVE optimization has to deal
with more, and more complex, data shifting operations, for example in
CBC/CFB mode and also in CTR mode, where extra instructions are needed
to complete the 128-bit counter increment. The CE optimization does
not have these complications.
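A rough sketch of where the extra CTR instructions come from, assuming the vector unit offers additions on elements of at most 64 bits (no native 128-bit add): each lane's 128-bit big-endian counter ctr+i has to be built from a low-word add, an unsigned compare to detect wrap-around, and a second add to carry into the high word. On a little-endian core the big-endian counter bytes would typically also need byte-swapping. The function names are hypothetical and this is not the code from the patch.

```python
MASK64 = (1 << 64) - 1


def lane_counters(ctr: bytes, lanes: int) -> list[bytes]:
    """Return counter blocks ctr+0 .. ctr+lanes-1 for a 16-byte big-endian
    counter, using only 64-bit additions, as a vector unit without a
    128-bit add would have to."""
    hi = int.from_bytes(ctr[:8], "big")
    lo = int.from_bytes(ctr[8:], "big")
    out = []
    for i in range(lanes):
        new_lo = (lo + i) & MASK64       # 64-bit add, wraps on overflow
        carry = 1 if new_lo < lo else 0  # unsigned compare detects the wrap
        new_hi = (hi + carry) & MASK64   # extra add to propagate the carry
        out.append(new_hi.to_bytes(8, "big") + new_lo.to_bytes(8, "big"))
    return out


def lane_counters_ref(ctr: bytes, lanes: int) -> list[bytes]:
    """Reference: plain 128-bit arithmetic modulo 2**128."""
    v = int.from_bytes(ctr, "big")
    return [((v + i) % (1 << 128)).to_bytes(16, "big") for i in range(lanes)]


# A low word of all ones forces a carry into the high word in lane 1.
ctr = bytes(8) + b"\xff" * 8
assert lane_counters(ctr, 4) == lane_counters_ref(ctr, 4)
```

A CE implementation handles one block per register, so it can increment the counter once per block with ordinary scalar arithmetic instead of this per-lane carry handling.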

In addition, I naively assumed that with a 256-bit VL the performance
would simply double compared to 128-bit. At present, this is not the
case. It may not be worth using SVE until there are performance numbers
showing a significant improvement. I'll follow your advice and drop
this patch.
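The naive expectation can be written as simple arithmetic: an SM4 block is 128 bits, so a VL-bit vector holds VL/128 blocks, and if every lane were processed fully in parallel a 256-bit VL would halve the number of vector iterations. As Ard points out, the micro-architecture need not execute the crypto instructions across lanes in parallel, so this is an upper bound, not a prediction. A minimal sketch using the 128 to 2048-bit, 128-bit-granular VL range quoted in the patch description; the function names are hypothetical.

```python
SM4_BLOCK_BITS = 128


def blocks_per_vector(vl_bits: int) -> int:
    """SM4 blocks held by one SVE vector, using the VL range described in
    the patch text (128 to 2048 bits in 128-bit steps)."""
    if not 128 <= vl_bits <= 2048 or vl_bits % 128:
        raise ValueError(f"unsupported vector length: {vl_bits}")
    return vl_bits // SM4_BLOCK_BITS


def vector_iterations(nblocks: int, vl_bits: int) -> int:
    """Vector-wide steps needed to cover nblocks blocks (rounded up)."""
    per_vec = blocks_per_vector(vl_bits)
    return (nblocks + per_vec - 1) // per_vec


def ideal_speedup(vl_bits: int) -> float:
    """Best case relative to VL=128, assuming all lanes run fully in
    parallel. Real hardware need not do this, so treat it as an upper bound."""
    return blocks_per_vector(vl_bits) / blocks_per_vector(128)


for vl in (128, 256, 512):
    print(vl, blocks_per_vector(vl), vector_iterations(1024, vl), ideal_speedup(vl))
```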

Best regards,
Tianjia