lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 30 Sep 2021 16:57:09 +0200
From:   Ard Biesheuvel <ardb@...nel.org>
To:     Xiaokang Qian <Xiaokang.Qian@....com>
Cc:     Eric Biggers <ebiggers@...nel.org>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        "David S. Miller" <davem@...emloft.net>,
        Catalin Marinas <Catalin.Marinas@....com>,
        Will Deacon <will@...nel.org>, nd <nd@....com>,
        Linux Crypto Mailing List <linux-crypto@...r.kernel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way interleave
 of aes and ghash

On Thu, 30 Sept 2021 at 03:32, Xiaokang Qian <Xiaokang.Qian@....com> wrote:
>
> Thanks for the review.
>
> I will firstly change the decrypt path to compare the tag using SIMD code, and then  pass all of the self tests include fuzz tests(enabled by CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y), big endian ,little endian tests.
>

OK

> About the 1K data point, I just remember that the 1420 bytes packet is commonly used in IPSEC.
>

Yes, but your code is faster than the existing code for 1420 byte
packets, right? So why should we keep the original code? We don't use
GCM for block storage, and if IPsec throughput is a key performance
metric for your system, you are likely to be using the maximum packet
size so 1420 bytes not 1k.


>
> -----Original Message-----
> From: Ard Biesheuvel <ardb@...nel.org>
> Sent: Wednesday, September 29, 2021 5:04 AM
> To: Eric Biggers <ebiggers@...nel.org>
> Cc: Xiaokang Qian <Xiaokang.Qian@....com>; Herbert Xu <herbert@...dor.apana.org.au>; David S. Miller <davem@...emloft.net>; Catalin Marinas <Catalin.Marinas@....com>; Will Deacon <will@...nel.org>; nd <nd@....com>; Linux Crypto Mailing List <linux-crypto@...r.kernel.org>; Linux ARM <linux-arm-kernel@...ts.infradead.org>; Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
> Subject: Re: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way interleave of aes and ghash
>
> On Tue, 28 Sept 2021 at 08:27, Eric Biggers <ebiggers@...nel.org> wrote:
> >
> > On Thu, Sep 23, 2021 at 06:30:25AM +0000, XiaokangQian wrote:
> > > To improve performance on cores with deep piplines such as A72,N1,
> > > implement gcm(aes) using a 4-way interleave of aes and ghash
> > > (totally
> > > 8 blocks in parallel), which can make full utilize of pipelines
> > > rather than the 4-way interleave we used currently. It can gain
> > > about 20% for big data sizes such that 8k.
> > >
> > > This is a complete new version of the GCM part of the combined
> > > GCM/GHASH driver, it will co-exist with the old driver, only serve
> > > for big data sizes. Instead of interleaving four invocations of AES
> > > where each chunk of 64 bytes is encrypted first and then ghashed,
> > > the new version uses a more coarse grained approach where a chunk of
> > > 64 bytes is encrypted and at the same time, one chunk of 64 bytes is
> > > ghashed (or ghashed and decrypted in the converse case).
> > >
> > > The table below compares the performance of the old driver and the
> > > new one on various micro-architectures and running in various modes
> > > with various data sizes.
> > >
> > >             |     AES-128       |     AES-192       |     AES-256       |
> > >      #bytes | 1024 | 1420 |  8k | 1024 | 1420 |  8k | 1024 | 1420 |  8k |
> > >      -------+------+------+-----+------+------+-----+------+------+-----+
> > >         A72 | 5.5% |  12% | 25% | 2.2% |  9.5%|  23%| -1%  |  6.7%| 19% |
> > >         A57 |-0.5% |  9.3%| 32% | -3%  |  6.3%|  26%| -6%  |  3.3%| 21% |
> > >         N1  | 0.4% |  7.6%|24.5%| -2%  |  5%  |  22%| -4%  |  2.7%|
> > > 20% |
> > >
> > > Signed-off-by: XiaokangQian <xiaokang.qian@....com>
> >
> > Does this pass the self-tests, including the fuzz tests which are
> > enabled by CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y?
> >
>
> Please test both little-endian and big-endian. (Note that you don't need a big-endian user space for this - the self tests are executed before the rootfs is mounted)
>
> Also, you will have to rebase this onto the latest cryptodev tree, which carries some changes I made recently to this driver.
>
> Finally, I'd like to discuss whether we really need two separate drivers here. The 1k data point is not as relevant as the other ones, which show a worthwhile speedup for all micro architectures and data sizes (although I will give this a spin on TX2 myself when I have the
> chance)
>
> *If* we switch to this implementation completely, I would like to keep the improvement I added recently to the decrypt path to compare the tag using SIMD code, rather than copying it out and using memcmp().
> Could you look into adopting this for this version as well?
>
> --
> Ard.

Powered by blists - more mailing lists