Message-ID: <CAKv+Gu-hvEUyOOh_xpYsSj2WTcbM1FwRnJ1DkTGH1UjxkppNMQ@mail.gmail.com>
Date:   Thu, 19 Jul 2018 23:08:52 +0900
From:   Ard Biesheuvel <ard.biesheuvel@...aro.org>
To:     Xiongfeng Wang <wangxiongfeng2@...wei.com>
Cc:     Arnd Bergmann <arnd@...db.de>, Alasdair Kergon <agk@...hat.com>,
        Mike Snitzer <snitzer@...hat.com>,
        Herbert Xu <herbert@...dor.apana.org.au>, dm-devel@...hat.com,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Mark Brown <broonie@...nel.org>,
        Jonathan Cameron <jonathan.cameron@...wei.com>
Subject: Re: [PATCH 0/5] crypto: add IV generation templates

On 19 July 2018 at 19:55, Xiongfeng Wang <wangxiongfeng2@...wei.com> wrote:
> Hi,
>
> On 2018/7/18 23:34, Ard Biesheuvel wrote:
>> On 18 July 2018 at 19:59, Arnd Bergmann <arnd@...db.de> wrote:
>>> On Wed, Jul 18, 2018 at 9:30 AM, Xiongfeng Wang
>>> <wangxiongfeng2@...wei.com> wrote:
>>>>
>>>> I tested the performance of software-implemented ciphers before and after
>>>> applying this patchset. The performance didn't change much, except for a
>>>> slight regression when writing. The detailed information is as follows.
>>>>
>>>> The commands I used:
>>>> cryptsetup -y -c aes-xts-plain -s 256 --hash sha256 luksFormat /dev/sdd1
>>>> cryptsetup -y -c aes-cbc-essiv:sha256 -s 256 --hash sha256 luksFormat /dev/sdd1
>>>> cryptsetup -y -c aes-cbc-benbi -s 256 --hash sha256 luksFormat /dev/sdd1
>>>>
>>>> cryptsetup luksOpen /dev/sdd1 crypt_fun
>>>> time dd if=/dev/mapper/crypt_fun of=/dev/null bs=1M count=500 iflag=direct
>>>> time dd if=/dev/zero of=/dev/mapper/crypt_fun bs=1M count=500 oflag=direct
>>>>
>>>> Performance comparison:
>>>> --------------------------------------------------------
>>>> algorithms      | before applying   |   after applying
>>>> --------------------------------------------------------
>>>>                 |  read  | write    |  read  | write
>>>> --------------------------------------------------------
>>>> aes-xts-plain   | 145.34 | 145.09   | 145.89 | 144.2
>>>> --------------------------------------------------------
>>>> aes-cbc-essiv   | 146.87 | 144.62   | 146.74 | 143.41
>>>> --------------------------------------------------------
>>>> aes-cbc-benbi   | 146.03 | 144.74   | 146.77 | 144.46
>>>> --------------------------------------------------------
>>>
>>> Do you have any estimate of the expected gains for hardware
>>> implementations?
>>>
>>> Would it make sense to try out implementing aes-cbc-essiv
>>> on the ARMv8 crypto extensions? I see that Ard has done
>>> some prior work on aes-ccm in arch/arm64/crypto/aes-ce-ccm-*
>>> that (AFAICT) has a similar goal of avoiding overhead by
>>> combining the usual operations, so maybe the same can
>>> be done here.
>>>
>>
>> I am having trouble understanding what exactly this series aims to achieve.
>>
>> Calling into the crypto layer fewer times is a nice goal, but a disk
>> sector seems like a reasonable granularity for the dm layer to operate
>> on, and I don't think any hardware exists that operates on multi
>> sector sequences, where it would pay off to amortize the latency of
>> invoking the hardware over an entire bio.
>
> I don't know much about crypto hardware, but I think crypto hardware can handle
> more than one sector of data at a time. So I think passing the whole bio to the
> hardware at once will reduce the overhead of passing each sector individually.
>

But this will only be the case if the accelerator is capable of doing
the IV generation and en/decryption of multiple contiguous sectors in
a single call. Otherwise, you are just shifting work from one layer to
the next.
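
To make the point concrete, here is a rough sketch in Python (not kernel
code, and not taken from the patch set). It only counts how often a backend
is invoked. CountingCipher, encrypt_bio_in_dm and encrypt_bio_in_template are
hypothetical names made up for illustration, and a plain64-style IV (the
little-endian 64-bit sector number, zero-padded) is assumed for simplicity.

```python
import struct

SECTOR_SIZE = 512  # bytes; assumed sector size for this sketch


def plain64_iv(sector: int) -> bytes:
    """plain64-style IV: little-endian 64-bit sector number, zero-padded to 16 bytes."""
    return struct.pack("<Q", sector) + bytes(8)


class CountingCipher:
    """Hypothetical stand-in for a cipher backend; it only counts invocations."""

    def __init__(self, handles_multi_sector: bool):
        # True models an accelerator that derives per-sector IVs itself.
        self.handles_multi_sector = handles_multi_sector
        self.calls = 0

    def encrypt(self, iv: bytes, data: bytes) -> bytes:
        self.calls += 1
        return data  # no real encryption; only the call count matters here


def encrypt_bio_in_dm(cipher: CountingCipher, start_sector: int, bio: bytes) -> bytes:
    """Existing model: the dm layer walks the bio sector by sector."""
    out = []
    for off in range(0, len(bio), SECTOR_SIZE):
        sector = start_sector + off // SECTOR_SIZE
        out.append(cipher.encrypt(plain64_iv(sector), bio[off:off + SECTOR_SIZE]))
    return b"".join(out)


def encrypt_bio_in_template(cipher: CountingCipher, start_sector: int, bio: bytes) -> bytes:
    """Template model: dm hands over the whole bio in one request."""
    if cipher.handles_multi_sector:
        # The accelerator generates the IVs and processes all sectors in one call.
        return cipher.encrypt(plain64_iv(start_sector), bio)
    # Otherwise the same per-sector loop runs, just one layer further down.
    return encrypt_bio_in_dm(cipher, start_sector, bio)
```

With an 8-sector bio, a backend created with handles_multi_sector=False is
still invoked 8 times through the template path, so the per-sector work has
only moved. The call count drops to 1 only when the backend can do the IV
generation for contiguous sectors itself.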

So at this point, it would be useful to clarify what exactly these
accelerators are doing and how.
