linux-kernel - Re: [PATCH 0/5] crypto: add IV generation templates

Message-ID: <CAKv+Gu-hvEUyOOh_xpYsSj2WTcbM1FwRnJ1DkTGH1UjxkppNMQ@mail.gmail.com>
Date:   Thu, 19 Jul 2018 23:08:52 +0900
From:   Ard Biesheuvel <ard.biesheuvel@...aro.org>
To:     Xiongfeng Wang <wangxiongfeng2@...wei.com>
Cc:     Arnd Bergmann <arnd@...db.de>, Alasdair Kergon <agk@...hat.com>,
        Mike Snitzer <snitzer@...hat.com>,
        Herbert Xu <herbert@...dor.apana.org.au>, dm-devel@...hat.com,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Mark Brown <broonie@...nel.org>,
        Jonathan Cameron <jonathan.cameron@...wei.com>
Subject: Re: [PATCH 0/5] crypto: add IV generation templates

On 19 July 2018 at 19:55, Xiongfeng Wang <wangxiongfeng2@...wei.com> wrote:
> Hi,
>
> On 2018/7/18 23:34, Ard Biesheuvel wrote:
>> On 18 July 2018 at 19:59, Arnd Bergmann <arnd@...db.de> wrote:
>>> On Wed, Jul 18, 2018 at 9:30 AM, Xiongfeng Wang
>>> <wangxiongfeng2@...wei.com> wrote:
>>>>
>>>> I tested the performance of software-implemented ciphers before and after
>>>> applying this patchset. The performance didn't change much except for a
>>>> slight regression when writing. The details are as follows.
>>>>
>>>> The commands I used:
>>>> cryptsetup -y -c aes-xts-plain -s 256 --hash sha256 luksFormat /dev/sdd1
>>>> cryptsetup -y -c aes-cbc-essiv:sha256 -s 256 --hash sha256 luksFormat /dev/sdd1
>>>> cryptsetup -y -c aes-cbc-benbi -s 256 --hash sha256 luksFormat /dev/sdd1
>>>>
>>>> cryptsetup luksOpen /dev/sdd1 crypt_fun
>>>> time dd if=/dev/mapper/crypt_fun of=/dev/null bs=1M count=500 iflag=direct
>>>> time dd if=/dev/zero of=/dev/mapper/crypt_fun bs=1M count=500 oflag=direct
>>>>
>>>> Performance comparison:
>>>> --------------------------------------------------------
>>>> algorithms      | before applying   |   after applying
>>>> --------------------------------------------------------
>>>>                 |  read  | write    |  read  | write
>>>> --------------------------------------------------------
>>>> aes-xts-plain   | 145.34 | 145.09   | 145.89 | 144.2
>>>> --------------------------------------------------------
>>>> aes-cbc-essiv   | 146.87 | 144.62   | 146.74 | 143.41
>>>> --------------------------------------------------------
>>>> aes-cbc-benbi   | 146.03 | 144.74   | 146.77 | 144.46
>>>> --------------------------------------------------------
>>>
>>> Do you have any estimate of the expected gains for hardware
>>> implementations?
>>>
>>> Would it make sense to try out implementing aes-cbc-essiv
>>> on the ARMv8 crypto extensions? I see that Ard has done
>>> some prior work on aes-ccm in arch/arm64/crypto/aes-ce-ccm-*
>>> that (AFAICT) has a similar goal of avoiding overhead by
>>> combining the usual operations, so maybe the same can
>>> be done here.
>>>
>>
>> I am having trouble understanding what exactly this series aims to achieve.
>>
>> Calling into the crypto layer fewer times is a nice goal, but a disk
>> sector seems like a reasonable granularity for the dm layer to operate
>> on, and I don't think any hardware exists that operates on
>> multi-sector sequences, where it would pay off to amortize the latency of
>> invoking the hardware over an entire bio.
>
> I don't know much about crypto hardware, but I think crypto hardware can handle
> more than one sector of data at a time. So I think passing the whole bio to the
> hardware at once will reduce the overhead of passing each sector individually.
>

But this will only be the case if the accelerator is capable of doing
the IV generation and en/decryption of multiple contiguous sectors in
a single call. Otherwise, you are just shifting work from one layer to
the next.
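
To make the distinction concrete, here is a rough sketch (not kernel
code) that counts how often a device gets invoked for one bio. The
names ToyAccelerator, crypt_one, crypt_many and encrypt_bio are made up
for illustration, and the "cipher" is a trivial XOR stand-in rather
than AES. The IV layout follows the plain64 convention (64-bit
little-endian sector number, zero padded), chosen only to keep the
example short.

```python
import struct

SECTOR_SIZE = 512  # bytes per sector, as assumed by this sketch
IV_SIZE = 16       # bytes, matching the AES block size


def plain64_iv(sector: int) -> bytes:
    """plain64-style IV: little-endian 64-bit sector number, zero padded."""
    return struct.pack("<Q", sector) + bytes(IV_SIZE - 8)


def toy_cipher(data: bytes, iv: bytes) -> bytes:
    """Stand-in for a real cipher: XOR with the repeated IV. Not secure."""
    return bytes(b ^ iv[i % IV_SIZE] for i, b in enumerate(data))


class ToyAccelerator:
    """Hypothetical device model that counts how often it is invoked."""

    def __init__(self, generates_ivs: bool):
        self.generates_ivs = generates_ivs
        self.invocations = 0

    def crypt_one(self, sector_data: bytes, iv: bytes) -> bytes:
        # One invocation per sector; the caller supplies the IV.
        self.invocations += 1
        return toy_cipher(sector_data, iv)

    def crypt_many(self, data: bytes, first_sector: int) -> bytes:
        # One invocation for the whole buffer; the device derives each IV.
        if not self.generates_ivs:
            raise NotImplementedError("device cannot generate IVs")
        self.invocations += 1
        return b"".join(
            toy_cipher(data[off:off + SECTOR_SIZE],
                       plain64_iv(first_sector + off // SECTOR_SIZE))
            for off in range(0, len(data), SECTOR_SIZE)
        )


def encrypt_bio(dev: ToyAccelerator, bio: bytes, first_sector: int) -> bytes:
    """Accept a whole bio; split it in software if the device cannot."""
    if dev.generates_ivs:
        return dev.crypt_many(bio, first_sector)
    # The per-sector loop has only moved here from the caller.
    return b"".join(
        dev.crypt_one(bio[off:off + SECTOR_SIZE],
                      plain64_iv(first_sector + off // SECTOR_SIZE))
        for off in range(0, len(bio), SECTOR_SIZE)
    )
```

In this model both devices produce the same output for a bio, but the
one without IV generation is still invoked once per sector even though
the caller passed the whole bio in a single call.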

So at this point, it would be useful to clarify what exactly these
accelerators are doing and how.