linux-kernel - Re: [dm-devel] [RFC PATCH 0/1] dm-crypt excessive overhead

Open Source and information security mailing list archives

Message-ID: <CY4PR04MB3751EB316BFD5600AFAA6796E7950@CY4PR04MB3751.namprd04.prod.outlook.com>
Date:   Wed, 24 Jun 2020 04:54:18 +0000
From:   Damien Le Moal <Damien.LeMoal@....com>
To:     Mike Snitzer <snitzer@...hat.com>,
        Ignat Korchagin <ignat@...udflare.com>
CC:     "kernel-team@...udflare.com" <kernel-team@...udflare.com>,
        "dm-crypt@...ut.de" <dm-crypt@...ut.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "dm-devel@...hat.com" <dm-devel@...hat.com>,
        Mikulas Patocka <mpatocka@...hat.com>,
        "agk@...hat.com" <agk@...hat.com>
Subject: Re: [dm-devel] [RFC PATCH 0/1] dm-crypt excessive overhead

On 2020/06/24 0:23, Mike Snitzer wrote:
> On Tue, Jun 23 2020 at 11:07am -0400,
> Ignat Korchagin <ignat@...udflare.com> wrote:
> 
>> Do you think it may be better to break it in two flags: one for read
>> path and one for write? So, depending on the needs and workflow these
>> could be enabled independently?
> 
> If there is a need to split, then sure.  But I think Damien had a hard
> requirement that writes had to be inlined but that reads didn't _need_
> to be for his dm-zoned usecase.  Damien may not yet have assessed the
> performance implications, of not have reads inlined, as much as you
> have.

We did do performance testing :)
The results are mixed and performance differences between inline vs workqueues
depend on the workload (IO size, IO queue depth and number of drives being used
mostly). In many cases, inlining everything does really improve performance as
Ignat reported.

In our testing, we used hard drives and so focused mostly on throughput rather
than command latency. The added workqueue context switch overhead and crypto
work latency compared to typical HDD IO times is small, and significant only if
the backend storage as short IO times.

In the case of HDDs, especially for large IO sizes, inlining crypto work does
not shine as it prevents an efficient use of CPU resources. This is especially
true with reads on a large system with many drives connected to a single HBA:
the softirq context decryption work does not lend itself well to using other
CPUs that did not receive the HBA IRQ signaling command completions. The test
results clearly show much higher throughputs using dm-crypt as is.

On the other hand, inlining crypto work significantly improves workloads of
small random IOs, even for a large number of disks: removing the overhead of
context switches allows faster completions, allowing sending more requests to
the drives more quickly, keeping them busy.

For SMR, the inlining of write requests is *mandatory* to preserve the issuer
write sequence, but encryption work being done in the issuer context (writes to
SMR drives can only be O_DIRECT writes), efficient CPU resource usage can be
achieved by simply using multiple writer thread/processes, working on different
zones of different disks. This is a very reasonable model for SMR as writes into
a single zone have to be done under mutual exclusion to ensure sequentiality.

For reads, SMR drives are essentially exactly the same as regular disks, so
as-is or inline are both OK. Based on our performance results, allowing the user
to have the choice of inlining or not reads based on the target workload would
be great.

Of note is that zone append writes (emulated in SCSI, native with NVMe) are not
subject to the sequential write constraint, so they can also be executed either
inline or asynchronously.

> So let's see how Damien's work goes and if he trully doesn't need/want
> reads to be inlined then 2 flags can be created.

For SMR, I do not need inline reads, but I do want the user to have the
possibility of using this setup as that can provide better performance for some
workloads. I think that splitting the inline flag in 2 is exactly what we want:

1) For SMR, the write-inline flag can be automatically turned on when the target
device is created if the backend device used is a host-managed zoned drive (scsi
or NVMe ZNS). For reads, it would be the user choice, based on the target workload.
2) For regular block devices, write-inline only, read-inline only or both would
be the user choice, to optimize for their target workload.

With the split into 2 flags, my SMR support patch becomes very simple.

> 
> Mike
> 
> --
> dm-devel mailing list
> dm-devel@...hat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 
> 

-- 
Damien Le Moal
Western Digital Research

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives