Message-ID: <20110308192341.GA8356@darkside.kls.lan>
Date: Tue, 8 Mar 2011 20:23:42 +0100
From: "Mario 'BitKoenig' Holbe" <Mario.Holbe@...Ilmenau.DE>
To: Milan Broz <mbroz@...hat.com>
Cc: dm-crypt@...ut.de, linux-kernel@...r.kernel.org,
Andi Kleen <ak@...ux.intel.com>,
Alasdair G Kergon <agk@...hat.com>
Subject: Re: [dm-crypt] dm-crypt: Performance Regression 2.6.37 -> 2.6.38-rc8
On Tue, Mar 08, 2011 at 06:35:01PM +0100, Milan Broz wrote:
> On 03/08/2011 05:45 PM, Mario 'BitKoenig' Holbe wrote:
> > dm-crypt in 2.6.38 changed to per-CPU workqueues to increase its
> > performance by parallelizing encryption to multiple CPUs.
> > This modification seems to cause (massive) performance drops for
> > multiple parallel dm-crypt instances...
> Well, it depends. I never suggested this kind of workaround because
> you basically hardcoded (in device stacking) how many parallel instances
> (==cpu cores ideally) of dmcrypt can run effectively.
Yes. But it was the best I could get :)
> With current design the IO is encrypted by the cpu which submitted it.
...
> If you use one dmcrypt instance over RAID0, you will now probably get
> much better throughput. (Even with one process generating IOs
> the bios are, surprisingly, submitted on different cpus. But this time
> it runs really in parallel.)
Mh, not really. I just tested this with kernels freshly booted into
emergency mode, after udev had created the device nodes:
# cryptsetup -c aes-xts-plain -s 256 -h sha256 -d /dev/urandom create foo1 /dev/sdc
...
# cryptsetup -c aes-xts-plain -s 256 -h sha256 -d /dev/urandom create foo4 /dev/sdf
# mdadm -B -l raid0 -n 4 -c 256 /dev/md/foo /dev/mapper/foo[1-4]
# dd if=/dev/md/foo of=/dev/null bs=1M count=20k
2.6.37: 291MB/s 2.6.38: 139MB/s
# mdadm -B -l raid0 -n 4 -c 256 /dev/md/foo /dev/sd[c-f]
# cryptsetup -c aes-xts-plain -s 256 -h sha256 -d /dev/urandom create foo /dev/md/foo
# dd if=/dev/mapper/foo of=/dev/null bs=1M count=20k
2.6.37: 126MB/s 2.6.38: 138MB/s
So... performance drops on .37 (as expected) and nothing changes on .38
(contrary to expectations).
Those results, btw., differ dramatically when using tmpfs-backed
loop-devices instead of hard disks:
raid0 over crypted loops:
2.6.37: 285MB/s 2.6.38: 324MB/s
crypted raid0 over loops:
2.6.37: 119MB/s 2.6.38: 225MB/s
Here the results do indeed change - even if not in the way one
would expect.
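In case someone wants to reproduce: the tmpfs-backed loops can be set
up along these lines (mount point, file sizes and loop numbers are
just examples, not the exact values I used):

# mount -t tmpfs -o size=5G tmpfs /mnt/tmp
# for i in 1 2 3 4; do
>   dd if=/dev/zero of=/mnt/tmp/disk$i bs=1M count=1024
>   losetup /dev/loop$i /mnt/tmp/disk$i
> done

Then build the cryptsetup/mdadm stacks on /dev/loop[1-4] exactly as in
the /dev/sd[c-f] examples above.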
All these constructs are read-only and can hence be tested on any
available block device. Setting the devices read-only beforehand would
probably be a good idea to guard against mistakes made while short on
sleep or whatever.
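E.g. (assuming the disks from the first example; --setro affects
subsequent opens only, --getro reports the flag):

# for d in /dev/sd[c-f]; do blockdev --setro $d; done
# blockdev --getro /dev/sdc
1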
> Maybe we can find some compromise but I basically prefer current design,
> which provides much better behaviour for most configurations.
Hmmm...
regards
Mario
--
File names are infinite in length where infinity is set to 255 characters.
-- Peter Collinson, "The Unix File System"