Message-ID: <20110308192341.GA8356@darkside.kls.lan>
Date: Tue, 8 Mar 2011 20:23:42 +0100
From: "Mario 'BitKoenig' Holbe" <Mario.Holbe@...Ilmenau.DE>
To: Milan Broz <mbroz@...hat.com>
Cc: dm-crypt@...ut.de, linux-kernel@...r.kernel.org,
Andi Kleen <ak@...ux.intel.com>,
Alasdair G Kergon <agk@...hat.com>
Subject: Re: [dm-crypt] dm-crypt: Performance Regression 2.6.37 -> 2.6.38-rc8
On Tue, Mar 08, 2011 at 06:35:01PM +0100, Milan Broz wrote:
> On 03/08/2011 05:45 PM, Mario 'BitKoenig' Holbe wrote:
> > dm-crypt in 2.6.38 changed to per-CPU workqueues to increase its
> > performance by parallelizing encryption to multiple CPUs.
> > This modification seems to cause (massive) performance drops for
> > multiple parallel dm-crypt instances...
> Well, it depends. I never suggested this kind of workaround because
> you basically hardcoded (in device stacking) how many parallel instances
> (==cpu cores ideally) of dmcrypt can run effectively.
Yes. But it was the best I could get :)
> With current design the IO is encrypted by the cpu which submitted it.
...
> If you use one dmcrypt instance over RAID0, you will now probably get
> much better throughput. (Even with one process generating IOs
> the bios are, surprisingly, submitted on different cpus. But this time
> it runs really in parallel.)
Mh, not really. I just tested this with kernels freshly booted into
emergency mode, after udev had created the device nodes:
# cryptsetup -c aes-xts-plain -s 256 -h sha256 -d /dev/urandom create foo1 /dev/sdc
...
# cryptsetup -c aes-xts-plain -s 256 -h sha256 -d /dev/urandom create foo4 /dev/sdf
# mdadm -B -l raid0 -n 4 -c 256 /dev/md/foo /dev/mapper/foo[1-4]
# dd if=/dev/md/foo of=/dev/null bs=1M count=20k
2.6.37: 291MB/s 2.6.38: 139MB/s
# mdadm -B -l raid0 -n 4 -c 256 /dev/md/foo /dev/sd[c-f]
# cryptsetup -c aes-xts-plain -s 256 -h sha256 -d /dev/urandom create foo /dev/md/foo
# dd if=/dev/mapper/foo of=/dev/null bs=1M count=20k
2.6.37: 126MB/s 2.6.38: 138MB/s
So... performance drops on .37 (as expected) and nothing changes on .38
(contrary to expectations).
Those results, btw., differ dramatically when using tmpfs-backed
loop-devices instead of hard disks:
raid0 over crypted loops:
2.6.37: 285MB/s 2.6.38: 324MB/s
crypted raid0 over loops:
2.6.37: 119MB/s 2.6.38: 225MB/s
Here the results do indeed change - even if not in the way one
would expect.
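In case someone wants to reproduce: the tmpfs-backed loops can be set
up along these lines (mount point, file sizes and loop numbers are
just examples, not the exact values I used):

# mount -t tmpfs -o size=5G tmpfs /mnt/tmp
# for i in 1 2 3 4; do
>   dd if=/dev/zero of=/mnt/tmp/disk$i bs=1M count=1024
>   losetup /dev/loop$i /mnt/tmp/disk$i
> done

Then build the cryptsetup/mdadm stacks on /dev/loop[1-4] exactly as in
the /dev/sd[c-f] examples above.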
All these constructs are read-only and can hence be tested on any
available block device. Setting the devices read-only beforehand would
probably be a good idea to guard against mistakes made while short on
sleep or whatever.
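E.g. (assuming the disks from the first example; --setro affects
subsequent opens only, --getro reports the flag):

# for d in /dev/sd[c-f]; do blockdev --setro $d; done
# blockdev --getro /dev/sdc
1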
> Maybe we can find some compromise but I basically prefer current design,
> which provides much better behaviour for most configurations.
Hmmm...
regards
Mario
--
File names are infinite in length where infinity is set to 255 characters.
-- Peter Collinson, "The Unix File System"