Date:	Mon, 31 May 2010 19:22:21 +0200
From:	Milan Broz <mbroz@...hat.com>
To:	device-mapper development <dm-devel@...hat.com>
CC:	Andi Kleen <andi@...stfloor.org>, herbert@...dor.hengli.com.au,
	linux-kernel@...r.kernel.org, agk@...hat.com, ak@...ux.intel.com
Subject: Re: [dm-devel] [PATCH] DM-CRYPT: Scale to multiple CPUs

On 05/31/2010 06:04 PM, Andi Kleen wrote:
> DM-CRYPT: Scale to multiple CPUs
> 
> Currently dm-crypt does all encryption work per dmcrypt mapping in a
> single workqueue. This does not scale well when multiple CPUs
> are submitting IO at a high rate. The single CPU running the single
> thread cannot keep up with the encryption and encrypted IO performance
> tanks.

This is true only if encryption runs synchronously on the CPU.

(Typically this happens with a high-speed SSD, or with dm-crypt on top of
striped RAID, where the underlying device throughput exceeds the CPU's
encryption speed.)
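The condition above amounts to a simple bound: with synchronous encryption, throughput is capped by whichever is slower, the device or the combined CPU encryption rate. A minimal Python sketch of that bound follows; the function name and all figures are illustrative assumptions, not measurements, and the model ignores queueing, latency and asynchronous offload.

```python
def effective_throughput(device_mb_s: float,
                         per_thread_crypto_mb_s: float,
                         crypto_threads: int) -> float:
    """Upper bound on encrypted IO throughput (MB/s) when every byte is
    encrypted synchronously on the CPU: IO cannot go faster than either
    the underlying device or the combined encryption rate of all threads."""
    if crypto_threads < 1:
        raise ValueError("need at least one encryption thread")
    return min(device_mb_s, per_thread_crypto_mb_s * crypto_threads)


# Hypothetical figures: fast SSD / striped RAID vs. one core's cipher speed.
for threads in (1, 4, 8):
    print(threads, effective_throughput(1000, 200, threads))
```

With a hypothetical 1000 MB/s device and 200 MB/s per core, one thread delivers 200 MB/s and eight threads become device-bound at 1000 MB/s. On a 100 MB/s disk a single thread is already enough, so extra threads would buy nothing.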

I did a lot of experiments with a similar design and abandoned it.
(If we go this way, there should be some parameter limiting the
number of CPU threads used for encryption. I had this configurable
online through dm messages, plus an initial kernel module parameter.)
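One way such a limit might be resolved is sketched below in Python. The names and the precedence rule (a runtime value set through a dm message overriding the module-load default, with 0 meaning "all online CPUs") are assumptions for illustration, not the behaviour of the experimental code mentioned above.

```python
import os
from typing import Optional


def crypt_thread_count(online_cpus: int,
                       module_param: int = 0,
                       message_override: Optional[int] = None) -> int:
    """Hypothetical policy for the number of encryption threads.

    module_param     -- default given at module load time (0 = no limit)
    message_override -- value set later at runtime, e.g. via a dm message;
                        when present it takes precedence over module_param
    The result is clamped to at most online_cpus and at least 1.
    """
    limit = module_param if message_override is None else message_override
    if limit <= 0:
        return max(1, online_cpus)
    return max(1, min(limit, online_cpus))


print(crypt_thread_count(os.cpu_count() or 1, module_param=2))
```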

But I see two main problems:

1) How does this scale together with asynchronous crypto, which
runs in parallel in the crypto API layer (and has limited
resources)? (AES-NI, for example.)

2) Per-volume threads and mempools were added to solve low-memory
problems (exhausted mempools). Isn't a deadlock possible here again?

(For example: one CPU, many dm-crypt volumes. The thread waits to
allocate a page from an exhausted mempool, blocking another request
(for another volume) queued behind it, which would free some pages
once its crypt processing completes. This cannot happen with
per-volume threads. Or am I missing something here?)
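The deadlock in point 2 can be reproduced with a toy scheduler. This Python sketch is a deliberate simplification: it treats free pages as one shared counter, gives each worker a FIFO queue, and lets an allocating item block its worker until a page is available. With one shared worker, the request that would free a page sits behind the blocked allocation forever; with one worker per volume, it can run. All names (`run`, `per_volume`) are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Item = Tuple[str, str]  # (volume name, "alloc" or "free")


def run(queues: List[List[Item]], free_pages: int) -> str:
    """Round-robin over per-worker FIFO queues.

    "alloc" takes one page; if none is free the worker sleeps and its
    queue does not advance.  "free" returns one page.  Returns "done" if
    every queue drains, or "deadlock" if a whole round makes no progress
    while work is still queued."""
    work = [deque(q) for q in queues]
    while any(work):
        progressed = False
        for q in work:
            if not q:
                continue
            _volume, op = q[0]
            if op == "alloc":
                if free_pages == 0:
                    continue  # blocked: everything behind it waits too
                free_pages -= 1
            else:
                free_pages += 1
            q.popleft()
            progressed = True
        if not progressed:
            return "deadlock"
    return "done"


def per_volume(items: List[Item]) -> List[List[Item]]:
    """Split one stream of work into one queue per volume."""
    queues: Dict[str, List[Item]] = {}
    for item in items:
        queues.setdefault(item[0], []).append(item)
    return list(queues.values())


# Pool exhausted; volume B's completed request would release a page.
stream = [("A", "alloc"), ("B", "free")]
print("shared worker:     ", run([stream], free_pages=0))
print("per-volume workers:", run(per_volume(stream), free_pages=0))
```

This only illustrates the cross-volume dependency described above. It does not model mempool reserves, and it does not show that per-volume threads rule out every deadlock: a volume whose own freeing work is queued behind its own allocation would still block.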


Anyway, I still think the proper solution to this problem is to run
requests in parallel in the crypto API using the async crypto interface;
IOW, parallelize this in the crypto API layer, which knows best which
resources it can use for crypto work.
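The alternative proposed here, where dm-crypt only submits requests and receives completions while the crypto layer decides how many workers to run, can be sketched in Python with a thread pool and completion callbacks. This is a hypothetical stand-in, not the kernel crypto API: `ToyAsyncCrypto` is invented for illustration, and the XOR transform provides no security.

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class ToyAsyncCrypto:
    """Hypothetical crypto layer: it owns the worker pool (sized to the
    resources it knows about) and reports each result via a callback."""

    def __init__(self, workers: Optional[int] = None) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers or os.cpu_count() or 1)

    def submit(self, data: bytes, key: int,
               done: Callable[[bytes], None]) -> Future:
        fut = self._pool.submit(self._xor, data, key)
        # Runs in a worker thread when the job finishes (or immediately
        # if it has already finished by the time the callback is added).
        fut.add_done_callback(lambda f: done(f.result()))
        return fut

    @staticmethod
    def _xor(data: bytes, key: int) -> bytes:
        # Placeholder transform standing in for a real cipher: NOT secure.
        return bytes(b ^ key for b in data)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


crypto = ToyAsyncCrypto(workers=2)
completed = []
futures = [crypto.submit(bytes([i] * 4), 0x5A, completed.append) for i in range(8)]
crypto.shutdown()
print(len(completed), "requests completed")
```

The caller never chooses a thread count here; it only hands over work and reacts to completions, which is the division of responsibility argued for above.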

(Herbert, is something like per-CPU crypto threads planned
for this case?)

Milan