lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed,  1 Aug 2018 18:51:40 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     linux-kernel@...r.kernel.org
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org, Eric Biggers <ebiggers@...gle.com>,
        Theodore Tso <tytso@....edu>,
        Sasha Levin <alexander.levin@...rosoft.com>
Subject: [PATCH 4.9 073/144] fscrypt: use unbound workqueue for decryption

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Biggers <ebiggers@...gle.com>

[ Upstream commit 36dd26e0c8d42699eeba87431246c07c28075bae ]

Improve fscrypt read performance by switching the decryption workqueue
from bound to unbound.  With the bound workqueue, when multiple bios
completed on the same CPU, they were decrypted on that same CPU.  But
with the unbound queue, they are now decrypted in parallel on any CPU.

Although fscrypt read performance can be tough to measure due to the
many sources of variation, this change is most beneficial when
decryption is slow, e.g. on CPUs without AES instructions.  For example,
I timed tarring up encrypted directories on f2fs.  On x86 with AES-NI
instructions disabled, the unbound workqueue improved performance by
about 25-35%, using 1 to NUM_CPUs jobs with 4 or 8 CPUs available.  But
with AES-NI enabled, performance was unchanged to within ~2%.

I also did the same test on a quad-core ARM CPU using xts-speck128-neon
encryption.  There performance was usually about 10% better with the
unbound workqueue, bringing it closer to the unencrypted speed.

The unbound workqueue may be worse in some cases due to worse locality,
but I think it's still the better default.  dm-crypt uses an unbound
workqueue by default too, so this change makes fscrypt match.

Signed-off-by: Eric Biggers <ebiggers@...gle.com>
Signed-off-by: Theodore Ts'o <tytso@....edu>
Signed-off-by: Sasha Levin <alexander.levin@...rosoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
---
 fs/crypto/crypto.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -517,8 +517,17 @@ EXPORT_SYMBOL(fscrypt_initialize);
  */
 static int __init fscrypt_init(void)
 {
+	/*
+	 * Use an unbound workqueue to allow bios to be decrypted in parallel
+	 * even when they happen to complete on the same CPU.  This sacrifices
+	 * locality, but it's worthwhile since decryption is CPU-intensive.
+	 *
+	 * Also use a high-priority workqueue to prioritize decryption work,
+	 * which blocks reads from completing, over regular application tasks.
+	 */
 	fscrypt_read_workqueue = alloc_workqueue("fscrypt_read_queue",
-							WQ_HIGHPRI, 0);
+						 WQ_UNBOUND | WQ_HIGHPRI,
+						 num_online_cpus());
 	if (!fscrypt_read_workqueue)
 		goto fail;
 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ