Message-Id: <63d913f28bc64bd4ea66a39a532f0b59ee015382.1498039056.git.pabeni@redhat.com>
Date:   Wed, 21 Jun 2017 13:09:51 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     x86@...nel.org
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Kees Cook <keescook@...omium.org>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        linux-kernel@...r.kernel.org
Subject: [PATCH] x86/uaccess: use unrolled string copy for short strings

The 'rep' prefix incurs a significant setup cost; as a result, for short
strings, copies using unrolled loops are faster than even the optimized
'rep'-based string copy variants.
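
For illustration only (not part of this patch): a simplified userspace
sketch of the two strategies, assuming x86-64. The kernel's actual
implementations are the assembly routines in arch/x86/lib/copy_user_64.S,
which additionally handle faults via exception tables.

	#include <stddef.h>
	#include <stdint.h>

	/* 'rep movsb' copy: a single instruction, but with a fixed setup
	 * cost that dominates the total time for short lengths.
	 */
	static void copy_rep_movsb(void *dst, const void *src, size_t len)
	{
		asm volatile("rep movsb"
			     : "+D" (dst), "+S" (src), "+c" (len)
			     :
			     : "memory");
	}

	/* Unrolled copy: plain loads/stores, no setup cost, so it wins for
	 * short lengths even though it executes more instructions.
	 * (Sketch only; ignores alignment concerns.)
	 */
	static void copy_unrolled(void *dst, const void *src, size_t len)
	{
		uint64_t *d = dst;
		const uint64_t *s = src;

		while (len >= 32) {		/* 4 quadwords per iteration */
			d[0] = s[0];
			d[1] = s[1];
			d[2] = s[2];
			d[3] = s[3];
			d += 4;
			s += 4;
			len -= 32;
		}
		while (len >= 8) {
			*d++ = *s++;
			len -= 8;
		}
		if (len) {
			uint8_t *db = (uint8_t *)d;
			const uint8_t *sb = (const uint8_t *)s;

			while (len--)
				*db++ = *sb++;
		}
	}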

This change updates copy_user_generic() to use the unrolled version for
small lengths. The threshold length for a short string - 64 bytes - has
been selected empirically as the largest value that still ensures a
measurable gain.

A micro-benchmark of __copy_from_user() with different lengths shows
the following:

string len	vanilla		patched		delta
bytes		ticks		ticks		ticks (%)

0		58		26		32(55%)
1		49		29		20(40%)
2		49		31		18(36%)
3		49		32		17(34%)
4		50		34		16(32%)
5		49		35		14(28%)
6		49		36		13(26%)
7		49		38		11(22%)
8		50		31		19(38%)
9		51		33		18(35%)
10		52		36		16(30%)
11		52		37		15(28%)
12		52		38		14(26%)
13		52		40		12(23%)
14		52		41		11(21%)
15		52		42		10(19%)
16		51		34		17(33%)
17		51		35		16(31%)
18		52		37		15(28%)
19		51		38		13(25%)
20		52		39		13(25%)
21		52		40		12(23%)
22		51		42		9(17%)
23		51		46		5(9%)
24		52		35		17(32%)
25		52		37		15(28%)
26		52		38		14(26%)
27		52		39		13(25%)
28		52		40		12(23%)
29		53		42		11(20%)
30		52		43		9(17%)
31		52		44		8(15%)
32		51		36		15(29%)
33		51		38		13(25%)
34		51		39		12(23%)
35		51		41		10(19%)
36		52		41		11(21%)
37		52		43		9(17%)
38		51		44		7(13%)
39		52		46		6(11%)
40		51		37		14(27%)
41		50		38		12(24%)
42		50		39		11(22%)
43		50		40		10(20%)
44		50		42		8(16%)
45		50		43		7(14%)
46		50		43		7(14%)
47		50		45		5(10%)
48		50		37		13(26%)
49		49		38		11(22%)
50		50		40		10(20%)
51		50		42		8(16%)
52		50		42		8(16%)
53		49		46		3(6%)
54		50		46		4(8%)
55		49		48		1(2%)
56		50		39		11(22%)
57		50		40		10(20%)
58		49		42		7(14%)
59		50		42		8(16%)
60		50		46		4(8%)
61		50		47		3(6%)
62		50		48		2(4%)
63		50		48		2(4%)
64		51		38		13(25%)

Above 64 bytes the gain fades away.

Very similar values are collected for __copy_to_user().
UDP receive performance under flood with small packets using recvfrom()
increases by ~5%.
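
For reference, the numbers above come from an in-kernel measurement of
__copy_from_user(); a rough userspace analogue of the per-length timing
method (illustrative only, with memcpy() as a stand-in for the copy
routine and an arbitrarily chosen iteration count) could look like:

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>
	#include <x86intrin.h>	/* __rdtsc(), __rdtscp() */

	#define ITERS 1000000

	/* Time one copy length in TSC ticks, averaged over many iterations. */
	static uint64_t time_copy(void *dst, const void *src, size_t len)
	{
		unsigned int aux;
		uint64_t start, end;
		int i;

		start = __rdtsc();
		for (i = 0; i < ITERS; i++) {
			memcpy(dst, src, len);	/* stand-in for __copy_from_user() */
			/* compiler barrier so the copy is not optimized away */
			asm volatile("" : : "r" (dst) : "memory");
		}
		end = __rdtscp(&aux);	/* waits for prior instructions to complete */

		return (end - start) / ITERS;
	}

	int main(void)
	{
		static char src[64], dst[64];
		size_t len;

		for (len = 0; len <= 64; len++)
			printf("%zu\t%llu\n", len,
			       (unsigned long long)time_copy(dst, src, len));
		return 0;
	}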

Signed-off-by: Paolo Abeni <pabeni@...hat.com>
---
 arch/x86/include/asm/uaccess_64.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c5504b9..16a8871 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
 {
 	unsigned ret;
 
+	if (len <= 64)
+		return copy_user_generic_unrolled(to, from, len);
+
 	/*
 	 * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
 	 * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
-- 
2.9.4
