lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 28 Jan 2016 13:42:45 -0800
From:	Andi Kleen <andi@...stfloor.org>
To:	akpm@...ux-foundation.org
Cc:	linux-kernel@...r.kernel.org, davidlohr.bueso@...com,
	rafael.j.wysocki@...el.com, lenb@...nel.org,
	Andi Kleen <ak@...ux.intel.com>
Subject: [PATCH] Optimize int_sqrt for small values for faster idle

From: Andi Kleen <ak@...ux.intel.com>

The menu cpuidle governor does at least two int_sqrt() each time
we go into idle in get_typical_interval to compute stddev

int_sqrts take 100-120 cycles each. Short idle latency is important
for many workloads.

I instrumented the function on my workstation and most values are
16bit only and most others 32bit (50% percentile is 122094,
75% is 3699533).

sqrt is implemented by starting with an initial estimation,
and then iterating. int_sqrt currently only uses a fixed
estimating which is good for 64bits worth of input.

This patch adds some checks at the beginning to start with
a better estimate for values fitting in 8, 16bit and 32bit.
This makes int_sqrt between 60+% faster for values in 16bit,
and still somewhat faster (between 10 and 30%) for larger values
upto 32bit. Full 64bit is slightly slower.

This optimizes the short idle calls and does not hurt the
long sleep (which probably do not care) much.

An alternative would be a full table drive approach, or
trying some inverted sqrt optimization, but this simple change
already seems to have a good payoff.

Signed-off-by: Andi Kleen <ak@...ux.intel.com>
---
 lib/int_sqrt.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/int_sqrt.c b/lib/int_sqrt.c
index 1ef4cc3..2479ccf 100644
--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -21,7 +21,15 @@ unsigned long int_sqrt(unsigned long x)
 	if (x <= 1)
 		return x;
 
-	m = 1UL << (BITS_PER_LONG - 2);
+	if (x <= 0xffff) {
+		if (m <= 0xff)
+			m = 1UL << (8 - 2);
+		else
+			m = 1UL << (16 - 2);
+	} else if (x <= 0xffffffff)
+		m = 1UL << (32 - 2);
+	else
+		m = 1UL << (BITS_PER_LONG - 2);
 	while (m != 0) {
 		b = y + m;
 		y >>= 1;
-- 
2.4.3

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ