Date:	Mon, 26 Oct 2015 16:22:27 +0100
From:	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:	Ingo Molnar <mingo@...nel.org>, Davidlohr Bueso <dave@...olabs.net>
Cc:	Rasmus Villemoes <linux@...musvillemoes.dk>,
	Thomas Gleixner <tglx@...utronix.de>,
	kbuild test robot <fengguang.wu@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] futex: eliminate cache miss from futex_hash()

On 09/12/2015 11:59 AM, Ingo Molnar wrote:
> 
> * Davidlohr Bueso <dave@...olabs.net> wrote:
> 
>> I think we should leave it as is.
> 
> But ... given that these are shared-cached values (cached on all CPUs), this 
> change would only be measurable in such a benchmark if the cache footprint of the 
> test is just about to overflow the size of the CPU cache and the one extra cache 
> line would cause cache thrashing. That is very unlikely.
> 
> So such a change seems to make sense unless you can argue that it's _bad_ to move 
> them closer to each other.

hash_futex(), ARM, gcc-5.2.1:
- three fewer instructions
- no register is pushed to / popped from the stack

--- futex_old.o_f.S
+++ futex_new.o_f.S
@@ -1,26 +1,23 @@
 00000000 <hash_futex>:
-push   {lr}            ; (str lr, [sp, #-4]!)
-movw   r3, #48887      ; 0xbef7
 ldr    r1, [r0, #8]
-movt   r3, #57005      ; 0xdead
+movw   r3, #48887      ; 0xbef7
 ldr    r2, [r0, #4]
-movw   ip, #0
+movt   r3, #57005      ; 0xdead
 add    r3, r1, r3
 ldr    r0, [r0]
 add    r2, r3, r2
-movt   ip, #0
+movw   ip, #0
 eor    r1, r3, r2
 add    r3, r3, r0
 sub    r1, r1, r2, ror #18
-ldr    ip, [ip]
+movt   ip, #0
 eor    r3, r3, r1
-movw   lr, #0
+ldr    r0, [ip, #4]
 sub    r3, r3, r1, ror #21
-sub    ip, ip, #1
+ldr    ip, [ip]
 eor    r2, r2, r3
-movt   lr, #0
+sub    r0, r0, #1
 sub    r2, r2, r3, ror #7
-ldr    r0, [lr]
 eor    r1, r1, r2
 sub    r1, r1, r2, ror #16
 eor    r3, r3, r1
@@ -29,6 +26,6 @@
 sub    r3, r2, r3, ror #18
 eor    r1, r1, r3
 sub    r3, r1, r3, ror #8
-and    r3, r3, ip
-add    r0, r0, r3, lsl #6
-pop    {pc}            ; (ldr pc, [sp], #4)
+and    r0, r0, r3
+add    r0, ip, r0, lsl #6
+bx     lr

I guess that not executing three instructions is a good thing :)
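
For readers who do not want to trace the assembly: in the new code, both loads use the same base register (ldr ip, [ip] and ldr r0, [ip, #4]), which fits the table pointer and the table size sitting next to each other in memory. The following is only a userspace sketch of that layout and of the final mask-and-index step, not the kernel code. The names futex_data, struct bucket, futex_data_init(), bucket_index() and bucket_for() are hypothetical, and the 64-byte bucket size is an assumption taken from the "lsl #6" scaling in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hash bucket, padded to 64 bytes
 * (assumed from the "lsl #6" index scaling in the disassembly). */
struct bucket {
    char pad[64];
};

/* Hypothetical grouping: table pointer at offset 0, table size
 * directly behind it, so one base address reaches both fields. */
static struct {
    struct bucket *queues;
    unsigned long  hashsize;   /* must be a power of two */
} futex_data;

/* Allocate the table; returns 0 on success, -1 on allocation failure. */
static int futex_data_init(unsigned long hashsize)
{
    /* The mask trick below only works for power-of-two sizes. */
    assert(hashsize != 0 && (hashsize & (hashsize - 1)) == 0);

    futex_data.queues = calloc(hashsize, sizeof(struct bucket));
    if (!futex_data.queues)
        return -1;
    futex_data.hashsize = hashsize;
    return 0;
}

/* Reduce a 32-bit hash to a table index: hash & (hashsize - 1). */
static size_t bucket_index(uint32_t hash)
{
    return hash & (futex_data.hashsize - 1);
}

/* Return the bucket for a hash: base + index * sizeof(struct bucket). */
static struct bucket *bucket_for(uint32_t hash)
{
    return &futex_data.queues[bucket_index(hash)];
}
```

With a 256-entry table, a hash of 0x12345 masks down to index 0x45, and consecutive buckets are 64 bytes apart, which corresponds to the "and" followed by "add ..., lsl #6" at the end of the listing.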

> Thanks,
> 
> 	Ingo
> 

Sebastian