Message-ID: <alpine.LFD.2.00.1001121708100.17145@localhost.localdomain>
Date:	Tue, 12 Jan 2010 17:24:45 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"H. Peter Anvin" <hpa@...or.com>
cc:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: x86: avoid read-cycle on down_read_trylock


We don't want to start the lock sequence with a plain read, since that 
causes the cacheline to first be brought in as a shared line, only to 
have to be turned into an exclusive one immediately afterwards.

So in order to avoid unnecessary bus traffic, just start off assuming
that the lock is unlocked, which is the common case anyway.  That way,
the first access to the lock will be the actual locked cycle.

This speeds up the lock ping-pong case, since it now has fewer bus cycles.
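
(Purely to illustrate the access pattern - this is not the rwsem code,
just a user-space toy where gcc's __sync builtins stand in for the
locked cmpxchg, and the constants are made up:)

	#define READ_BIAS	1
	#define UNLOCKED_VALUE	0

	/* Read-first: the plain load brings the cacheline in shared,
	 * and the cmpxchg then has to turn it exclusive - two
	 * coherence transactions even when the lock is free. */
	static int trylock_read_first(long *count)
	{
		long old = *count;
		long new = old + READ_BIAS;

		if (new <= 0)
			return 0;
		return __sync_bool_compare_and_swap(count, old, new);
	}

	/* Assume-unlocked: no plain read at all, so the very first
	 * access to the lock word is the locked cmpxchg, which
	 * fetches the line exclusive in the common (free) case. */
	static int trylock_assume_unlocked(long *count)
	{
		long old = UNLOCKED_VALUE;

		for (;;) {
			long new = old + READ_BIAS;
			long seen;

			if (new <= 0)
				return 0;	/* writer active */
			seen = __sync_val_compare_and_swap(count, old, new);
			if (seen == old)
				return 1;
			old = seen;		/* lost a race: retry with the real value */
		}
	}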

The reason down_read_trylock() is so important is that the main rwsem 
usage is mmap_sem, and the page fault case - which is by far the most 
common case - takes it with a "down_read_trylock()". That, in turn, is 
because if it is already locked we want to do the exception table lookup 
(so that we get a nice oops rather than a deadlock if we happen to take 
a page fault while holding the mmap lock for writing).

So why "trylock" is normally not a very common operation, for rwsems it 
ends up being the _normal_ way to get the lock.
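
In the x86 fault handler that pattern looks roughly like this (simplified
from memory, not a verbatim quote of do_page_fault()):

	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
		/* Kernel fault with mmap_sem possibly held for writing:
		 * if there is no fixup for this instruction, oops now
		 * instead of deadlocking in down_read() below. */
		if ((error_code & PF_USER) == 0 &&
		    !search_exception_tables(regs->ip)) {
			bad_area_nosemaphore(regs, error_code, address);
			return;
		}
		down_read(&mm->mmap_sem);
	}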

Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
---

This is on top of Peter's cleanup of my asm-cleanup patch.

On Hiroyuki-san's load, this trivial change improved his (admittedly 
_very_ artificial) page-fault benchmark by about 2%. The profile hit of 
down_read_trylock() went from 9.08% down to 7.73%. So the trylock itself 
seems to have improved by 15%+ from this.

All numbers above are meaningless, but the point is that the effect of 
this cacheline access pattern can be real.

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index 4136200..e9480be 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -123,7 +123,6 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 {
 	__s32 result, tmp;
 	asm volatile("# beginning __down_read_trylock\n\t"
-		     "  mov          %0,%1\n\t"
 		     "1:\n\t"
 		     "  mov          %1,%2\n\t"
 		     "  add          %3,%2\n\t"
@@ -133,7 +132,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 		     "2:\n\t"
 		     "# ending __down_read_trylock\n\t"
 		     : "+m" (sem->count), "=&a" (result), "=&r" (tmp)
-		     : "i" (RWSEM_ACTIVE_READ_BIAS)
+		     : "i" (RWSEM_ACTIVE_READ_BIAS), "1" (RWSEM_UNLOCKED_VALUE)
 		     : "memory", "cc");
 	return result >= 0 ? 1 : 0;
 }
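
For anybody who doesn't want to decode the inline asm: with the new "1"
constraint pre-loading %1 (eax) with RWSEM_UNLOCKED_VALUE, the resulting
__down_read_trylock is roughly equivalent to this C (schematic only, not
what actually gets compiled):

	__s32 result = RWSEM_UNLOCKED_VALUE;	/* assumed value - no load of sem->count */
	__s32 tmp, seen;

	for (;;) {
		tmp = result + RWSEM_ACTIVE_READ_BIAS;
		if (tmp <= 0)
			break;			/* writer active: fail */
		/* the locked cmpxchg is now the first access to sem->count */
		seen = cmpxchg(&sem->count, result, tmp);
		if (seen == result)
			break;			/* got the read lock */
		result = seen;			/* raced: retry with the observed count */
	}
	return result >= 0 ? 1 : 0;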
--