Date:	Thu, 14 Sep 2006 05:25:37 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	David Howells <dhowells@...hat.com>
CC:	Arjan van de Ven <arjan@...radead.org>,
	Dong Feng <middle.fengdong@...il.com>, ak@...e.de,
	Paul Mackerras <paulus@...ba.org>,
	Christoph Lameter <clameter@....com>,
	linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org
Subject: Re: Why Semaphore Hardware-Dependent?

David Howells wrote:
> Nick Piggin <nickpiggin@...oo.com.au> wrote:
> 
> 
>>+	while ((tmp = atomic_read(&sem->count)) >= 0) {
>>+		if (tmp == atomic_cmpxchg(&sem->count, tmp,
>>+				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
> 
> 
> NAK for FRV.  Do not use atomic_cmpxchg() there as it isn't strictly atomic
> (FRV only has one strictly atomic operation: SWAP).  Please leave FRV as using
> the spinlock version which is more efficient on UP.

From what I can read of the asm and the documentation, it is atomic. If
it were not atomic, it would be badly broken.
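
Fleshed out, the fast path in that fragment would look something like the
sketch below (only the loop itself is from the patch; rwsem_down_read_failed()
here just stands in for the contended slow path):

static inline void __down_read(struct rw_semaphore *sem)
{
	int tmp;

	/* fast path: while no writer is active or queued (count >= 0),
	 * try to account ourselves as a reader */
	while ((tmp = atomic_read(&sem->count)) >= 0) {
		if (tmp == atomic_cmpxchg(&sem->count, tmp,
				tmp + RWSEM_ACTIVE_READ_BIAS))
			return;		/* got the lock for read */
	}

	/* a writer is active or queued: fall back to the sleeping slow path */
	rwsem_down_read_failed(sem);
}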

BTW, if atomic_cmpxchg is slower than a local_irq_disable+local_irq_enable
pair on your architecture, then you have probably not implemented cmpxchg
well, because you may as well just implement it with local_irq_disable.
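
Roughly, I mean the usual UP irq-disable implementation, something like this
(a sketch only, not FRV's actual code):

static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
{
	unsigned long flags;
	int ret;

	/* make the read-compare-write appear atomic on UP by keeping
	 * interrupts off across it */
	local_irq_save(flags);
	ret = v->counter;
	if (likely(ret == old))
		v->counter = new;
	local_irq_restore(flags);

	return ret;
}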

> Please also show benchmarks to show the performance difference between your
> version and the old version before Ingo messed it up and made everything
> unconditionally out of line without cleaning the inline asm up.

I'm not so interested in counting cycles as in consolidating duplicated
code and reducing complexity and icache footprint. I'll leave you to
benchmark Ingo's changes if you're concerned about them.

Of course I will benchmark the end results when I finish the patch, though.

> If you are going to generalise, you should get rid of everything barring the
> spinlock-based version and stick with that.  It will cost you performance
> under some circumstances, but it's better under others than attempting to use
> atomic_cmpxchg() which may not really exist on all archs.

atomic_cmpxchg exists on all architectures.

I'm happy to go with the spinlock version (it is even simpler), but I will
have to benchmark that. I have seen small slowdowns with it in high-contention
situations, but that was improved by my rwsem scalability patch. If
performance is no different between the two, then the spinlock version is a
no-brainer.
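
For comparison, the reader fast path in the spinlock version is just something
like this (loosely after lib/rwsem-spinlock.c, with the waiter queueing
omitted):

static inline void __down_read(struct rw_semaphore *sem)
{
	spin_lock_irq(&sem->wait_lock);

	if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
		/* granted: no active writer and nobody queued ahead of us */
		sem->activity++;
		spin_unlock_irq(&sem->wait_lock);
		return;
	}

	/* contended: the real code queues us on sem->wait_list before
	 * dropping the lock and sleeping; omitted from this sketch */
	spin_unlock_irq(&sem->wait_lock);
}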

> You've also caused another problem: the spinlock based version permits up to
> 2^31 - 1 readers at one time, the inline optimised version, on a 32-bit arch,
> will only permit up to 2^16 - 1 at most.  By doing this to x86_64, you've
> reduced the number of processes it can support.

Yep, Christoph pointed this out. I'll fix it.
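
For reference, the 2^16 - 1 limit comes from the split count layout of the
optimised version on 32-bit, roughly as in the i386 header, where the active
count only gets the low 16 bits; the spinlock version keeps the reader count
in a full 32-bit field:

#define RWSEM_ACTIVE_BIAS	0x00000001
#define RWSEM_ACTIVE_MASK	0x0000ffff	/* active count: at most 2^16 - 1 */
#define RWSEM_WAITING_BIAS	(-0x00010000)
#define RWSEM_ACTIVE_READ_BIAS	RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)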

-- 
SUSE Labs, Novell Inc.
