linux-kernel - Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 07 Dec 2006 20:31:08 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Russell King <rmk+lkml@....linux.org.uk>
CC:	Christoph Lameter <clameter@....com>,
	David Howells <dhowells@...hat.com>, torvalds@...l.org,
	akpm@...l.org, linux-arm-kernel@...ts.arm.linux.org.uk,
	linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org
Subject: Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch
 doesn't support it

Russell King wrote:
> On Wed, Dec 06, 2006 at 11:16:55AM -0800, Christoph Lameter wrote:

> No.  If you read what I said, you'll see that you can _cheaply_ use
> cmpxchg in a ll/sc based implementation.  Take an atomic increment
> operation.
> 
> 	do {
> 		old = load_locked(addr);
> 	} while (store_exclusive(old, old + 1, addr);
> 
> On a cmpxchg, that "store_exclusive" (loosely) becomes your cmpxchg
> instruction, comparing the first arg, and if equal storing the second.
> The "load_locked" macro becomes a standard pointer deref.  Ergo, x86
> becomes:
> 
> 	do {
> 		load value
> 		manipulate it
> 		conditional store
> 	} while not stored
> 
> On ll/sc, the load_locked() macro is the load locked instruction.  The
> store_exclusive() macro is the exclusive store and it doesn't need to
> use the first parameter at all.  Ergo, ARM becomes:
> 
> 	do {
> 		ldrex r1, [r2]
> 		manipulate r1
> 		strex r0, r1, [r2]
> 	} while failed
> 
> Notice that both are optimal.
> 
> Now let's consider the cmpxchg case.
> 
> 	do {
> 		val = *addr;
> 	} while (cmpxchg(val, val + 1, addr);
> 
> The x86 case is _identical_ to the ll/sc based implementation.  Absolutely
> entirely.  No impact what so ever.
> 
> Let's look at the ll/sc case.  The cmpxchg code implemented on this has
> to reload the original value, compare it, if equal store the new value.
> So:
> 
> 	do {
> 		val = *addr;
> 		(r2 = addr, 
> 		ldrex r1, [r2]
> 		compare r1, r0
> 		strexeq r4, r3, [r2] (store exclusive if equal)
> 	} while store failed or comparecondition failed
> 
> Note how the cmpxchg has _forced_ the ll/sc implementation to become
> more complex.
> 
> So, let's recap.
> 
> Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg
> architectures to produce optimal code.
> 
> Implementing an cmpxchg based accessor macro allows cmpxchg architectures
> to produce optimal code and ll/sc non-optimal code.
> 
> See my point?

Wrong. Your ll/sc implementation with cmpxchg is buggy. The cmpxchg
load_locked is not locked at all, and there can be interleaving writes
between the load and cmpxchg which do not cause the store_conditional
to fail.

It might be reasonable to implement this watered down version, but:
don't some architectures have restrictions on what instructions can
be issued between the ll and the sc?

But in general I agree with you, in that a higher level primitive is
preferable (eg. atomic_add_unless).

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/