linux-kernel - Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Mon, 11 Dec 2006 15:30:22 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	linux@...izon.com
CC:	torvalds@...l.org, linux-arch@...r.kernel.org,
	linux-arm-kernel@...ts.arm.linux.org.uk,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an

Hi "Linux",

linux@...izon.com wrote:
>>Even if ARM is able to handle any arbitrary C code between the
>>"load locked" and store conditional API, other architectures can not
>>by definition.
> 
> 
> Maybe so, but I think you and Linus are missing the middle ground.

Nobdy argued against adding nice arch specific helpers to do higher
level operations (for example, atomic_add_unless I added was able to
reduce the use of cmpxchg in the kernel and can be optimally
implemented with ll/sc). I implemented it specifically because I
didn't want to use atomic_cmpxchg directly for lockless pagecache,
exactly because it is suboptimal on RISCs in that performance
critical path.

The point is that if somebody wants to implement some fancy lockless
code, atomic_cmpxchg is a good tool to use that does not require
writing the assembly for two dozen architectures. If it is performance
critical then it can absolutely be rewritten in an optimal manner.

> While I agree that LL/SC can't be part of the kernel API for people to
> get arbitrarily clever with in the device driver du jour, they are *very*
> nice abstractions for shrinking the arch-specific code size.
> 
> The semantics are widely enough shared that it's quite possible in
> practice to write a good set of atomic primitives in terms of LL/SC
> and then let most architectures define LL/SC and simply #include the
> generic atomic op implementations.
> 
> If there's a restriction that would pessimize the generic implementation,
> that function can be implemented specially for that arch.
> 
> Then implementing things like backoff on contention can involve writing
> a whole lot less duplicated code.
> 
> 
> Just like you can write a set of helpers for, say, CPUs with physically
> addressed caches, even though the "real" API has to be able to handle the
> virtually addressed ones, you can write a nice set of helpers for machines
> with sane LL/SC.

So, what would your ll/sc abstraction look like? Let's hear it.

The one I'm thinking of goes something like this:

   atomic_ll() / atomic_sc() with the restriction that they cannot be
   nested, you cannot write any C code between them, and may only call
   into some specific set of atomic_llsc_xxx primitives, operating on
   the address given to ll, and must not have more than a given number
   of instructions between them. Also, the atomic_sc won't always fail
   if there were interleaving stores.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/