Message-ID: <Pine.LNX.4.64.0612081101280.3516@woody.osdl.org>
Date: Fri, 8 Dec 2006 11:15:58 -0800 (PST)
From: Linus Torvalds <torvalds@...l.org>
To: Christoph Lameter <clameter@....com>
cc: Russell King <rmk+lkml@....linux.org.uk>,
David Howells <dhowells@...hat.com>,
Nick Piggin <nickpiggin@...oo.com.au>, akpm@...l.org,
linux-arm-kernel@...ts.arm.linux.org.uk,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org
Subject: Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch
 doesn't support it

On Fri, 8 Dec 2006, Christoph Lameter wrote:
>
> As also shown in this thread: There are restrictions on what you can do
> between ll/sc

This, btw, is almost certainly true on ARM too.

There are three major reasons for restrictions on ll/sc:
 - bus-cycle induced things (eg variations of "you cannot do a store in
   between the ll and the sc, because it will touch the cache and clear
   the bit", where "the store" might be a load too, and "the cache" might
   be just "the bus interface")

 - trap handling usually clears the internal lock bit too, which means
   that depending on the micro-architecture, even internal microtraps
   (like even just branch misprediction, but more commonly things like TLB
   misses etc) can cause a sc to always fail.

 - timing. Livelock in particular.

The last one is the one that hits everybody, regardless of
microarchitecture. The rule may be that the LL/SC need to be within a
certain number of cycles (which can be very small - like ten) in order to
guarantee that the cacheline can't be stolen.

All of which means that _nobody_ can really do this reliably in C. Even if
there are no other microarchitectural rules (and it sounds like that might
be true on ARM), the timing issue means that you can _still_ only use it
for very specific and simple sequences, and trying to expose it as a
higher-level thing is not going to work in general for anything even
remotely complicated.

(The timing may also mean that you end up having to do random back-off
etc, just to make sure _somebody_ makes progress. Ie it might not be a
matter of "within ten cycles", but "you need to randomize the timing").

In other words, it's simply not an option to expose LL/SC as an interface.
It would be VERY convenient to do, since cmpxchg can emulate ll/sc (the
"ll" part is a normal load, the "sc" part is a "compare that the old value
still matches, and store the new one if so"). But because you can't expose
LL/SC anyway in any reasonably portable way, that just doesn't work.
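
To make that concrete, here is a minimal sketch of what the emulation
looks like at the point of use, written with the kernel's
atomic_read()/atomic_cmpxchg() (the helper name and the "add" operation
are made up purely for illustration):

#include <asm/atomic.h>

/*
 * Illustrative only: the "ll" is an ordinary load, the "sc" is a
 * cmpxchg that stores the new value only if *v still holds what we
 * loaded.  atomic_cmpxchg() returns the value that was in *v.
 */
static inline void atomic_add_via_cmpxchg(atomic_t *v, int delta)
{
	int old;

	do {
		old = atomic_read(v);				/* the "ll" */
	} while (atomic_cmpxchg(v, old, old + delta) != old);	/* the "sc" */
}
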
So, you really do end up with three possibilities:
 - do things with TRULY PORTABLE interfaces. And like it or not, cmpxchg
   is the closest thing you can get to that. It's trivial to do cmpxchg
   using ll/sc (modulo the "random backoff part" if you need it, which is
   still pretty simple, but no longer totally trivial - see the sketch
   after this list), and architectures that have neither ll/sc _nor_ a
   native cmpxchg can just go screw themselves with spinlocks - they
   really aren't worth worrying about in SMP. At some point you have to
   tell hardware designers that their hardware just sucks.

 - have ugly conditional code in generic code. I personally think this is
   a _much_ worse option in most cases.

 - have a much higher-level interface and make it _all_ architecture-
   dependent (possibly with a "generic" version for sane architectures).
   This works, but the more high-level it is, the more you end up having
   the same thing written in many different ways, and nasty maintenance.
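
And the other direction, for completeness: roughly what cmpxchg looks
like when built out of ll/sc. This is only a sketch (shown with
ARMv6-style ldrex/strex purely as an example - the pre-v6 ARM cores
being discussed here don't have these), not any architecture's actual
code; the random back-off, if you need it, slots into the retry path:

static inline unsigned long cmpxchg_via_llsc(volatile unsigned long *ptr,
					     unsigned long old,
					     unsigned long new)
{
	unsigned long oldval, res;

	do {
		__asm__ __volatile__(
		"	ldrex	%1, [%2]\n"	/* the "ll": load, set the reservation */
		"	mov	%0, #0\n"
		"	teq	%1, %3\n"	/* does it still match 'old'? */
		"	strexeq	%0, %4, [%2]\n"	/* the "sc": store only if it matched */
			: "=&r" (res), "=&r" (oldval)
			: "r" (ptr), "r" (old), "r" (new)
			: "memory", "cc");
		/* randomized back-off on repeated failure would go here */
	} while (res);

	return oldval;
}
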
So we generally set the bar pretty low. Things like semaphore locking
primitives are high-level enough already that we prefer to try to make
them use common lower-level interfaces (spinlocks, cmpxchg etc).
Something like kernel/workqueue.c is _way_ too high a level to do
arch-specific.
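
(As a made-up illustration of that layering: a trylock-style primitive
written once against the portable atomic interface, instead of once per
architecture. Not real kernel code, just the shape of it:)

struct toy_mutex {
	atomic_t owner;			/* 0 = unlocked, 1 = locked */
};

static inline int toy_mutex_trylock(struct toy_mutex *m)
{
	/* succeeds only if we atomically move 0 -> 1 */
	return atomic_cmpxchg(&m->owner, 0, 1) == 0;
}

static inline void toy_mutex_unlock(struct toy_mutex *m)
{
	smp_mb();			/* order the critical section before the release */
	atomic_set(&m->owner, 0);
}
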
So right now, I think the "cmpxchg" or the "bitmask set" approach are the
alternatives. Russell - LL/SC simply isn't on the table as an interface,
whether you like it or not.

		Linus