Message-ID: <alpine.LFD.0.98.0706201018290.3593@woody.linux-foundation.org>
Date:	Wed, 20 Jun 2007 10:34:15 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Jarek Poplawski <jarkao2@...pl>
cc:	Ingo Molnar <mingo@...e.hu>, Miklos Szeredi <miklos@...redi.hu>,
	cebbert@...hat.com, chris@...ee.ca, linux-kernel@...r.kernel.org,
	tglx@...utronix.de, akpm@...ux-foundation.org
Subject: Re: [BUG] long freezes on thinkpad t60



On Wed, 20 Jun 2007, Jarek Poplawski wrote:
> 
> I don't agree with this (+ I know it doesn't matter).
> 
> The real bug is what Chuck Ebbert wrote: "Spinlocks aren't fair".
> And here they are simply lawlessly not fair.

Well, that's certainly a valid standpoint. I wouldn't claim you're 
_wrong_.

At least in theory.

In *practice*, fair spinlocks are simply not possible. Not as in "it's 
hard to do", but as in "you simply cannot do it".

The thing is, most spinlocks get held for a rather short time (if that 
wasn't the case, you'd use something like a mutex), and it's actually a 
short time not on a "human scale", but on a "CPU core scale".

IOW, the cost of doing the cacheline bouncing is often equal to, or bigger 
than, the actual cost of the operation that happens inside the spinlock!

What does this result in? It automatically means that software simply 
*cannot* do fair spinlocks, because all the timing costs are in parts that 
software doesn't even have any visibility into, or control over!

Yeah, you could add artificial delays by doing things like cycle counting 
around the spinlock operation to see if you got the lock when it was 
_contended_, or whether you got one that wasn't, and then adding some 
statistics etc. But at that point you don't have a spinlock any more, you 
have something else. I don't know what to call it.
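
Just to make it concrete, that "something else" would look vaguely like 
this (illustrative user-space C - the names and the threshold are made 
up, this is not kernel code):

	#include <x86intrin.h>	/* __rdtsc(), GCC/Clang on x86 */

	/* made-up threshold: an acquisition that takes longer than
	   this probably bounced the cacheline, ie was contended */
	#define CONTENDED_CYCLES 200

	struct counted_lock {
		volatile int slock;		/* 0 = free, 1 = held */
		unsigned long contended;	/* the "statistics etc" */
	};

	static void counted_lock_acquire(struct counted_lock *l)
	{
		unsigned long long t0 = __rdtsc();

		/* plain test-and-set spin */
		while (__sync_lock_test_and_set(&l->slock, 1))
			while (l->slock)
				;	/* spin read-only while held */

		if (__rdtsc() - t0 > CONTENDED_CYCLES)
			l->contended++;	/* we (probably) had to wait */
	}

	static void counted_lock_release(struct counted_lock *l)
	{
		__sync_lock_release(&l->slock);
	}

And notice that you're now paying for two timestamp reads on every 
single acquisition, contended or not - exactly the overhead a spinlock 
exists to avoid.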

And you could add flags like "this spinlock is getting a lot of 
contention", try some other algorithm etc. But it's all complicated and 
against the whole *point* of a spinlock, which is to get in and out as 
fast as humanly possible.

So in practice, spinlock fairness is inherently tied to the hardware 
behaviour of the cache coherency algorithm.

Which gets us to the next level: we can consider hardware that isn't 
"fair" in its cache coherency to be buggy hardware.

That's also a perfectly fine standpoint, and it's actually one that I have 
a lot of sympathy for. I think it's to some degree much easier to do 
fairness in the cache coherency than at any other level, because that's 
something where the hardware really *could* do things like counting 
bounces etc.

However, while it's a perfectly fine standpoint, it's also totally 
unrealistic. First off, the hardware doesn't even know whether the 
spinlock "failed" or not on any architecture platform I'm aware of. On 
x86, the spinlock operation under Linux is actually just an atomic 
decrement, and it so happens that the rule is that the acquisition 
failed if the count went negative. But that's just a Linux-internal 
rule - the hardware doesn't know.
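
For reference, the 2.6-era i386 lock sequence is essentially this 
(lightly simplified):

	static inline void __raw_spin_lock(raw_spinlock_t *lock)
	{
		asm volatile(
			"1:	lock ; decb %0\n"	/* the atomic decrement */
			"	jns 3f\n"		/* didn't go negative: we own it */
			"2:	rep ; nop\n"		/* "pause" while we wait */
			"	cmpb $0, %0\n"
			"	jle 2b\n"		/* still <= 0: somebody holds it */
			"	jmp 1b\n"		/* looks free: retry the decb */
			"3:\n"
			: "+m" (lock->slock) : : "memory");
	}

All the bus ever sees is that locked decrement. "Went negative means 
you failed" is purely our convention, so there is nothing there for the 
coherency protocol to be "fair" about.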

So you'd have to actually have some specific sequence with specific 
fairness rules (maybe you could make the rule be that a failed atomic 
"cmpxchg" counts as a real failure, and give higher priority to cores 
with lots of failures etc).
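
A sketch of what I mean (hypothetical, user-space C - nothing here is a 
real interface):

	/* a lock where "I failed to get it" is an explicit operation */
	typedef struct {
		volatile int owner;	/* 0 = free, nonzero = held */
	} cas_lock_t;

	static void cas_lock(cas_lock_t *l, int me)
	{
		/* each failed compare-and-swap is a well-defined event;
		   hardware with the fairness rule above could count
		   these per core and prioritize the starved ones */
		while (__sync_val_compare_and_swap(&l->owner, 0, me) != 0)
			while (l->owner)
				;	/* wait until it looks free */
	}

	static void cas_unlock(cas_lock_t *l)
	{
		__sync_lock_release(&l->owner);
	}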

IOW, it's certainly possible *in theory* to try to make hardware that has 
fairness guarantees. However, any hardware designer will tell you that 
 (a) they have enough problems as it is
 (b) it's simply not worth their time, since there are other (simpler) 
     things that are much much more important.

So the end result:
 - hardware won't do it, and they'd be crazy to really even try (apart 
   from maybe some really simplistic stuff that doesn't _guarantee_ 
   anything, but maybe helps fairness a bit)
 - software cannot do it, without turning spinlocks into something really 
   slow and complicated, at which point they've lost all meaning, and you 
   should just use a higher-level construct like a mutex instead.

In other words, spinlocks are optimized for *lack* of contention. If a 
spinlock has contention, you don't try to make the spinlock "fair". No, 
you try to fix the contention instead!

That way, you fix many things. You probably speed things up, _and_ you 
make it fairer. It might sometimes take some brains and effort, but it's 
worth it.

The patch I sent out was an example of that. You *can* fix contention 
problems. Does it take clever approaches? Yes. It's why we have hashed 
spinlocks, RCU, and code sequences that are entirely lockless and use 
optimistic approaches. And suddenly you get fairness *and* performance!
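
The hashed-spinlock trick, for example, is maybe a dozen lines (a sketch 
with illustrative names, pthread spinlocks standing in for the kernel 
ones):

	#include <pthread.h>

	#define LOCK_HASH_BITS	5
	#define LOCK_HASH_SIZE	(1 << LOCK_HASH_BITS)

	/* one hot global lock becomes 32 locks: two CPUs only
	   collide if their objects hash to the same bucket */
	static pthread_spinlock_t lock_table[LOCK_HASH_SIZE];

	static void lock_table_init(void)
	{
		int i;

		for (i = 0; i < LOCK_HASH_SIZE; i++)
			pthread_spin_init(&lock_table[i],
					  PTHREAD_PROCESS_PRIVATE);
	}

	static pthread_spinlock_t *object_lock(const void *obj)
	{
		unsigned long h = (unsigned long)obj;

		h ^= h >> 12;	/* fold higher address bits in */
		return &lock_table[h & (LOCK_HASH_SIZE - 1)];
	}

	/* usage: pthread_spin_lock(object_lock(obj)); ...
	   pthread_spin_unlock(object_lock(obj)); */

Contention (and with it the unfairness) gets spread over the buckets 
instead of piling up on one cacheline.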

It's a win-win situation. It does require a bit of effort, but hey, we're 
good at effort.

			Linus
