Message-Id: <200807081207.34453.nickpiggin@yahoo.com.au>
Date: Tue, 8 Jul 2008 12:07:34 +1000
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Jeremy Fitzhardinge <jeremy@...p.org>
Cc: Rik van Riel <riel@...hat.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Christoph Lameter <clameter@....com>,
Petr Tesarik <ptesarik@...e.cz>, Ingo Molnar <mingo@...e.hu>,
linux-kernel@...r.kernel.org
Subject: Re: Spinlocks: Factor out GENERIC_LOCKBREAK in order to avoid spinning with irqs disabled
On Tuesday 08 July 2008 06:14, Jeremy Fitzhardinge wrote:
> The other point, of course, is that ticket locks are massive overkill
> for the problem they're trying to solve.
No they aren't.
> It's one thing to introduce an
> element of fairness into spinlocks; it's another to impose strict FIFO
> ordering. It would be enough to make the locks "polite" by preventing a
> new lock-holder from taking the lock while it's under contention.
> Something like:
>
> union lock {
> 	unsigned short word;
> 	struct { unsigned char lock, count; };
> };
>
> spin_lock:	# ebx - lock pointer
> 	movw $0x0001, %ax	# add 1 to lock, 0 to count
> 	lock; xaddw %ax, (%ebx)	# attempt to take lock and test waiter count
> 	testw %ax, %ax
> 	jnz slow		# lock held or contended: go slow
>
> taken:	ret
>
> 	# slow path
> slow:	lock incb 1(%ebx)	# announce ourselves as a waiter
>
> 1:	rep; nop		# pause
> 	cmpb $0, (%ebx)
> 	jnz 1b			# wait for the lock to be released
>
> 	movb $1, %al		# attempt to take lock (count already increased)
> 	xchgb %al, (%ebx)	# xchg is implicitly locked
> 	testb %al, %al
> 	jnz 1b			# raced with another waiter: spin again
>
> 	lock decb 1(%ebx)	# drop waiter count
> 	jmp taken
>
> spin_unlock:
> 	movb $0, (%ebx)
> 	ret
>
>
> The uncontended fastpath is similar to the pre-ticket locks, but it
> refuses to take the lock if there are other waiters, even if the lock is
> not currently held. This prevents the rapid lock-unlock cycle on one
> CPU from starving another CPU, which I understand was the original
> problem ticket locks were trying to solve.
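To keep the discussion concrete, here is your sketch in rough,
untested C11 (the union punning and byte-order assumptions are
x86-specific, and all the names here are made up):

#include <stdatomic.h>

/* Mirrors the asm: low byte is the lock, high byte counts spinners.
 * Punning atomics through a union like this is not strictly portable
 * C, but it matches what the asm does on little-endian x86. */
typedef union {
	_Atomic unsigned short word;
	struct {
		_Atomic unsigned char lock;	/* 0 = free, nonzero = held */
		_Atomic unsigned char count;	/* number of waiters */
	};
} polite_lock_t;

static void polite_lock(polite_lock_t *l)
{
	/* Fast path (the xaddw): succeeds only if the whole word was
	 * zero, i.e. the lock was free and nobody was waiting.  A
	 * failed attempt leaves the lock byte incremented; unlock's
	 * plain store of 0 wipes that again. */
	if (atomic_fetch_add(&l->word, 1) == 0)
		return;

	/* Slow path: register as a waiter, then spin. */
	atomic_fetch_add(&l->count, 1);
	for (;;) {
		while (atomic_load(&l->lock) != 0)
			;	/* rep;nop / cpu_relax() would go here */
		if (atomic_exchange(&l->lock, 1) == 0)	/* the xchgb */
			break;
	}
	atomic_fetch_sub(&l->count, 1);
}

static void polite_unlock(polite_lock_t *l)
{
	atomic_store(&l->lock, 0);
}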
They prevent lots of unfairness and starvation problems. The most
prominent one (i.e. actually observed in Linux) was a single CPU
being totally starved by N others (to the point where lockup timers
would kick in).
As an aside, the lock you propose is also a lot more costly in the
contended path: four atomic operations on the lock cacheline (the
failed xadd, the waiter-count increment, the xchg, and the decrement)
versus the ticket lock's single xadd.
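For comparison, here is the ticket lock's shape in the same rough,
untested C11 style (a sketch, not the kernel's actual x86 asm; the
names are made up). The contended path touches the lock cacheline
with exactly one atomic RMW, the ticket grab:

#include <stdatomic.h>

typedef struct {
	_Atomic unsigned char next;	/* next ticket to hand out */
	_Atomic unsigned char owner;	/* ticket currently served */
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *l)
{
	/* The only atomic RMW on the lock cacheline, contended or not. */
	unsigned char me = atomic_fetch_add(&l->next, 1);

	/* Spin with plain loads until our ticket comes up: strict FIFO. */
	while (atomic_load(&l->owner) != me)
		;	/* rep;nop / cpu_relax() would go here */
}

static void ticket_unlock(ticket_lock_t *l)
{
	/* Only the holder writes owner, so a plain increment suffices. */
	atomic_store(&l->owner, atomic_load(&l->owner) + 1);
}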
> But it also means that all the contended spinners get the lock in
> whatever order the system decides to give it to them, rather than
> imposing a strict order.
The exact problem is that the system actively does the wrong thing
when you allow it to decide.
Unlike simple cacheline access, I don't believe it is such a good
idea to, for example, batch up many acquisitions of the same lock
on one CPU. While that surely could improve performance in specific
situations, I think that if code is taking and releasing a lock many
times in quick succession, it most probably should either be reworked
to hold the lock for longer or be changed completely. And once a lock
becomes *really* contended, the cost of moving the critical section
to another CPU is drowned out by the cost of the contention itself
(all the idle time, and bouncing the lock cacheline around).
So far my theory has held up (except for virtualized systems).