Message-ID: <20140207103139.GP5002@laptop.programming.kicks-ass.net>
Date: Fri, 7 Feb 2014 11:31:39 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Torsten Duwe <duwe@....de>
Cc: Scott Wood <scottwood@...escale.com>, linux-kernel@...r.kernel.org,
Paul Mackerras <paulus@...ba.org>,
Anton Blanchard <anton@...ba.org>,
Tom Musta <tommusta@...il.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
linuxppc-dev@...ts.ozlabs.org, Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH] Convert powerpc simple spinlocks into ticket locks
On Fri, Feb 07, 2014 at 10:02:48AM +0100, Torsten Duwe wrote:
> On Thu, Feb 06, 2014 at 02:19:52PM -0600, Scott Wood wrote:
> > On Thu, 2014-02-06 at 18:37 +0100, Torsten Duwe wrote:
> > > On Thu, Feb 06, 2014 at 05:38:37PM +0100, Peter Zijlstra wrote:
> >
> > > > Can you pair lwarx with sthcx ? I couldn't immediately find the answer
> > > > in the PowerISA doc. If so I think you can do better by being able to
> > > > atomically load both tickets but only storing the head without affecting
> > > > the tail.
>
> Can I simply write the half word, without a reservation, or will the HW caches
> mess up the other half? Will it ruin the cache coherency on some (sub)architectures?
So if you have ll/sc on the whole word concurrent with the half-word
store, you can lose the half-word store like:

	lwarx &tickets
	...			sth &tail
	stwcx. &tickets

The stwcx. will overwrite the tail store.
Anyway, what might work is something like (please forgive my ppc asm, I
can barely read the thing, I've never before attempted writing it):
lock:
1:	lharx	%0, 0, &head
	mr	%1, %0		# remember our ticket
	addic	%0, %0, 1
	sthcx.	%0, 0, &head	# half-word store-conditional, pairs with lharx
	bne-	1b

2:	lhzx	%0, 0, &tail
	lwsync
	cmpw	0, %0, %1	# is tail our ticket yet?
	bne-	2b

unlock:
	lhzx	%0, 0, &tail
	addic	%0, %0, 1
	lwsync
	sthx	%0, 0, &tail
Which would somewhat translate into C as:
static inline void ticket_spin_lock(tickets_t *lock)
{
	ticket_t mine = xadd(&lock->head);

	while (smp_load_acquire(&lock->tail) != mine)
		cpu_relax();
}

static inline void ticket_spin_unlock(tickets_t *lock)
{
	ticket_t tail = lock->tail + 1;

	smp_store_release(&lock->tail, tail);
}
Where xadd() returns the value before addition and we assume half-word
single-copy atomicity, such that the head and tail updates will not
interfere.
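
For anyone who wants to play with the idea outside the kernel, here is a
minimal user-space sketch of the same scheme using C11 atomics. It is an
illustration only, not the kernel implementation: the tickets_t layout, the
16-bit ticket_t and the xadd() helper are assumptions made up for this
example, and the acquire/release orderings stand in for
smp_load_acquire()/smp_store_release().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: two independent 16-bit halves sharing one 32-bit word. */
typedef uint16_t ticket_t;

typedef struct {
	_Atomic ticket_t head;	/* next ticket to hand out */
	_Atomic ticket_t tail;	/* ticket currently being served */
} tickets_t;

/* Hypothetical xadd(): atomically add 1, return the value before the add. */
static inline ticket_t xadd(_Atomic ticket_t *p)
{
	return atomic_fetch_add_explicit(p, 1, memory_order_relaxed);
}

static inline void ticket_spin_lock(tickets_t *lock)
{
	ticket_t mine = xadd(&lock->head);

	/* Spin until the ticket being served is ours. */
	while (atomic_load_explicit(&lock->tail, memory_order_acquire) != mine)
		;	/* the kernel would call cpu_relax() here */
}

static inline void ticket_spin_unlock(tickets_t *lock)
{
	/* Only the lock holder ever writes tail, so load + store is enough. */
	ticket_t tail = atomic_load_explicit(&lock->tail, memory_order_relaxed) + 1;

	atomic_store_explicit(&lock->tail, tail, memory_order_release);
}
```

The unlock path never touches head and the lock path never writes tail, which
is exactly why the half-word atomicity assumption matters.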
The x86 implementation uses the 32-bit xadd and places the head at the
MSB end to get the atomic add + tail load in a single instruction, but
for PPC it's much better to have an extra load (to an already hot
cacheline) and avoid a second ll/sc pair, as the ll/sc things are stupid
slow for your arch afaik.
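
To make the single-word variant concrete, here is a rough C11 sketch of the
"head at the MSB end" idea, using the head/tail naming from this mail. The
tickets32_t type and the function names are invented for this example, and
the compare-exchange loop in unlock is only there because portable C11 cannot
store to half of an atomic word; it is not what x86 does.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TICKET_SHIFT 16

/* Assumed layout: head in bits 31..16, tail in bits 15..0. */
typedef struct {
	_Atomic uint32_t word;
} tickets32_t;

static inline void ticket32_lock(tickets32_t *lock)
{
	/*
	 * One atomic add takes a ticket and returns the old head and tail
	 * together. A carry out of head falls off the top of the word, so
	 * it cannot corrupt tail.
	 */
	uint32_t old = atomic_fetch_add_explicit(&lock->word,
						 1u << TICKET_SHIFT,
						 memory_order_acquire);
	uint16_t mine = (uint16_t)(old >> TICKET_SHIFT);
	uint16_t tail = (uint16_t)old;

	while (tail != mine)
		tail = (uint16_t)atomic_load_explicit(&lock->word,
						      memory_order_acquire);
}

static inline void ticket32_unlock(tickets32_t *lock)
{
	uint32_t old = atomic_load_explicit(&lock->word, memory_order_relaxed);
	uint32_t next;

	/* Bump tail modulo 2^16 while preserving whatever head is by now. */
	do {
		next = (old & 0xffff0000u) | (uint16_t)(old + 1);
	} while (!atomic_compare_exchange_weak_explicit(&lock->word, &old, next,
							memory_order_release,
							memory_order_relaxed));
}
```

On an ll/sc machine both the add in lock and the loop in unlock would turn
into reservation loops, which is the cost the split half-word version avoids.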