Message-Id: <201009031635.25093.ptesarik@suse.cz>
Date: Fri, 3 Sep 2010 16:35:23 +0200
From: Petr Tesarik <ptesarik@...e.cz>
To: Tony Luck <tony.luck@...el.com>
Cc: "linux-ia64@...r.kernel.org" <linux-ia64@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Serious problem with ticket spinlocks on ia64
On Friday 03 of September 2010 11:04:37 Petr Tesarik wrote:
> [...]
> I'm now trying to modify the lock primitives:
>
> 1. replace the fetchadd4.acq with looping over cmpxchg
I did this and I feel dumber than ever. Basically, I replaced this snippet:
ticket = ia64_fetchadd(1, p, acq);
with:
    int tmp;
    do {
        ticket = ACCESS_ONCE(lock->lock);
        asm volatile (
            "mov ar.ccv=%1\n"
            "add %0=1,%1;;\n"
            "cmpxchg4.acq %0=[%2],%0,ar.ccv\n"
            : "=r" (tmp)
            : "r" (ticket), "r" (&lock->lock)
            : "ar.ccv");
    } while (tmp != ticket);
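For comparison, the same idea (emulating fetch-and-add with a compare-and-exchange retry loop) can be sketched in portable C11 atomics. This is only an illustration, not the ia64 code used in the test: the helper name fetch_inc_cas is hypothetical, and the C11 memory orders are assumed to approximate the .acq semantics of the original instruction.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomically increment *p and return the value it
 * held before the increment, i.e. what a fetch-and-add would return.
 */
static inline uint32_t fetch_inc_cas(_Atomic uint32_t *p)
{
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);

    /*
     * Try to swap old -> old + 1.  On failure the call stores the value
     * currently in *p back into 'old', so the next attempt uses fresh data.
     */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        ;

    return old;
}
```

The retry condition plays the same role as the "tmp != ticket" test in the loop above: the increment only counts when the value found in memory matches the value that was read beforehand.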
Just to make sure I didn't miss something, this compiled to:
0xa0000001008dacb0: [MMI] nop.m 0x0
0xa0000001008dacb1: ld4.acq r15=[r32]
0xa0000001008dacb2: nop.i 0x0;;
0xa0000001008dacc0: [MII] mov.m ar.ccv=r15
0xa0000001008dacc1: adds r14=1,r15;;
0xa0000001008dacc2: nop.i 0x0
0xa0000001008dacd0: [MII] cmpxchg4.acq r14=[r32],r14,ar.ccv
0xa0000001008dacd1: nop.i 0x0
0xa0000001008dacd2: nop.i 0x0;;
0xa0000001008dace0: [MIB] nop.m 0x0
0xa0000001008dace1: cmp4.eq p7,p6=r14,r15
0xa0000001008dace2: (p06) br.cond.dptk.few 0xa0000001008dacb0
My test module recorded the following sequence on the failing CPU:
}, {
ip = 0xa00000010012f7b0,
addr = 0xe000000181925c08,
oldvalue = 0xffff0000,
newvalue = 0x0,
task = 0xe000000186930000
}, {
ip = 0xa0000001008dacd0,
addr = 0xe000000181925c08,
oldvalue = 0x0,
newvalue = 0x0,
task = 0xe000000186930000
}, {
ip = 0xa0000001008dacd0,
addr = 0xe000000181925c08,
oldvalue = 0x1,
newvalue = 0x0,
task = 0xe000000186930000
}, {
ip = 0xa0000001008dacd0,
addr = 0xe000000181925c08,
oldvalue = 0x1,
newvalue = 0x0,
task = 0xe000000186930000
}, {
ip = 0xa0000001008dacd0,
addr = 0xe000000181925c08,
oldvalue = 0x0,
newvalue = 0x0,
task = 0xe000000186930000
}, {
ip = 0xa0000001008dacd0,
addr = 0xe000000181925c08,
oldvalue = 0x1,
newvalue = 0x1,
task = 0xe000000186930000
}, {
I didn't see values around zero on any other CPU in the system. So, either
there is something seriously broken in hardware, or I made a silly mistake in
the monitoring code.
I'm attaching my SystemTap script. I know it's hacky, but it worked for me.
Oh, I had to make two modifications to the running kernel:
1. in ia64_fault()
By default the value of cr.ifa is not passed to the die notifiers, so I
(mis)used the ar_ssd field to store the ifa before calling notify_die() for
the debug faults.
2. in ivt.S
On all interrupt entries I added code similar to this (just using different
registers if appropriate):
movl r3 = (1 << 24)
mov r15 = psr
;;
or r3 = r3,r15
;;
mov psr.l = r3
;;
srlz.d
;;
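The assembly above boils down to OR-ing a single bit into the processor status register. A minimal C sketch of that bit arithmetic follows, under the assumption that bit 24 is the psr.db (debug breakpoint fault enable) bit, matching IA64_PSR_DB_BIT in the Linux ia64 headers. The names PSR_DB_BIT and psr_with_db are hypothetical, and the sketch only computes the new value; it does not touch any real register.

```c
#include <stdint.h>

/* Assumed position of psr.db; hypothetical local name for illustration. */
#define PSR_DB_BIT 24

/* Return the given PSR value with the debug-breakpoint bit set. */
static inline uint64_t psr_with_db(uint64_t psr)
{
    return psr | (UINT64_C(1) << PSR_DB_BIT);
}
```

Setting the bit is idempotent, so running the same sequence on every interrupt entry is harmless when the bit is already set.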
Am I blind and did I do something obviously wrong?
Petr Tesarik
View attachment "watchlock.stp" of type "text/x-csrc" (7511 bytes)