[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BLU0-SMTP35E10E00A34C3FF2DB55B297200@phx.gbl>
Date: Mon, 2 Jun 2014 11:39:55 -0400
From: John David Anglin <dave.anglin@...l.net>
To: Mikulas Patocka <mpatocka@...hat.com>
CC: Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
jejb@...isc-linux.org, deller@....de, linux-parisc@...r.kernel.org,
linux-kernel@...r.kernel.org, chegu_vinod@...com,
paulmck@...ux.vnet.ibm.com, Waiman.Long@...com, tglx@...utronix.de,
riel@...hat.com, akpm@...ux-foundation.org, davidlohr@...com,
hpa@...or.com, andi@...stfloor.org, aswin@...com,
scott.norton@...com, Jason Low <jason.low2@...com>
Subject: Re: [PATCH] fix a race condition in cancelable mcs spinlocks
On 6/2/2014 10:02 AM, Mikulas Patocka wrote:
>
> On Mon, 2 Jun 2014, Mikulas Patocka wrote:
>
>>
>> On Sun, 1 Jun 2014, John David Anglin wrote:
>>
>>> On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote:
>>>
>>>>> If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at
>>>>> the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock,
>>>>> so, in this case, cmpxchg or xchg isn't really atomic at all.
>>>> And this is really the first place in the kernel that breaks like this?
>>>> I've been using xchg() and cmpxchg() without such consideration for
>>>> quite a while.
>>> I believe Mikulas is correct. Even in a controlled situation where a
>>> cmpxchg operation is used to implement pthread_spin_lock() in userspace,
>>> we found recently that the lock must be released with a cmpxchg
>>> operation and not a simple write on SMP systems. There is a race in the
>>> cache operations or instruction ordering that's not present with the
>>> ldcw instruction.
>>>
>>> Dave
>>> --
>>> John David Anglin dave.anglin@...l.net
>> That is strange.
>>
>> Spinlock with cmpxchg on lock and a single write on unlock should work,
>> assuming that cmpxchg doesn't write to the target address when it detects
>> mismatch (the cmpxchg in the kernel syscall page doesn't do it, it
>> nullifies the write instruction on mismatch).
>>
>> Do you have some code that reproduces this misbehavior?
There is a pthread_spin_lock test in the kyotocabinet package that
reproduces
this misbehavior. Essentially, it creates four threads which loop doing
pthread_spin_lock(),
sched_yield() and then pthread_spin_unlock(). On SMP systems, the test
hangs with
the pthread_spin_lock locked and no thread holding lock (i.e., unlock
failed).
The pthread support uses the cmpxchg code in
arch/parisc/kernel/syscall.S. This uses
"hashed" locks, etc, in a manner similar to the kernel code.
>>
>> We really need to find out why does it behave this way:
>> - is PA-RISC really out of order? (we used to believe that it is in-order
>> and we have empty barrier instructions in the kernel). Does adding the
>> "SYNC" instruction before the write in pthread_spin_unlock fix it?
I tried "SYNC" instruction before write and after the cmpxchg operation both
with. In the cmpxchg operation, I also tried it with cache flush. I was
trying to
simulated ldcw behavior.
>> - does the processor performs nullified writes unconditionally? Does
>> moving the write in the cmpxchg implementation from the nullified slot
>> to is own branch fix it?
I don't see how the processor can perform nullified writes
unconditionally although that
might explain the observed symptom. Didn't try moving the cmpxchg write.
>> - does adding a dummy "ldcw" instruction to an unrelated address fix it?
>> Is it that "ldcw" has some magic barrier properties?
I had wondered about that. One can't use %r0 as the instruction target
as the architecture
manual says that it may then be implemented as a normal load. "ldcw"
definitely has some magic
cache and barrier properties. A normal store definitely works with it
to reset the semaphore.
> - and there is "stw,o" instruction that does ordered store according to
> the specification, so we should test it too...
This doesn't help.
Currently, the Debian eglibc has a pthread_spin_unlock.diff patch that
resolves the
kyotocabinet bug. See:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=725508
>
>> I think we need to perform these tests and maybe some more to find out
>> what really happened there...
>>
>> BTW. in Debian 5 libc 2.7, pthread_spin_lock uses ldcw and
>> pthread_spin_unlock uses a single write (just like the kernel spinlock
>> implementation). In Debian-ports libc 2.18, both pthread_spin_lock and
>> pthread_spin_unlock call the kernel syscall page. What was the reason for
>> switching to a less efficient implementation?
>>
>> Mikulas
>>
>
Dave
--
John David Anglin dave.anglin@...l.net
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists