linux-kernel - Re: [PATCH] fix a race condition in cancelable mcs spinlocks

Message-ID: <BLU0-SMTP35E10E00A34C3FF2DB55B297200@phx.gbl>
Date:	Mon, 2 Jun 2014 11:39:55 -0400
From:	John David Anglin <dave.anglin@...l.net>
To:	Mikulas Patocka <mpatocka@...hat.com>
CC:	Peter Zijlstra <peterz@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	jejb@...isc-linux.org, deller@....de, linux-parisc@...r.kernel.org,
	linux-kernel@...r.kernel.org, chegu_vinod@...com,
	paulmck@...ux.vnet.ibm.com, Waiman.Long@...com, tglx@...utronix.de,
	riel@...hat.com, akpm@...ux-foundation.org, davidlohr@...com,
	hpa@...or.com, andi@...stfloor.org, aswin@...com,
	scott.norton@...com, Jason Low <jason.low2@...com>
Subject: Re: [PATCH] fix a race condition in cancelable mcs spinlocks

On 6/2/2014 10:02 AM, Mikulas Patocka wrote:
>
> On Mon, 2 Jun 2014, Mikulas Patocka wrote:
>
>>
>> On Sun, 1 Jun 2014, John David Anglin wrote:
>>
>>> On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote:
>>>
>>>>> If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at
>>>>> the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock,
>>>>> so, in this case, cmpxchg or xchg isn't really atomic at all.
>>>> And this is really the first place in the kernel that breaks like this?
>>>> I've been using xchg() and cmpxchg() without such consideration for
>>>> quite a while.
>>> I believe Mikulas is correct.  Even in a controlled situation where a
>>> cmpxchg operation is used to implement pthread_spin_lock() in userspace,
>>> we found recently that the lock must be released with a cmpxchg
>>> operation and not a simple write on SMP systems. There is a race in the
>>> cache operations or instruction ordering that's not present with the
>>> ldcw instruction.
>>>
>>> Dave
>>> --
>>> John David Anglin	dave.anglin@...l.net
>> That is strange.
>>
>> Spinlock with cmpxchg on lock and a single write on unlock should work,
>> assuming that cmpxchg doesn't write to the target address when it detects
>> mismatch (the cmpxchg in the kernel syscall page doesn't do it, it
>> nullifies the write instruction on mismatch).
>>
>> Do you have some code that reproduces this misbehavior?
There is a pthread_spin_lock test in the kyotocabinet package that
reproduces this misbehavior.  Essentially, it creates four threads which
loop doing pthread_spin_lock(), sched_yield() and then
pthread_spin_unlock().  On SMP systems, the test hangs with the
pthread_spin_lock locked and no thread holding the lock (i.e., the
unlock failed).

The pthread support uses the cmpxchg code in
arch/parisc/kernel/syscall.S.  This uses "hashed" locks, etc., in a
manner similar to the kernel code.
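
As a rough illustration of the scheme only, a cmpxchg emulated with
hashed locks might look like the sketch below.  It is not the code in
syscall.S: the names emu_cmpxchg(), emu_spin_lock() and
emu_spin_unlock(), the table size, the hash function and the use of
pthread mutexes as the hashed locks are all assumptions.

```c
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

#define NLOCKS 4                /* hypothetical table size */

/* One lock per hash bucket; every emulated cmpxchg on an address
 * serializes on the bucket that the address hashes to. */
static pthread_mutex_t hashed[NLOCKS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

static pthread_mutex_t *lock_for(volatile int *addr)
{
    return &hashed[((uintptr_t)addr >> 4) % NLOCKS];
}

/* Returns the value found at *addr.  The store happens only when that
 * value equals oldval; on a mismatch nothing is written. */
int emu_cmpxchg(volatile int *addr, int oldval, int newval)
{
    pthread_mutex_t *m = lock_for(addr);
    int seen;

    pthread_mutex_lock(m);
    seen = *addr;
    if (seen == oldval)
        *addr = newval;
    pthread_mutex_unlock(m);
    return seen;
}

/* Acquire: retry until the 0 -> 1 transition succeeds. */
void emu_spin_lock(volatile int *l)
{
    while (emu_cmpxchg(l, 0, 1) != 0)
        sched_yield();
}

/* Release through the same hashed lock.  The variant discussed in
 * this thread would instead do a plain store (*l = 0), which does
 * not take the hashed lock at all. */
void emu_spin_unlock(volatile int *l)
{
    emu_cmpxchg(l, 1, 0);
}
```

On a typical host this emulation will not show the hang.  It is only
meant to make clear which accesses take the hashed lock and which do
not.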

>>
>> We really need to find out why does it behave this way:
>> - is PA-RISC really out of order? (we used to believe that it is in-order
>>    and we have empty barrier instructions in the kernel). Does adding the
>>    "SYNC" instruction before the write in pthread_spin_unlock fix it?
I tried a "SYNC" instruction both before the write and after the
cmpxchg operation.  In the cmpxchg operation, I also tried it with a
cache flush.  I was trying to simulate ldcw behavior.
>> - does the processor perform nullified writes unconditionally? Does
>>    moving the write in the cmpxchg implementation from the nullified slot
>>    to its own branch fix it?
I don't see how the processor can perform nullified writes
unconditionally, although that might explain the observed symptom.  I
didn't try moving the cmpxchg write.

>> - does adding a dummy "ldcw" instruction to an unrelated address fix it?
>>    Is it that "ldcw" has some magic barrier properties?
I had wondered about that.  One can't use %r0 as the instruction target
as the architecture manual says that it may then be implemented as a
normal load.  "ldcw" definitely has some magic cache and barrier
properties.  A normal store definitely works with it to reset the
semaphore.
> - and there is "stw,o" instruction that does ordered store according to
> the specification, so we should test it too...
This doesn't help.

Currently, the Debian eglibc has a pthread_spin_unlock.diff patch that
resolves the kyotocabinet bug.  See:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=725508

>
>> I think we need to perform these tests and maybe some more to find out
>> what really happened there...
>>
>> BTW. in Debian 5 libc 2.7, pthread_spin_lock uses ldcw and
>> pthread_spin_unlock uses a single write (just like the kernel spinlock
>> implementation). In Debian-ports libc 2.18, both pthread_spin_lock and
>> pthread_spin_unlock call the kernel syscall page. What was the reason for
>> switching to a less efficient implementation?
>>
>> Mikulas
>>
>

Dave

-- 
John David Anglin    dave.anglin@...l.net
