linux-kernel - Re: [PATCH] fix a race condition in cancelable mcs spinlocks

Message-ID: <alpine.LRH.2.02.1406020914440.18342@file01.intranet.prod.int.rdu2.redhat.com>
Date:	Mon, 2 Jun 2014 09:58:29 -0400 (EDT)
From:	Mikulas Patocka <mpatocka@...hat.com>
To:	John David Anglin <dave.anglin@...l.net>
cc:	Peter Zijlstra <peterz@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	jejb@...isc-linux.org, deller@....de, linux-parisc@...r.kernel.org,
	linux-kernel@...r.kernel.org, chegu_vinod@...com,
	paulmck@...ux.vnet.ibm.com, Waiman.Long@...com, tglx@...utronix.de,
	riel@...hat.com, akpm@...ux-foundation.org, davidlohr@...com,
	hpa@...or.com, andi@...stfloor.org, aswin@...com,
	scott.norton@...com, Jason Low <jason.low2@...com>
Subject: Re: [PATCH] fix a race condition in cancelable mcs spinlocks



On Sun, 1 Jun 2014, John David Anglin wrote:

> On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote:
> 
> > > If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at
> > > the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock,
> > > so, in this case, cmpxchg or xchg isn't really atomic at all.
> > 
> > And this is really the first place in the kernel that breaks like this?
> > I've been using xchg() and cmpxchg() without such consideration for
> > quite a while.
> 
> I believe Mikulas is correct.  Even in a controlled situation where a 
> cmpxchg operation is used to implement pthread_spin_lock() in userspace, 
> we found recently that the lock must be released with a cmpxchg 
> operation and not a simple write on SMP systems. There is a race in the 
> cache operations or instruction ordering that's not present with the 
> ldcw instruction.
> 
> Dave
> --
> John David Anglin	dave.anglin@...l.net
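To illustrate the point quoted above: PA-RISC has no native compare-and-swap, so cmpxchg and xchg are emulated by taking a spinlock chosen by hashing the target address. The sketch below is a portable model under assumptions. NR_HASHED_LOCKS, the hash, and every function name are hypothetical, and this is not the kernel's actual code. It shows why a plain (ACCESS_ONCE-style) store that skips the hashed lock breaks the atomicity of the emulated operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of cmpxchg/xchg emulated with hashed spinlocks.
 * NR_HASHED_LOCKS, the hash and all names are illustrative assumptions. */
#define NR_HASHED_LOCKS 16

static atomic_flag hashed_locks[NR_HASHED_LOCKS];

static void hashed_locks_init(void)
{
    for (int i = 0; i < NR_HASHED_LOCKS; i++)
        atomic_flag_clear(&hashed_locks[i]);
}

/* Pick the lock that guards the word at addr. */
static atomic_flag *lock_for(const volatile unsigned int *addr)
{
    assert((uintptr_t)addr % sizeof(unsigned int) == 0); /* word-aligned */
    return &hashed_locks[((uintptr_t)addr / sizeof(unsigned int)) % NR_HASHED_LOCKS];
}

static void hashed_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* spin */
}

static void hashed_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Emulated cmpxchg: returns the old value, stores newval only on a match.
 * It is atomic only with respect to accesses that take the same lock. */
static unsigned int emulated_cmpxchg(volatile unsigned int *addr,
                                     unsigned int oldval, unsigned int newval)
{
    atomic_flag *l = lock_for(addr);
    hashed_lock(l);
    unsigned int cur = *addr;
    if (cur == oldval)
        *addr = newval;     /* on a mismatch, nothing is written */
    hashed_unlock(l);
    return cur;
}

/* Emulated xchg: same lock, unconditional store. */
static unsigned int emulated_xchg(volatile unsigned int *addr, unsigned int newval)
{
    atomic_flag *l = lock_for(addr);
    hashed_lock(l);
    unsigned int cur = *addr;
    *addr = newval;
    hashed_unlock(l);
    return cur;
}

/* ACCESS_ONCE-style store: it bypasses the hashed lock, so on another CPU it
 * can land between the load of 'cur' and the store of 'newval' above and be
 * silently overwritten. (Deliberately racy: this is the broken pattern.) */
static void plain_store(volatile unsigned int *addr, unsigned int val)
{
    *addr = val;
}

/* A store that stays atomic with respect to the emulated operations. */
static void locked_store(volatile unsigned int *addr, unsigned int val)
{
    atomic_flag *l = lock_for(addr);
    hashed_lock(l);
    *addr = val;
    hashed_unlock(l);
}
```

In this model the fix for mixed access is locked_store(): every writer of a word that is also targeted by the emulated cmpxchg/xchg must take the same hashed lock.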

That is strange.

A spinlock that takes the lock with cmpxchg and releases it with a single
write should work, assuming that cmpxchg doesn't write to the target
address when it detects a mismatch (the cmpxchg in the kernel syscall page
doesn't; it nullifies the write instruction on a mismatch).
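A minimal sketch of that locking protocol, assuming C11 atomics in place of the PA-RISC syscall-page cmpxchg. Here both the compare-and-swap and the release store are genuine atomics, so the sketch shows the intended protocol rather than modeling the PA-RISC emulation. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical CAS-based spinlock: 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_uint word;
} cas_spinlock_t;

static void cas_spin_init(cas_spinlock_t *l)
{
    atomic_init(&l->word, 0);
}

/* One cmpxchg attempt. Acquire ordering keeps critical-section accesses
 * after a successful lock; on a mismatch nothing is written. */
static bool cas_spin_trylock(cas_spinlock_t *l)
{
    unsigned int expected = 0;
    return atomic_compare_exchange_strong_explicit(&l->word, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void cas_spin_lock(cas_spinlock_t *l)
{
    while (!cas_spin_trylock(l)) {
        /* Spin read-only until the lock looks free, then retry the CAS. */
        while (atomic_load_explicit(&l->word, memory_order_relaxed) != 0)
            ;
    }
}

/* Release with a single write: no cmpxchg is needed on this side as long as
 * the lock side's cmpxchg is a real atomic read-modify-write. */
static void cas_spin_unlock(cas_spinlock_t *l)
{
    assert(atomic_load_explicit(&l->word, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->word, 0, memory_order_release);
}
```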

Do you have some code that reproduces this misbehavior?

We really need to find out why it behaves this way:
- Is PA-RISC really out of order? (We used to believe that it is in-order,
  and we have empty barrier instructions in the kernel.) Does adding the
  "SYNC" instruction before the write in pthread_spin_unlock fix it?
- Does the processor perform nullified writes unconditionally? Does
  moving the write in the cmpxchg implementation from the nullified slot
  to its own branch fix it?
- Does adding a dummy "ldcw" instruction to an unrelated address fix it?
  Does "ldcw" have some magic barrier properties?

I think we need to perform these tests and maybe some more to find out
what really happened there...
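A rough sketch of a reproducer harness for such tests, under stated assumptions. Several threads take a CAS-based lock and increment a deliberately non-atomic counter, so any failure of mutual exclusion shows up as lost increments. The three unlock variants are hypothetical stand-ins: a single release store, a full fence followed by the store (only a loose, portable analogue of adding SYNC), and a cmpxchg-based release like the userspace workaround Dave describes. The nullified-slot and dummy-ldcw questions need PA-RISC assembly and are not modeled here. NTHREADS, ITERS, and all names are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NTHREADS 4
#define ITERS 20000

enum unlock_kind { UNLOCK_STORE, UNLOCK_FENCE_STORE, UNLOCK_CAS };

static atomic_uint lock_word;         /* 0 = free, 1 = held */
static unsigned long counter;         /* protected only by lock_word */
static enum unlock_kind unlock_mode;  /* set before threads start */

static void stress_lock(void)
{
    unsigned int expected;
    do {
        expected = 0;
    } while (!atomic_compare_exchange_weak_explicit(&lock_word, &expected, 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed));
}

static void stress_unlock(void)
{
    unsigned int expected = 1;
    bool ok;
    switch (unlock_mode) {
    case UNLOCK_STORE:        /* single write, as in the kernel spinlock */
        atomic_store_explicit(&lock_word, 0, memory_order_release);
        break;
    case UNLOCK_FENCE_STORE:  /* full barrier first: loose analogue of SYNC */
        atomic_thread_fence(memory_order_seq_cst);
        atomic_store_explicit(&lock_word, 0, memory_order_relaxed);
        break;
    case UNLOCK_CAS:          /* release via cmpxchg */
        ok = atomic_compare_exchange_strong_explicit(&lock_word, &expected, 0,
                                                     memory_order_release,
                                                     memory_order_relaxed);
        assert(ok);           /* we hold the lock, so the CAS must succeed */
        (void)ok;
        break;
    }
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        stress_lock();
        counter++;            /* lost updates here mean the lock failed */
        stress_unlock();
    }
    return NULL;
}

/* Returns the number of lost increments (0 if mutual exclusion held). */
static unsigned long run_stress(enum unlock_kind mode)
{
    pthread_t t[NTHREADS];
    int rc;
    unlock_mode = mode;
    counter = 0;
    atomic_store(&lock_word, 0);
    for (int i = 0; i < NTHREADS; i++) {
        rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++) {
        rc = pthread_join(t[i], NULL);
        assert(rc == 0);
    }
    (void)rc;
    return (unsigned long)NTHREADS * ITERS - counter;
}
```

On a platform whose atomics behave as C11 specifies, all three variants should report zero lost increments; a shortfall only for UNLOCK_STORE would point toward the kind of problem Dave describes.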

BTW, in Debian 5 (libc 2.7), pthread_spin_lock uses ldcw and
pthread_spin_unlock uses a single write (just like the kernel spinlock
implementation). In Debian-ports (libc 2.18), both pthread_spin_lock and
pthread_spin_unlock call the kernel syscall page. What was the reason for
switching to a less efficient implementation?
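For comparison, a portable model of the ldcw-based scheme. ldcw ("load and clear word") atomically returns the old value of a word and sets it to zero, so the lock word has inverted sense: nonzero means free, zero means held. In this sketch atomic_exchange(..., 0) stands in for ldcw, which is an assumption for illustration. On real PA-RISC the lock word must also be 16-byte aligned. The names are hypothetical, and the code is written in the spirit of the kernel's lock rather than copied from it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ldcw-style lock: nonzero = free, 0 = held. */
typedef struct {
    _Alignas(16) atomic_uint word;   /* ldcw needs a 16-byte aligned word */
} ldcw_lock_t;

static void ldcw_lock_init(ldcw_lock_t *l)
{
    atomic_init(&l->word, 1);        /* start out free */
}

/* Stand-in for ldcw: return the old value and clear the word. */
static unsigned int emulated_ldcw(atomic_uint *w)
{
    return atomic_exchange_explicit(w, 0, memory_order_acquire);
}

static bool ldcw_trylock(ldcw_lock_t *l)
{
    return emulated_ldcw(&l->word) != 0;
}

static void ldcw_spin_lock(ldcw_lock_t *l)
{
    while (!ldcw_trylock(l))
        while (atomic_load_explicit(&l->word, memory_order_relaxed) == 0)
            ; /* wait read-only until the lock looks free */
}

/* Unlock is a single ordinary write of a nonzero value. */
static void ldcw_spin_unlock(ldcw_lock_t *l)
{
    assert(atomic_load_explicit(&l->word, memory_order_relaxed) == 0);
    atomic_store_explicit(&l->word, 1, memory_order_release);
}
```

Because ldcw always writes zero, a failed attempt leaves a held lock still held, which is why a plain store of a nonzero value is enough to release it.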

Mikulas