Date:	Mon, 16 Nov 2015 13:58:49 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Will Deacon <will.deacon@....com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Boqun Feng <boqun.feng@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Ingo Molnar <mingo@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Jonathan Corbet <corbet@....net>,
	Michal Hocko <mhocko@...nel.org>,
	David Howells <dhowells@...hat.com>,
	Michael Ellerman <mpe@...erman.id.au>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Paul Mackerras <paulus@...ba.org>
Subject: Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()

On Mon, Nov 16, 2015 at 8:24 AM, Will Deacon <will.deacon@....com> wrote:
>
> ... or we upgrade spin_unlock_wait to a LOCK operation, which might be
> slightly cheaper than spin_lock()+spin_unlock().

So traditionally the real concern has been the cacheline ping-pong
part of spin_unlock_wait(). I think adding a memory barrier (that
doesn't force any exclusive states, just ordering) to it is fine, but
I don't think we want to necessarily have it have to get the cacheline
into exclusive state.

Because if spin_unlock_wait() ends up having to get the spinlock
cacheline (for example, by writing the same value back with a SC), I
don't think spin_unlock_wait() will really be all that much cheaper
than just getting the spinlock, and in that case we shouldn't play
complicated ordering games.
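To make the distinction concrete, here is a minimal C11 sketch (hypothetical, not the kernel's actual implementation; the encoding of 0 == unlocked is an assumption) of a read-only spin_unlock_wait() that only ever loads the lock word:

```c
#include <stdatomic.h>

/*
 * Hypothetical sketch, not kernel code: a read-only spin_unlock_wait()
 * for a lock word that holds 0 when unlocked (an assumption here).
 * It only *loads* the lock word, with acquire ordering, so the
 * cacheline can stay in shared state on every waiting CPU instead of
 * ping-ponging into exclusive state the way a lock+unlock pair would
 * force it to.
 */
static void spin_unlock_wait_sketch(atomic_int *lock)
{
	while (atomic_load_explicit(lock, memory_order_acquire) != 0)
		;	/* spin: read-only, no exclusive ownership taken */
}
```

The acquire ordering supplies the barrier semantics without ever writing to the line, which is the whole point of keeping it cheaper than lock+unlock.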

On another issue:

I'm also looking at the ARM documentation for stxr (store-exclusive),
and the _documentation_ says that it has no stronger ordering than a
"store release", but I'm starting to wonder if that is actually true.

Because I do end up thinking that it does have the same "control
dependency" to all subsequent writes (but not reads). So reads after
the SC can percolate up, but I think writes are restricted.

Why? In order for the SC to be able to return success, the write
itself may not have actually been done yet, but the cacheline for the
write must have successfully been turned into exclusive ownership.
Agreed?

That means that by the time a SC returns success, no other CPU can see
the old value of the spinlock any more. So by the time any subsequent
stores in the locked region can be visible to any other CPUs, the
locked value of the lock itself has to be visible too.

Agreed?
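As a rough stand-in for that sequence (hypothetical C11 code; the compare-exchange loop here plays the role of the LL/SC pair, which is only an approximation), the lock acquisition looks like:

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in, not kernel code: the expected-value load
 * plays the role of the LL, the conditional store the role of the SC.
 * The argument above: on LL/SC hardware, a *successful* SC means this
 * CPU held the cacheline exclusively, so no other CPU can still see
 * the old (unlocked) value by the time any later store in the critical
 * section becomes visible.
 */
static void spin_lock_sketch(atomic_int *lock)
{
	int unlocked = 0;

	while (!atomic_compare_exchange_weak_explicit(lock, &unlocked, 1,
			memory_order_acquire, memory_order_relaxed))
		unlocked = 0;	/* cmpxchg rewrites 'unlocked' on failure */
}
```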

So I think that in effect, when a spinlock is implemented with LL/SC,
the loads inside the locked region are only ordered wrt the acquire on
the LL, but the stores can be considered ordered wrt the SC.

No?

So I think a _successful_ SC is still more ordered than just any
random store with release consistency.

Of course, I'm not sure that actually *helps* us, because I think the
problem tends to be loads in the locked region moving up earlier than
the actual store that sets the lock, but maybe it makes some
difference.
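A sketch of why that is the problematic direction (again hypothetical C11 code, with the same assumed lock encoding): the load of *shared must not be satisfied before the lock store is observed, and only acquire semantics on the lock-word read gives you that.

```c
#include <stdatomic.h>

/*
 * Hypothetical illustration, not kernel code: the load of *shared in
 * the critical section must not happen before the lock is seen taken.
 * Acquire ordering on the lock-word *read* (the LL side) is what
 * prevents that load from moving up; whatever extra ordering a
 * successful SC gives to later *stores* does not help with this load.
 */
static int read_under_lock(atomic_int *lock, const int *shared)
{
	int unlocked = 0, v;

	while (!atomic_compare_exchange_weak_explicit(lock, &unlocked, 1,
			memory_order_acquire, memory_order_relaxed))
		unlocked = 0;

	v = *shared;			/* ordered after lock acquisition */
	atomic_store_explicit(lock, 0,
			memory_order_release);	/* unlock */
	return v;
}
```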

                Linus