Date:	Mon, 16 Nov 2015 13:58:49 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Will Deacon <will.deacon@....com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Boqun Feng <boqun.feng@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Ingo Molnar <mingo@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Jonathan Corbet <corbet@....net>,
	Michal Hocko <mhocko@...nel.org>,
	David Howells <dhowells@...hat.com>,
	Michael Ellerman <mpe@...erman.id.au>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Paul Mackerras <paulus@...ba.org>
Subject: Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()

On Mon, Nov 16, 2015 at 8:24 AM, Will Deacon <will.deacon@....com> wrote:
>
> ... or we upgrade spin_unlock_wait to a LOCK operation, which might be
> slightly cheaper than spin_lock()+spin_unlock().

So traditionally the real concern has been the cacheline ping-pong
part of spin_unlock_wait(). I think adding a memory barrier (that
doesn't force any exclusive states, just ordering) to it is fine, but
I don't think we want to necessarily have it have to get the cacheline
into exclusive state.

Because if spin_unlock_wait() ends up having to get the spinlock
cacheline (for example, by writing the same value back with a SC), I
don't think spin_unlock_wait() will really be all that much cheaper
than just getting the spinlock, and in that case we shouldn't play
complicated ordering games.
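To make the distinction concrete, here is a minimal C11 sketch (hypothetical, not the kernel's actual implementation; the encoding of 0 == unlocked is an assumption) of a read-only spin_unlock_wait() that only ever loads the lock word:

```c
#include <stdatomic.h>

/*
 * Hypothetical sketch, not kernel code: a read-only spin_unlock_wait()
 * for a lock word that holds 0 when unlocked (an assumption here).
 * It only *loads* the lock word, with acquire ordering, so the
 * cacheline can stay in shared state on every waiting CPU instead of
 * ping-ponging into exclusive state the way a lock+unlock pair would
 * force it to.
 */
static void spin_unlock_wait_sketch(atomic_int *lock)
{
	while (atomic_load_explicit(lock, memory_order_acquire) != 0)
		;	/* spin: read-only, no exclusive ownership taken */
}
```

The acquire ordering supplies the barrier semantics without ever writing to the line, which is the whole point of keeping it cheaper than lock+unlock.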

On another issue:

I'm also looking at the ARM documentation for stxr (store-exclusive),
and the _documentation_ says that it has no stronger ordering than a
"store release", but I'm starting to wonder if that is actually true.

Because I do end up thinking that it does have the same "control
dependency" to all subsequent writes (but not reads). So reads after
the SC can percolate up, but I think writes are restricted.

Why? In order for the SC to be able to return success, the write
itself may not have actually been done yet, but the cacheline for the
write must have successfully been turned into exclusive ownership.
Agreed?

That means that by the time a SC returns success, no other CPU can see
the old value of the spinlock any more. So by the time any subsequent
stores in the locked region can be visible to any other CPUs, the
locked value of the lock itself has to be visible too.

Agreed?
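As a rough stand-in for that sequence (hypothetical C11 code; the compare-exchange loop here plays the role of the LL/SC pair, which is only an approximation), the lock acquisition looks like:

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in, not kernel code: the expected-value load
 * plays the role of the LL, the conditional store the role of the SC.
 * The argument above: on LL/SC hardware, a *successful* SC means this
 * CPU held the cacheline exclusively, so no other CPU can still see
 * the old (unlocked) value by the time any later store in the critical
 * section becomes visible.
 */
static void spin_lock_sketch(atomic_int *lock)
{
	int unlocked = 0;

	while (!atomic_compare_exchange_weak_explicit(lock, &unlocked, 1,
			memory_order_acquire, memory_order_relaxed))
		unlocked = 0;	/* cmpxchg rewrites 'unlocked' on failure */
}
```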

So I think that in effect, when a spinlock is implemented with LL/SC,
the loads inside the locked region are only ordered wrt the acquire on
the LL, but the stores can be considered ordered wrt the SC.

No?

So I think a _successful_ SC is still more ordered than just any
random store with release consistency.

Of course, I'm not sure that actually *helps* us, because I think the
problem tends to be loads in the locked region moving up earlier than
the actual store that sets the lock, but maybe it makes some
difference.
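A sketch of why that is the problematic direction (again hypothetical C11 code, with the same assumed lock encoding): the load of *shared must not be satisfied before the lock store is observed, and only acquire semantics on the lock-word read gives you that.

```c
#include <stdatomic.h>

/*
 * Hypothetical illustration, not kernel code: the load of *shared in
 * the critical section must not happen before the lock is seen taken.
 * Acquire ordering on the lock-word *read* (the LL side) is what
 * prevents that load from moving up; whatever extra ordering a
 * successful SC gives to later *stores* does not help with this load.
 */
static int read_under_lock(atomic_int *lock, const int *shared)
{
	int unlocked = 0, v;

	while (!atomic_compare_exchange_weak_explicit(lock, &unlocked, 1,
			memory_order_acquire, memory_order_relaxed))
		unlocked = 0;

	v = *shared;			/* ordered after lock acquisition */
	atomic_store_explicit(lock, 0,
			memory_order_release);	/* unlock */
	return v;
}
```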

                Linus