linux-kernel - Re: [GIT PULL rcu/next] RCU commits for 4.13

Message-ID: <20170629113641.GA18491@arm.com>
Date:   Thu, 29 Jun 2017 12:36:43 +0100
From:   Will Deacon <will.deacon@....com>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Alan Stern <stern@...land.harvard.edu>,
        Andrea Parri <parri.andrea@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        priyalee.kushwaha@...el.com,
        Stanisław Drozd <drozdziak1@...il.com>,
        Arnd Bergmann <arnd@...db.de>, ldr709@...il.com,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Josh Triplett <josh@...htriplett.org>,
        Nicolas Pitre <nico@...aro.org>,
        Krister Johansen <kjlx@...pleofstupid.com>,
        Vegard Nossum <vegard.nossum@...cle.com>, dcb314@...mail.com,
        Wu Fengguang <fengguang.wu@...el.com>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Rik van Riel <riel@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...nel.org>,
        Luc Maranget <luc.maranget@...ia.fr>,
        Jade Alglave <j.alglave@....ac.uk>
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

Hey Paul,

On Wed, Jun 28, 2017 at 05:45:56PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> > On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> > <paulmck@...ux.vnet.ibm.com> wrote:
> > >
> > > Linus, are you dead-set against defining spin_unlock_wait() to be
> > > spin_lock + spin_unlock?  For example, is the current x86 implementation
> > > of spin_unlock_wait() really a non-negotiable hard requirement?  Or
> > > would you be willing to live with the spin_lock + spin_unlock semantics?
> > 
> > So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
> > 
> > One of the issues is that the same as "spin_lock + spin_unlock" is
> > basically now architecture-dependent. Is it really the
> > architecture-dependent ordering you want to define this as?
> > 
> > So I just think it's a *bad* definition. If somebody wants something
> > that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> > do *THAT*. It's completely pointless to me to define
> > spin_unlock_wait() in those terms.
> > 
> > And if it's not equivalent to the *architecture* behavior of
> > spin_lock+spin_unlock, then I think it should be described in terms
> > that aren't about the architecture implementation (so you shouldn't
> > describe it as "spin_lock+spin_unlock", you should describe it in
> > terms of memory barrier semantics).
> > 
> > And if we really have to use the spin_lock+spin_unlock semantics for
> > this, then what is the advantage of spin_unlock_wait at all, if it
> > doesn't fundamentally avoid some locking overhead of just taking the
> > spinlock in the first place?
> > 
> > And if we can't use a cheaper model, maybe we should just get rid of
> > it entirely?
> > 
> > Finally: if the memory barrier semantics are exactly the same, and
> > it's purely about avoiding some nasty contention case, I think the
> > concept is broken - contention is almost never an actual issue, and if
> > it is, the problem is much deeper than spin_unlock_wait().
> 
> All good points!
> 
> I must confess that your sentence about getting rid of spin_unlock_wait()
> entirely does resonate with me, especially given the repeated bouts of
> "but what -exactly- is it -supposed- to do?" over the past 18 months
> or so.  ;-)
> 
> Just for completeness, here is a list of the definitions that have been
> put forward, just in case it inspires someone to come up with something
> better:
> 
> 1.	spin_unlock_wait() provides only acquire semantics.  Code
> 	placed after the spin_unlock_wait() will see the effects of
> 	all previous critical sections, but there are no guarantees for
> 	subsequent critical sections.  The x86 implementation provides
> 	this.  I -think- that the ARM and PowerPC implementations could
> 	get rid of a memory-barrier instruction and still provide this.
> 
> 2.	As #1 above, but a "smp_mb();spin_unlock_wait();" provides the
> 	additional guarantee that code placed before this construct is
> 	seen by all subsequent critical sections.  The x86 implementation
> 	provides this, as do ARM and PowerPC, but it is not clear that all
> 	architectures do.  As Alan noted, this is an extremely unnatural
> 	definition for the current memory model.
> 
> 3.	[ Just for completeness, yes, this is off the table! ]  The
> 	spin_unlock_wait() has the same semantics as a spin_lock()
> 	followed immediately by a spin_unlock().
> 
> 4.	spin_unlock_wait() is analogous to synchronize_rcu(), where
> 	spin_unlock_wait()'s "read-side critical sections" are the lock's
> 	normal critical sections.  This was the first definition I heard
> 	that made any sense to me, but it turns out to be equivalent
> 	to #3.  Thus, also off the table.
> 
> Does anyone know of any other possible definitions?

My understanding was that spin_unlock_wait():

  * Has acquire semantics
  * Is ordered with respect to any prior spin_lock/spin_unlock operations
    on the same thread.

so if you want order against other PO-prior accesses, like in Andrea's test,
then you need an explicit smp_mb() (see, for example, "CASE 2" of the big
comment in qspinlock.c).
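
To make the shape of that concrete, here is a rough userspace sketch using
C11 atomics. Everything in it is an assumption for illustration: the
model_* names, the simple test-and-set lock and the "flag" variable are
hypothetical, and atomic_thread_fence(memory_order_seq_cst) merely stands in
for smp_mb(). It is not the kernel's implementation on any architecture and
it only shows where the explicit barrier goes, not what the kernel
primitives guarantee.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical test-and-set lock; NOT the kernel's spinlock. */
struct model_lock {
	atomic_int locked;	/* 0 = free, 1 = held */
};

static void model_lock_acquire(struct model_lock *l)
{
	int expected = 0;

	/* Spin until we swing 0 -> 1; acquire ordering on success. */
	while (!atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
						      memory_order_acquire,
						      memory_order_relaxed))
		expected = 0;
}

static void model_lock_release(struct model_lock *l)
{
	/* Sanity check: only a held lock may be released. */
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/*
 * Model of the "acquire semantics only" reading: wait until the lock is
 * observed free.  The acquire load orders later accesses after the
 * unlock we observed; it says nothing about accesses that come before
 * this call in program order.
 */
static void model_unlock_wait(struct model_lock *l)
{
	while (atomic_load_explicit(&l->locked, memory_order_acquire))
		;
}

static atomic_int flag;	/* hypothetical PO-prior access */

/*
 * If the store to "flag" must be ordered before the wait, the caller
 * adds the full barrier explicitly (smp_mb() in kernel terms).
 */
static void publish_then_wait(struct model_lock *l)
{
	atomic_store_explicit(&flag, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	model_unlock_wait(l);
}
```

The point of keeping the fence in the caller rather than inside
model_unlock_wait() is that callers who only need the acquire side do not
pay for a full barrier.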

That's what I used when implementing this for arm64, and I think that's what
Peter's been going by too (at least, I think the current implementations
meet those requirements).

Do we have users in-tree that need more than that?

Will