linux-kernel - Re: Unlock-lock questions and the Linux Kernel Memory Model

Message-ID: <Pine.LNX.4.44L0.1712011214420.1361-100000@iolanthe.rowland.org>
Date:   Fri, 1 Dec 2017 12:18:37 -0500 (EST)
From:   Alan Stern <stern@...land.harvard.edu>
To:     Daniel Lustig <dlustig@...dia.com>
cc:     Boqun Feng <boqun.feng@...il.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Andrea Parri <parri.andrea@...il.com>,
        Luc Maranget <luc.maranget@...ia.fr>,
        Jade Alglave <j.alglave@....ac.uk>,
        Nicholas Piggin <npiggin@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Will Deacon <will.deacon@....com>,
        David Howells <dhowells@...hat.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Kernel development list <linux-kernel@...r.kernel.org>
Subject: Re: Unlock-lock questions and the Linux Kernel Memory Model

On Fri, 1 Dec 2017, Daniel Lustig wrote:

> On 12/1/2017 7:32 AM, Alan Stern wrote:
> > On Fri, 1 Dec 2017, Boqun Feng wrote:
> >>> But even on a non-other-multicopy-atomic system, there has to be some 
> >>> synchronization between the memory controller and P1's CPU.  Otherwise, 
> >>> how could the system guarantee that P1's smp_load_acquire would see the 
> >>> post-increment value of y?  It seems reasonable to assume that this 
> >>> synchronization would also cause P1 to see x=1.
> >>>
> >>
> >> I agree with you on the "reasonable" part ;-) So basically, the memory
> >> controller can only do the write of the AMO after P0's second write
> >> has propagated to the memory controller (and because of the wmb(),
> >> P0's first write must already have propagated to the memory
> >> controller, too), so it makes sense that when the write of the AMO
> >> propagates from the memory controller to P1, P0's first write has
> >> also propagated to P1. IOW, the write of the AMO on the memory
> >> controller acts at least like a release.
> >>
> >> However, part of me is still a little paranoid, because to my
> >> understanding, the point of AMOs is to get atomic operations executing
> >> as fast as possible. So maybe an AMO has some fast path for the memory
> >> controller to forward a write to the CPU that issues the AMO, in which
> >> case the assumption becomes unreasonable ;-)
> > 
> > It's true that a hardware design in the future might behave differently 
> > from current hardware.  If that ever happens, we will need to rethink 
> > the situation.  Maybe the designers will change their hardware to make 
> > it match the memory model.  Or maybe the memory model will change.
> 
> Do you mean all of the above in the context of increment etc, as opposed
> to swap?  ARM hardware in the wild is already documented as forwarding
> SWP values to subsequent loads early, even past control dependencies.
> Paul sent this link earlier in the thread.
> 
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0735r0.html
> 
> The reason swap is special is because its store value is available to be
> forwarded even before the AMO goes out to the memory controller or
> wherever else it gets its load value from.

I believe the current intention for herd is as follows:

	xchg() and similar RMW operations do not generate an internal
	dependency;

	cmpxchg() and similar RMW operations generate an internal 
	control dependency;

	atomic_add() and similar RMW operations generate an internal 
	data dependency.

If herd adds support for saturating operations, they will generate at 
least a data dependency and maybe also a control dependency.
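
As a rough illustration only, the three cases above can be pictured with
single-threaded C models that split each RMW into its load and its store.
The names model_xchg, model_cmpxchg, model_add and model_check are
hypothetical; this is neither the kernel's implementation nor herd's
definition, and it says nothing about atomicity. It only shows where the
store does or does not depend on the loaded value.

```c
#include <assert.h>

/*
 * Hypothetical single-threaded models of three RMW classes.
 * Illustrative only: not atomic, not kernel code, not herd's semantics.
 */

/* xchg-like: the store does not depend on the loaded value at all. */
static int model_xchg(int *p, int newval)
{
	int old = *p;		/* load */

	*p = newval;		/* store: value known before the load returns */
	return old;
}

/* cmpxchg-like: whether the store happens depends on the loaded value. */
static int model_cmpxchg(int *p, int expected, int newval)
{
	int old = *p;		/* load */

	if (old == expected)	/* control dependency on the load */
		*p = newval;	/* stored value is not computed from old */
	return old;
}

/* atomic_add-like: the stored value is computed from the loaded value. */
static void model_add(int *p, int i)
{
	int old = *p;		/* load */

	*p = old + i;		/* store: data dependency on the load */
}

/* Small usage example exercising each model once. */
static void model_check(void)
{
	int v = 5;

	assert(model_xchg(&v, 9) == 5 && v == 9);
	assert(model_cmpxchg(&v, 1, 7) == 9 && v == 9);	/* mismatch: no store */
	assert(model_cmpxchg(&v, 9, 7) == 9 && v == 7);	/* match: store */
	model_add(&v, 3);
	assert(v == 10);
}
```

In the xchg-like model the value to be stored is available before the load
completes, which matches the point quoted above about why swap is special.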

Alan

> Also, the case I described is an acquire rather than a control
> dependency, but it's similar enough that it doesn't seem completely
> unrealistic to think hardware might try to do this.
> 
> Dan