Message-ID: <20190916115254.GB29216@tardis>
Date: Mon, 16 Sep 2019 19:52:54 +0800
From: Boqun Feng <boqun.feng@...il.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: LKMM Maintainers -- Akira Yokosawa <akiyks@...il.com>,
Andrea Parri <parri.andrea@...il.com>,
Daniel Lustig <dlustig@...dia.com>,
David Howells <dhowells@...hat.com>,
Jade Alglave <j.alglave@....ac.uk>,
Luc Maranget <luc.maranget@...ia.fr>,
Nicholas Piggin <npiggin@...il.com>,
"Paul E. McKenney" <paulmck@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will@...nel.org>,
Kernel development list <linux-kernel@...r.kernel.org>
Subject: Re: Documentation for plain accesses and data races
On Mon, Sep 16, 2019 at 01:17:53PM +0800, Boqun Feng wrote:
> Hi Alan,
>
> I spent some time reading this; it's really helpful! Thanks.
>
> Please see comments below:
>
> On Fri, Sep 06, 2019 at 02:11:29PM -0400, Alan Stern wrote:
> [...]
> > If two memory accesses aren't concurrent then one must execute before
> > the other. Therefore the LKMM decides two accesses aren't concurrent
> > if they can be connected by a sequence of hb, pb, and rb links
> > (together referred to as xb, for "executes before"). However, there
> > are two complicating factors.
> >
> > If X is a load and X executes before a store Y, then indeed there is
> > no danger of X and Y being concurrent. After all, Y can't have any
> > effect on the value obtained by X until the memory subsystem has
> > propagated Y from its own CPU to X's CPU, which won't happen until
> > some time after Y executes and thus after X executes. But if X is a
> > store, then even if X executes before Y it is still possible that X
> > will propagate to Y's CPU just as Y is executing. In such a case X
> > could very well interfere somehow with Y, and we would have to
> > consider X and Y to be concurrent.
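> >
> > For instance, here is a minimal sketch of that situation (the variable
> > x and the labels X and Y are only for illustration):
> >
> >	int x = 0;
> >
> >	P0()
> >	{
> >		X: x = 1;	/* plain store */
> >	}
> >
> >	P1()
> >	{
> >		int r1;
> >
> >		Y: r1 = x;	/* plain load */
> >	}
> >
> > Even in an execution where X executes before Y, the store X might
> > reach P1's CPU only while Y is running, so nothing rules out X
> > interfering with Y and the two must be treated as concurrent.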
> >
> > Therefore when X is a store, for X and Y to be non-concurrent the LKMM
> > requires not only that X must execute before Y but also that X must
> > propagate to Y's CPU before Y executes. (Or vice versa, of course, if
> > Y executes before X -- then Y must propagate to X's CPU before X
> > executes if Y is a store.) This is expressed by the visibility
> > relation (vis), where X ->vis Y is defined to hold if there is an
> > intermediate event Z such that:
> >
> > X is connected to Z by a possibly empty sequence of
> > cumul-fence links followed by an optional rfe link (if none of
> > these links are present, X and Z are the same event),
> >
>
> I wonder whether we could add an optional ->coe or ->fre between X and
> the possibly empty sequence of cumul-fence links, similar to the
> definition of ->prop. This makes sense because if we have
>
> 	X ->coe X' (and X' is on CPU C)
>
> , X must already have propagated to C before X' executes, according to
> our operational model:
>
> "... In particular, it must arrange for the store to be co-later than
> (i.e., to overwrite) any other store to the same location which has
> already propagated to CPU C."
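>
> For example, a minimal sketch of the case I have in mind (the variable
> x and the labels are made up):
>
> 	P0()
> 	{
> 		X: WRITE_ONCE(x, 1);
> 	}
>
> 	P1()	/* CPU C */
> 	{
> 		W: WRITE_ONCE(x, 2);
> 	}
>
> If X ->coe W in an execution, then by the argument above X has to be
> propagated to C before W executes, so allowing an optional ->coe link
> at the front of the sequence shouldn't weaken the visibility guarantee.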
>
> In other words, we can define ->vis as:
>
> let vis = prop ; ((strong-fence ; [Marked] ; xbstar) | (xbstar & int))
>
Hmm.. so the problem with this approach is that the (xbstar & int) part
doesn't satisfy the requirement of visibility... i.e.

	X ->prop Z ->(xbstar & int) Y

may not guarantee that, when Y executes, X has already propagated to Y's
CPU. Lemme think more on this...

Regards,
Boqun
> , and for this document, we could reference the "prop" section in
> explanation.txt. This could make the model simpler (both to describe
> and to explain), and another benefit is that we won't need this commit
> in Paul's dev branch:
>
> c683f2c807d2 "tools/memory-model: Fix data race detection for unordered store and load"
>
> , and the data race rules will look more symmetrical ;-)
>
> Thoughts? Or am I missing something subtle here?
>
> Regards,
> Boqun
>
> > and either:
> >
> > Z is connected to Y by a strong-fence link followed by a
> > possibly empty sequence of xb links,
> >
> > or:
> >
> > Z is on the same CPU as Y and is connected to Y by a possibly
> > empty sequence of xb links (again, if the sequence is empty it
> > means Z and Y are the same event).
> >
> > The motivations behind this definition are straightforward:
> >
> > cumul-fence memory barriers force stores that are po-before
> > the barrier to propagate to other CPUs before stores that are
> > po-after the barrier.
> >
> > An rfe link from an event W to an event R says that R reads
> > from W, which certainly means that W must have propagated to
> > R's CPU before R executed.
> >
> > strong-fence memory barriers force stores that are po-before
> > the barrier, or that propagate to the barrier's CPU before the
> > barrier executes, to propagate to all CPUs before any events
> > po-after the barrier can execute.
> >
> > To see how this works out in practice, consider our old friend, the MP
> > pattern (with fences and statement labels, but without the conditional
> > test):
> >
> >	int buf = 0, flag = 0;
> >
> >	P0()
> >	{
> >		X: WRITE_ONCE(buf, 1);
> >		smp_wmb();
> >		W: WRITE_ONCE(flag, 1);
> >	}
> >
> >	P1()
> >	{
> >		int r1;
> >		int r2 = 0;
> >
> >		Z: r1 = READ_ONCE(flag);
> >		smp_rmb();
> >		Y: r2 = READ_ONCE(buf);
> >	}
> >
> > The smp_wmb() memory barrier gives a cumul-fence link from X to W, and
> > assuming r1 = 1 at the end, there is an rfe link from W to Z. This
> > means that the store to buf must propagate from P0 to P1 before Z
> > executes. Next, Z and Y are on the same CPU and the smp_rmb() fence
> > provides an xb link from Z to Y (i.e., it forces Z to execute before
> > Y). Therefore we have X ->vis Y: X must propagate to Y's CPU before Y
> > executes.
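> >
> > The other alternative in the definition (the strong-fence clause) can
> > be illustrated by a similar sketch (an ISA2-like extension of this
> > pattern; the variable data and the labels U and V are made up here):
> >
> >	int buf = 0, flag = 0, data = 0;
> >
> >	P0()
> >	{
> >		X: WRITE_ONCE(buf, 1);
> >		smp_wmb();
> >		W: WRITE_ONCE(flag, 1);
> >	}
> >
> >	P1()
> >	{
> >		int r1;
> >
> >		Z: r1 = READ_ONCE(flag);
> >		smp_mb();
> >		V: WRITE_ONCE(data, 1);
> >	}
> >
> >	P2()
> >	{
> >		int r2;
> >		int r3 = 0;
> >
> >		U: r2 = READ_ONCE(data);
> >		smp_rmb();
> >		Y: r3 = READ_ONCE(buf);
> >	}
> >
> > Assuming r1 = 1 and r2 = 1 at the end, X is again connected to Z by a
> > cumul-fence link followed by an rfe link, and this time Z is connected
> > to Y by a strong-fence link (the smp_mb()) followed by xb links (the
> > rfe from V to U and the ordering from the smp_rmb()). So once more
> > X ->vis Y: the store to buf must propagate to P2 before Y executes.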
> >
> [...]